CQCL / lambeq

A high-level Python library for Quantum Natural Language Processing
https://docs.quantinuum.com/lambeq/
Apache License 2.0

DisCoCirc: Add a filtering mechanism for the nouns #187

Open dimkart opened 2 hours ago

dimkart commented 2 hours ago

The set of entities corresponding to wires in a DisCoCirc diagram should be filtered, both to keep the diagram at a manageable size and to exclude insignificant entities that happen to appear in the document. A simple metric such as TF-IDF might be a good first step. The metric can be applied to plain noun tokens, or to dependencies derived from a dependency parser for more robust coverage.
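
To make the idea concrete, here is a rough sketch of TF-IDF filtering over plain noun tokens. It is not lambeq code: the names `tfidf_scores` and `select_entities`, the `top_k` parameter, and the assumption that nouns have already been extracted into lists of strings (with a background corpus available for the IDF part) are all hypothetical. The IDF uses the common smoothed form `ln((1 + N) / (1 + df)) + 1`.

```python
import math
from collections import Counter


def tfidf_scores(doc_nouns, corpus):
    """Score each noun of one document by TF-IDF.

    doc_nouns: list of noun tokens from the target document (assumed input).
    corpus:    list of noun-token lists, one per background document (assumed input).
    """
    tf = Counter(doc_nouns)
    n_docs = len(corpus)
    scores = {}
    for noun, count in tf.items():
        # Document frequency: number of corpus documents containing the noun.
        df = sum(1 for other in corpus if noun in other)
        # Smoothed IDF, so unseen nouns do not cause a division by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[noun] = (count / len(doc_nouns)) * idf
    return scores


def select_entities(doc_nouns, corpus, top_k=2):
    """Return the top_k nouns by TF-IDF; ties are broken alphabetically."""
    scores = tfidf_scores(doc_nouns, corpus)
    ranked = sorted(scores, key=lambda noun: (-scores[noun], noun))
    return ranked[:top_k]


doc = ["alice", "alice", "bob", "day"]
corpus = [doc, ["day", "thing"], ["day", "time"]]
selected = select_entities(doc, corpus, top_k=2)
```

In this toy input, "day" occurs in every background document, so it ranks below "alice" and "bob" and would not get its own wire. A fixed score threshold could be used in place of `top_k`.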

dimkart commented 2 hours ago

@AnnaNPearson suggested that nouns that do not meet the threshold shouldn't be discarded completely; instead, it would be useful for them to appear in the diagram as single boxes that interact with their context. This could be offered to the user as an extra option.
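
A minimal sketch of how that option might look, assuming each noun already has a relevance score. The function name `partition_nouns` and the `keep_as_boxes` flag are hypothetical and not part of lambeq; the sketch only decides which nouns become wires and which become boxes, and does not build any diagram.

```python
def partition_nouns(scores, threshold, keep_as_boxes=True):
    """Split scored nouns into wire entities and single-box nouns.

    scores:        dict mapping noun -> relevance score (assumed input).
    threshold:     nouns scoring at or above this value get their own wire.
    keep_as_boxes: if True, nouns below the threshold are returned so they
                   can be drawn as single boxes; if False, they are dropped.
    """
    wires = sorted(noun for noun, s in scores.items() if s >= threshold)
    below = sorted(noun for noun, s in scores.items() if s < threshold)
    boxes = below if keep_as_boxes else []
    return wires, boxes


example_scores = {"alice": 0.85, "bob": 0.42, "day": 0.25}
wires, boxes = partition_nouns(example_scores, threshold=0.4)
```

With `keep_as_boxes=False` the same call would return the wires only, which matches the plain filtering behaviour.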