dishmint / LexicalCases

Extract substrings matching a lexical pattern
https://www.paclets.com/FaizonZaman/LexicalCases
MIT License
2 stars, 0 forks

Preserve position of TextType #280

Open dishmint opened 11 months ago

dishmint commented 11 months ago

https://github.com/dishmint/LexicalCases/blob/8fb6d24125f06aae52ef68ae6143ffad47cf2906/FaizonZaman/LexicalCases/Kernel/LexicalCases.wl#L263

Currently, all instances of a text type found in the source text are merged into a single Alternatives. This is imprecise.

If I have LexicalPattern[TextType["Noun"]~~TextType["Verb"]~~TextType["Noun"]], the noun TTs in positions 1 and 3 will match the same set of nouns. A precise lexical pattern shouldn't work this way: each noun TT should only match nouns that actually occur in that position. Making this work with the current implementation probably requires additional positional checks.

A successful TT match is one where the TT occurs in the same sequence position in the source text as it does in the lexical pattern.
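To make the difference concrete, here is a minimal Python sketch of one reading of the problem. It is not the paclet's implementation. The hand-tagged token list and the names `merged_matches` and `positional_matches` are hypothetical and exist only for illustration. The merged version collects every word seen with a text type into one set, which plays the role of the single Alternatives. The positional version checks the type of the token actually sitting at each position.

```python
from typing import Dict, List, Set, Tuple

Token = Tuple[str, str]  # (word, text type); the tags are hand-assigned for this example

# Hypothetical tagged text: "south" is a Noun in one place and an Adverb in another.
TOKENS: List[Token] = [
    ("the", "Det"), ("south", "Noun"), ("warms", "Verb"), (".", "Punct"),
    ("birds", "Noun"), ("fly", "Verb"), ("south", "Adverb"), (".", "Punct"),
    ("birds", "Noun"), ("eat", "Verb"), ("seeds", "Noun"), (".", "Punct"),
]

PATTERN = ["Noun", "Verb", "Noun"]


def merged_matches(tokens: List[Token], pattern: List[str]) -> List[str]:
    """Match against one merged set of words per text type (position is ignored)."""
    words_by_type: Dict[str, Set[str]] = {}
    for word, tag in tokens:
        words_by_type.setdefault(tag, set()).add(word)

    words = [word for word, _ in tokens]
    n = len(pattern)
    found = []
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        # A word qualifies if it was tagged with this type anywhere in the text.
        if all(w in words_by_type.get(t, set()) for w, t in zip(window, pattern)):
            found.append(" ".join(window))
    return found


def positional_matches(tokens: List[Token], pattern: List[str]) -> List[str]:
    """Match only when the token at each position carries the required type."""
    n = len(pattern)
    found = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(tag == t for (_, tag), t in zip(window, pattern)):
            found.append(" ".join(word for word, _ in window))
    return found


print(merged_matches(TOKENS, PATTERN))      # includes the false positive "birds fly south"
print(positional_matches(TOKENS, PATTERN))  # only "birds eat seeds"
```

In this toy data the merged version accepts "birds fly south" because "south" is a noun elsewhere in the text, even though it is an adverb in that position. The positional version rejects it.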

The whole point of grabbing all TT instances first is that the alternative is slow. Hopefully performing this check doesn't add too much overhead.

dishmint commented 5 months ago

Holding off on this one for now — I don't have an efficient design yet.