MikeHopcroft / ShortOrder

A natural language conversational agent for ordering and organizing items from a catalog.
MIT License

BadWord list generation #4

Closed MikeHopcroft closed 6 years ago

MikeHopcroft commented 6 years ago

Downstream tokenizers should be able to generate a list of "bad words" that upstream tokenizers will use to invalidate matches.

Suppose, for example, that an entity tokenizer that knows the entity "medium marble" is followed by an attribute tokenizer that knows the attribute "medium". The upstream entity tokenizer should never report "medium marble" as a match for the input "medium", because that match consists solely of bad words from the attribute tokenizer. Reporting "medium marble" as a match for the input "marble" would be fine, because "marble" isn't on the attribute tokenizer's list of bad words.
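A minimal sketch of the rule, in TypeScript. It is only an illustration under stated assumptions: the `Match` type and the `badWordsFrom` and `isValidMatch` helpers are hypothetical names invented here, not ShortOrder's actual API, and it assumes a downstream tokenizer's bad words are simply the individual words of its aliases.

```typescript
// Hypothetical sketch: these names are illustrative, not ShortOrder's API.

// A candidate match: the alias the upstream tokenizer knows, plus the
// input words that actually matched it.
type Match = { alias: string; matched: string[] };

// Assumption: a downstream tokenizer's bad words are the individual
// words that appear in any of its aliases.
function badWordsFrom(aliases: string[]): Set<string> {
    const words = new Set<string>();
    for (const alias of aliases) {
        for (const word of alias.toLowerCase().split(/\s+/)) {
            if (word.length > 0) {
                words.add(word);
            }
        }
    }
    return words;
}

// A match survives only if at least one matched word is not a bad word.
// A match made up solely of bad words (or of no words) is rejected.
function isValidMatch(match: Match, badWords: Set<string>): boolean {
    return match.matched.some(word => !badWords.has(word.toLowerCase()));
}

// The attribute tokenizer knows "medium", so "medium" is a bad word.
const badWords = badWordsFrom(['medium']);

// Input "medium" matched against the entity "medium marble".
const onMedium: Match = { alias: 'medium marble', matched: ['medium'] };

// Input "marble" matched against the entity "medium marble".
const onMarble: Match = { alias: 'medium marble', matched: ['marble'] };

const mediumAllowed = isValidMatch(onMedium, badWords);
const marbleAllowed = isValidMatch(onMarble, badWords);
```

Here `mediumAllowed` is `false` and `marbleAllowed` is `true`, matching the two cases described above. A match on both words would also survive, since "marble" is not a bad word.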

MikeHopcroft commented 6 years ago

Implemented in e9d991a94b1c950e9fff404f8b1bc920c5a5b0f8.