curiosity-ai / catalyst

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.
MIT License
715 stars 73 forks

Regex Support in Pattern. #58

Closed juliolitwin closed 2 years ago

juliolitwin commented 3 years ago

Hi,

Is there any way to use a regex to extract entities (in PatternUnit or Spotter)? I've tested WithTokens, but it apparently only works with literal strings, and I haven't found another way to work with regex.

Maybe something like:

new PatternUnit(Catalyst.PatternUnitPrototype.Single().WithRegex(regex))

Or is there some abstraction that would let me supply my own function that returns a bool?

Thanks, Cheers!

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

theolivenbaum commented 2 years ago

Hi @juliolitwin, we don't support regex matching in the pattern matcher, mainly because the matcher runs after tokenization, so a regex could only match against the content of a single token. If that restriction is acceptable, it should be possible to extend the class to accept a regex pattern plus a regex options flag and use those for matching. Happy to merge a PR if you want to contribute it.
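To illustrate the single-token restriction, here is a minimal sketch that uses only System.Text.RegularExpressions. It does not use any Catalyst API: `MatchTokens` is a hypothetical helper, and the whitespace split merely stands in for a real tokenizer. A pattern that fits inside one token matches, while a pattern that spans two tokens never can.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenRegexSketch
{
    // Hypothetical helper (not part of Catalyst): returns the tokens that
    // match the pattern in full. Each token is tested in isolation, which is
    // the situation a matcher running after tokenization would be in.
    public static string[] MatchTokens(string[] tokens, string pattern,
                                       RegexOptions options = RegexOptions.None)
    {
        // Anchor the pattern so it must cover the whole token.
        var regex = new Regex(@"\A(?:" + pattern + @")\z", options);
        return tokens.Where(t => regex.IsMatch(t)).ToArray();
    }

    public static void Main()
    {
        // Assumption: naive whitespace tokenization, for illustration only.
        string[] tokens = "Invoice sent on 2021-03-05".Split(' ');

        // Fits inside a single token, so it matches.
        string[] dates = MatchTokens(tokens, @"\d{4}-\d{2}-\d{2}");
        Console.WriteLine(string.Join(",", dates));

        // Spans two tokens ("sent" and "on"), so no single token matches.
        string[] phrase = MatchTokens(tokens, @"sent on");
        Console.WriteLine(phrase.Length);
    }
}
```

A multi-token phrase would instead have to be expressed as a sequence of per-token units, each with its own pattern.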

Regarding the callback: since the models need to be serializable, we don't currently support that.
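The serialization constraint can be sketched as follows. A regex pattern plus its options is plain data and survives a round trip, whereas an arbitrary callback has no such data representation. `RegexRule` is a hypothetical type, not a Catalyst class, and System.Text.Json is used purely for illustration; this thread does not say which serializer Catalyst relies on.

```csharp
using System;
using System.Text.Json;
using System.Text.RegularExpressions;

// Hypothetical data-only description of a regex rule (not a Catalyst type).
public sealed class RegexRule
{
    public string Pattern { get; set; } = "";
    public RegexOptions Options { get; set; }

    // The matching behavior is rebuilt from the stored data on demand.
    public bool IsMatch(string token) => Regex.IsMatch(token, Pattern, Options);
}

public static class SerializableRuleSketch
{
    // Serialize the rule to JSON and read it back, as a model save/load would.
    public static RegexRule RoundTrip(RegexRule rule)
    {
        string json = JsonSerializer.Serialize(rule);
        return JsonSerializer.Deserialize<RegexRule>(json)!;
    }

    public static void Main()
    {
        var rule = new RegexRule { Pattern = @"^[a-z]+\d+$", Options = RegexOptions.IgnoreCase };
        RegexRule restored = RoundTrip(rule);

        // The restored rule behaves like the original because it is only data.
        Console.WriteLine(restored.IsMatch("AB123"));
        Console.WriteLine(restored.IsMatch("123AB"));
    }
}
```

A `Func<string, bool>` callback, by contrast, would have to be re-attached by the caller after every load, which is why a pattern-plus-options design fits a serializable model better.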
