curiosity-ai / catalyst

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.
MIT License

Help needed in the pattern composition #51

Closed: GalawynRM closed this issue 2 years ago

GalawynRM commented 3 years ago
mp => mp.Add(
    new PatternUnit(P.Single().WithTokens(quadriTokens)),                 // a letter token, e.g. "A"
    new PatternUnit(P.Single().WithLength(1, 2).HasNumeric()),            // then 1-2 digits, e.g. "01"
    new PatternUnit(P.SingleOptional().WithTokens(colTokens)),            // optionally "col" or "col."
    new PatternUnit(P.SingleOptional().WithLength(1, 2).HasNumeric())     // optionally 1-2 more digits
)

I would like to detect this kind of pattern: a letter (A, B, C, D, E, F, G, H), then 1 or 2 numeric digits, optionally "col" or "col.", then optionally 1 or 2 numeric digits. Examples: "A01", "A01 col. 1", "A01 col.1", "A01 col 1". I can make it work when there is only the first PatternUnit (so just "A"), but I can't make it recognize "A01", even after removing the optional units. Where am I going wrong?
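One way to sidestep tokenization entirely is to match the code on the raw text with a regular expression. The sketch below is not a Catalyst feature: it uses only `System.Text.RegularExpressions`, and the class name `QuadriCodeRegex` is hypothetical. It follows the description above literally (letter A to H, 1 or 2 digits, optional "col"/"col.", optional 1 or 2 digits) and assumes the letters are uppercase.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuadriCodeRegex
{
    // Hypothetical helper, not part of Catalyst.
    // \b[A-H]           one letter from A to H at the start of a word
    // \d{1,2}(?!\d)     1 or 2 digits, not followed by a third digit
    // (?:\s*col\b\.?)?  optional "col" or "col." ("colour" is not accepted)
    // (?:\s*\d{1,2}(?!\d))?  optional 1 or 2 trailing digits
    private static readonly Regex Pattern = new Regex(
        @"\b[A-H]\d{1,2}(?!\d)(?:\s*col\b\.?)?(?:\s*\d{1,2}(?!\d))?",
        RegexOptions.Compiled);

    // Returns every matched code in the text, in order of appearance.
    public static string[] FindAll(string text)
    {
        MatchCollection matches = Pattern.Matches(text);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Value;
        }
        return result;
    }
}
```

For example, `QuadriCodeRegex.FindAll("see A01 col. 1 and B2 col.3")` should yield "A01 col. 1" and "B2 col.3", while "100" yields no match. How the matched spans would then be fed back into a Catalyst document is left open here.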

GalawynRM commented 3 years ago

I corrected the pattern

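mp => mp.Add(
    new PatternUnit(
        P.And(
            P.Single().WithChars(quadri.ToCharArray()),   // a token with one of the quadri letters...
            P.Single().HasNumeric()                       // ...that also contains a digit, e.g. "A01"
        )
    ),
    new PatternUnit(P.SingleOptional().WithToken("col")),  // optionally "col"
    new PatternUnit(P.SingleOptional().HasNumeric())       // optionally a numeric token
)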

`quadri` contains "ABCDEFGH", but now the number "100", which doesn't contain any of those characters, is also recognized by this pattern.
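If the tokenizer keeps "A01" as a single token (an assumption, though it would also explain why the first two-unit pattern never matched), one unit has to validate the whole token. The plain C# predicate below is a sketch of that check, independent of Catalyst; `QuadriToken` and `IsQuadriCode` are hypothetical names. It rejects "100" because it requires the first character to be one of the quadri letters and every remaining character to be a digit, instead of only asking whether the token contains certain characters.

```csharp
using System;
using System.Linq;

public static class QuadriToken
{
    // Same letter set as the `quadri` string in the issue.
    private const string Quadri = "ABCDEFGH";

    // Hypothetical predicate (not a Catalyst API): true for tokens like "A01" or "H7".
    public static bool IsQuadriCode(string token)
    {
        // One letter plus 1 or 2 digits means 2 or 3 characters in total.
        if (token == null || token.Length < 2 || token.Length > 3) return false;

        // The first character must be one of the quadri letters, so "100" is rejected.
        if (Quadri.IndexOf(token[0]) < 0) return false;

        // Every remaining character must be an ASCII digit.
        return token.Skip(1).All(c => c >= '0' && c <= '9');
    }
}
```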

theolivenbaum commented 2 years ago

Hi @GalawynRM - is the issue still happening?

GalawynRM commented 2 years ago

Yes. I'll try updating to the latest version, just in case.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.