curiosity-ai / catalyst

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.
MIT License

Is there an example on how to use the lemmatizer in a pipeline? #101

Open bancroftway opened 9 months ago

bancroftway commented 9 months ago

Could you please document how to use the lemmatizer in a pipeline? I could not find any sample code for this in the samples directory.

flor3sc0 commented 9 months ago

You can try the following code:

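using System;
using Catalyst;
using Mosaik.Core; // provides the Language enum

// Register the English models (needs the Catalyst.Models.English NuGet package).
Catalyst.Models.English.Register();
var nlp = await Pipeline.ForAsync(Language.English);
var doc = new Document("I used to have dogs", Language.English);
nlp.ProcessSingle(doc); // processes the document in place
// Print each token next to its lemma.
foreach (var token in doc.ToTokenList())
    Console.WriteLine($"{token.Value} -> {token.Lemma}");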

/* 
Result:
  I -> I
  used -> use
  to -> to
  have -> have
  dogs -> dog
*/

Make sure, though, that a *_lemma_lookup_*.bin file and an ILemmatizer implementation exist for the language you need.