How to create our own model?

curiosity-ai / catalyst

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.

MIT License

715 stars, 73 forks

Is your feature request related to a problem? Please describe.

My company is considering using your great library to analyze texts from a care home environment. I could not find anywhere how to create new types of tags (for instance "meds"), or how to modify or extend existing ones (for instance adding "room", "toilet", and so on to locations).

Describe the solution you'd like

An explanation of how to create and expand the tagging dictionaries.

After carefully reading the closely related issue 45, I took away a few points:

No incremental retraining of an existing model: it is better to enlarge the dataset and train again from scratch.
How to train a model (code here: https://github.com/curiosity-ai/catalyst/blob/master/Catalyst.Training/src/TrainWikiNER.cs).
Different models are trained in different ways.

So what I am actually asking for is a general guide to training a model: which method to use, where to get datasets, and how to store them locally or create a NuGet package (OK, that last part is probably out of scope for Catalyst).


How to create our own model? #59