curiosity-ai / catalyst

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.
MIT License

Add support for Dutch and French WordNets #111

Open oktaal opened 5 months ago

oktaal commented 5 months ago

The current implementation of Catalyst only supports the Princeton WordNet for English. This PR adds support for mapping WordNet to other languages (using a new class, WordNetMapping) and exposes both the translations and the original English WordNet data through the uniform interfaces IWordNet and IWordNetData. Translations should follow the format used by the Open Multilingual WordNet, which maps each synset to one or more translations, e.g. for cyclist:

09986189-n  nld:lemma   peddelaar
09986189-n  nld:lemma   fietser
09986189-n  nld:lemma   wielrenner
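
To make the format concrete, here is a minimal sketch of how such lines could be grouped by synset. It is only an illustration and not the WordNetMapping code from this PR: the class OmwTabSketch and the method ParseLemmas are hypothetical names, and the sketch assumes the columns are tab-separated and that lines starting with # are headers or comments.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, NOT part of Catalyst or of this PR.
// Assumes lines of the form "<offset>-<pos>\t<lang>:lemma\t<word>".
public static class OmwTabSketch
{
    // Returns the lemmas of one language, grouped by synset ID (e.g. "09986189-n").
    public static Dictionary<string, List<string>> ParseLemmas(
        IEnumerable<string> lines, string language)
    {
        var lemmasBySynset = new Dictionary<string, List<string>>();
        string relation = language + ":lemma";

        foreach (var line in lines)
        {
            // Skip blank lines and '#' header/comment lines (assumed convention).
            if (string.IsNullOrWhiteSpace(line) || line[0] == '#') continue;

            // Split on tabs only, so multi-word lemmas keep their spaces.
            var columns = line.Split('\t');
            if (columns.Length < 3) continue;

            // Ignore rows for other languages or other relation types.
            if (columns[1].Trim() != relation) continue;

            string synsetId = columns[0].Trim();
            if (!lemmasBySynset.TryGetValue(synsetId, out var lemmas))
            {
                lemmas = new List<string>();
                lemmasBySynset[synsetId] = lemmas;
            }
            lemmas.Add(columns[2].Trim());
        }

        return lemmasBySynset;
    }
}
```

Called with the three lines above and the language code nld, ParseLemmas returns a single entry keyed by 09986189-n whose list holds peddelaar, fietser and wielrenner, in file order. Keeping the synset ID as an opaque string avoids committing to how the offset and part of speech are stored internally.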

Related to #35

oktaal commented 3 months ago

Would it be possible for someone to take a look at this PR, @theolivenbaum? I also think it might be useful to move /Catalyst.WordNet.Test/ to the samples folder to clarify its contents, but that's not something I want to just tack onto this PR.