curiosity-ai / catalyst

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.
MIT License
699 stars 71 forks

How to Use Catalyst for Lemmatizing in a Multi-Language App with On-Demand Model Downloads #110

Open ramjke opened 2 months ago

ramjke commented 2 months ago

I'm working on an application that supports multiple languages chosen by the user. I want to integrate Catalyst for its lemmatizing capabilities. However, I've encountered a couple of challenges and I would appreciate your guidance on how to address them:

Challenges

  1. Preinstallation of NuGet Packages for Each Language: According to the documentation, I need to install a separate NuGet package for each language I intend to support. Given the number of languages, this would significantly increase my application's bundle size, which is not ideal.

  2. Using Only the Lemmatizing Feature: My primary need from Catalyst is the lemmatizing feature. I want to minimize the resources and dependencies required by my application by using only this specific functionality.

Questions

  1. On-Demand Model Downloads: Is there a way to set up Catalyst so that language models are downloaded on demand, based on the user's selected language? This would keep the initial bundle size small and load models only when necessary.

  2. Minimal Usage for Lemmatizing: How can I configure Catalyst to load only the resources required for lemmatizing? Are there any specific configurations or optimizations I should be aware of?

Use Case

Here is a brief outline of what I am trying to achieve:

Any advice, code snippets, or references to relevant parts of the documentation would be highly appreciated.