curiosity-ai / catalyst

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.
MIT License
699 stars 71 forks

Package FastText Language detection model as nuget package #63

Open theolivenbaum opened 2 years ago

theolivenbaum commented 2 years ago

Since the online model repository has been deprecated, we need to publish a NuGet package for this model.

wis3guy commented 2 years ago

What is the current recommendation for using the FastTextLanguageDetector? Where can I find the model to store locally, or should I wait for the NuGet package? The sample code does not work because it sets Storage.Current = new DiskStorage("catalyst-models"); which (I think) assumes the models are already present in that folder.

diegosasw commented 2 years ago

I'm also interested. It fails when I run:

English.Register();
Storage.Current = new DiskStorage("catalyst-models");
var fastTextLanguageDetector = await FastTextLanguageDetector.FromStoreAsync(Language.Any, Version.Latest, "");

with the following exception:

System.IO.FileNotFoundException: Unable to find the specified file.
   at Mosaik.Core.DiskStorage.OpenLockedStreamAsync(String path, FileAccess access)
   at Mosaik.Core.ObjectStore.LoadAsync[T](IStorageTarget storeTarget, Language language, String modelType, Int32 version, String tag, Boolean compress)
   at Mosaik.Core.ObjectStorage.LoadInternal[TData](IStorageTarget target, String name, Language Language, Int32 Version, String Tag, Boolean CompressStoredData)
   at Mosaik.Core.ObjectStorage.LoadAsync[TData](IStorageTarget target, Language language, Int32 version, String tag, Boolean compress)
   at Mosaik.Core.ObjectStorage.LoadAsync[TData](IStorageTarget target, Language language, Int32 version, String tag, Boolean compress)
   at Mosaik.Core.StorableObject`2.LoadDataAsync()
   at Catalyst.Models.FastText.FromStoreAsync_Internal(Language language, Int32 version, String tag)
   at Catalyst.Models.FastTextLanguageDetector.FromStoreAsync(Language language, Int32 version, String tag)

When I look inside catalyst-models, I see only a bunch of empty folders. Should there be something in there?
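
To confirm that the folder tree really contains no files at any depth, a small check along these lines might help. This is only a sketch built on System.IO: ModelFolderCheck and ListFiles are hypothetical names, not part of Catalyst, and it makes no assumption about which files the library expects. It simply reports whatever is on disk under the given root.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of Catalyst): lists every file below a folder.
public static class ModelFolderCheck
{
    // Returns each file under root (searched recursively) with its size in bytes.
    // An empty array means the root is missing or holds only empty directories.
    public static (string Path, long Bytes)[] ListFiles(string root)
    {
        if (!Directory.Exists(root))
        {
            return Array.Empty<(string Path, long Bytes)>();
        }

        return Directory
            .EnumerateFiles(root, "*", SearchOption.AllDirectories)
            .Select(path => (Path: path, Bytes: new FileInfo(path).Length))
            .ToArray();
    }
}

// Example call, using the folder name from the snippet above:
//   foreach (var f in ModelFolderCheck.ListFiles("catalyst-models"))
//       Console.WriteLine($"{f.Bytes,12} {f.Path}");
```

If this prints nothing, the FileNotFoundException is consistent with the store having no model file to open, rather than with a wrong path or a permissions problem.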

UPDATE: I can see that var cld2LanguageDetector = await Catalyst.Models.LanguageDetector.FromStoreAsync(Language.Any, Version.Latest, ""); works well, so I'm guessing the problem is specific to the FastText language detector and has not been solved yet.

aggiehorns commented 2 years ago

Bump. I'm experiencing the same error, and the samples don't work. It would be helpful to know which files are needed and where they can be downloaded. I'm totally lost.

dylanvdmerwe commented 1 year ago

I'm also totally lost with regard to using the new language packages. Any input on how to get this working? FastTextLanguageDetector cannot pull from storage.

gabe4797 commented 1 year ago

Is there a plan to get this working?