curiosity-ai / catalyst

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.
MIT License

Allow loading models from a given stream #40

Closed BernhardGlueck closed 3 years ago

BernhardGlueck commented 4 years ago

Hi!

Right now, loading models for the various algorithms goes through the IStorage interface, which handles versioning etc. However, a common case, at least for us, is that we already have a storage abstraction in place that cannot be used to implement the full IStorage interface (so bridging is not an option).

For read-only use, would it be possible to add an option to read models directly from a stream? Versioning etc. would then be handled on our side.

BernhardGlueck commented 4 years ago

Another issue that came up is that the implicit use of a global singleton IStorage is very bad, at least in a server scenario. Is there a way to load some models from one IStorage and others from a different one?

theolivenbaum commented 3 years ago

Absolutely agreed - it's something that has been on my backlog for a while, also to enable removing the external dependency on our Mosaik.Core NuGet package.

For the singleton issue - I need to see what would be the cleanest way to change it - might be possible to just pass an optional IStorage implementation to the methods that load models from storage. I'm also tempted to move the base models distribution to use NuGet packages - what do you think?
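To make the "optional storage parameter" idea concrete, here is a rough sketch of the pattern. Everything in it is hypothetical: `IModelStorage`, `InMemoryStorage` and `ModelLoader` are illustrative stand-ins, not Catalyst's or Mosaik.Core's actual types or signatures. The point is only that a per-call argument can take precedence over a process-wide default.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical stand-in for a storage abstraction; NOT Catalyst's IStorage.
public interface IModelStorage
{
    Task<Stream> OpenReadAsync(string name);
}

// Trivial in-memory implementation, used here only for illustration.
public sealed class InMemoryStorage : IModelStorage
{
    private readonly Dictionary<string, byte[]> _blobs = new Dictionary<string, byte[]>();

    public void Put(string name, byte[] data) => _blobs[name] = data;

    public Task<Stream> OpenReadAsync(string name) =>
        Task.FromResult<Stream>(new MemoryStream(_blobs[name], writable: false));
}

public static class ModelLoader
{
    // Process-wide default, kept so existing callers keep working.
    public static IModelStorage Default { get; set; }

    // An explicitly passed storage wins over the global default.
    public static async Task<byte[]> LoadAsync(string name, IModelStorage storage = null)
    {
        var source = storage ?? Default
            ?? throw new InvalidOperationException("No storage supplied and no default configured.");

        using (var stream = await source.OpenReadAsync(name))
        using (var buffer = new MemoryStream())
        {
            await stream.CopyToAsync(buffer);
            return buffer.ToArray();
        }
    }
}
```

With something along these lines, a server could call `ModelLoader.LoadAsync("tagger", tenantA)` and `ModelLoader.LoadAsync("tagger", tenantB)` side by side without touching shared global state.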

theolivenbaum commented 3 years ago

@BernhardGlueck finally got some time to work on this. Most models now support LoadAsync and StoreAsync. There are still a couple pending, but I will get back to them soon.
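For anyone reading along, this is roughly the shape of a stream-based store/load pair. It is a minimal sketch under assumptions: `ToyModel` is a made-up type, JSON via `System.Text.Json` is just a convenient serialization choice for the example, and the signatures shown are not taken from Catalyst's actual models.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical toy model; NOT a Catalyst type.
public sealed class ToyModel
{
    public string Language { get; set; } = "";
    public Dictionary<string, int> Vocabulary { get; set; } = new Dictionary<string, int>();

    // Writes the model to any writable stream; the caller owns the stream.
    public Task StoreAsync(Stream destination) =>
        JsonSerializer.SerializeAsync(destination, this);

    // Reads a model from any readable stream; versioning is left to the caller.
    public static async Task<ToyModel> LoadAsync(Stream source)
    {
        var model = await JsonSerializer.DeserializeAsync<ToyModel>(source);
        return model ?? throw new InvalidDataException("Stream did not contain a model.");
    }
}
```

Because both methods only see a `Stream`, the caller decides where the bytes live (file, blob store, embedded resource) and handles versioning on their side.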

I'm also switching our model distribution to NuGet and splitting it by language - will update the samples with the new code soon!