BlueBrain / Search

Blue Brain text mining toolbox for semantic search and structured information extraction
https://blue-brain-search.readthedocs.io
GNU Lesser General Public License v3.0

Package the NER models #313

Open pafonta opened 3 years ago

pafonta commented 3 years ago

🚀 Feature

Package the NER models we trained.

Motivation

Make the NER models pip installable and easily distributable.

Pitch

As we track the models with DVC, we could retrieve them if needed.

However, we might want or need to distribute our models in a packaged form.

Besides, packaging a model would let us distribute registered functions and custom components (EntityRuler?) along with it.

This issue is a reminder to have this discussion.

Additional context

Reference: https://spacy.io/api/cli#package.
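
For illustration, here is a minimal sketch of how the packaging step could be scripted. It assumes spaCy v3's `spacy package` CLI and its `--code`, `--name` and `--version` options. The helper `build_package_command`, the model directory name, `functions.py` and the package name are hypothetical placeholders, not things that exist in this repository.

```python
import subprocess
import sys
from typing import List, Optional, Sequence


def build_package_command(
    model_dir: str,
    output_dir: str,
    code_files: Sequence[str] = (),
    name: Optional[str] = None,
    version: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `python -m spacy package` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "spacy", "package", model_dir, output_dir]
    if code_files:
        # Python files with registered functions / custom components to ship
        # inside the package, passed as one comma-separated value.
        cmd += ["--code", ",".join(code_files)]
    if name:
        cmd += ["--name", name]
    if version:
        cmd += ["--version", version]
    return cmd


def package_model(cmd: List[str]) -> None:
    """Run the command; requires spaCy to be installed and the paths to exist."""
    subprocess.run(cmd, check=True)


# Assumed example values, for illustration only.
example_cmd = build_package_command(
    "data_and_models/models/ner/model1",  # hypothetical model directory
    "dist",
    code_files=["functions.py"],  # hypothetical file with custom code
    name="ner_model1",
    version="0.1.0",
)
```

Calling `package_model(example_cmd)` would then produce an installable archive in the output directory, which can be installed with pip and loaded by package name.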

FrancescoCasalegno commented 3 years ago

However, we might want or need to distribute our models in a packaged form.

Currently, a spaCy pipeline is loaded with a simple spacy.load(), and this also includes the EntityRuler component.

Unless at some point we should have registered functions, is there really a strong benefit from having a model that is pip installable?

pafonta commented 3 years ago

this also includes the EntityRuler component

There are 2 pipelines for each modelX: one in data_and_models/models/ner/ and one in data_and_models/models/ner_er/. So the EntityRuler is loaded only if one passes the 2nd directory to spacy.load(). Just to clarify: whether the EntityRuler is loaded is a separate discussion from whether to package the model. Or did you have something else in mind?
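
To make the distinction concrete, here is a small sketch of how one could choose between the two pipelines. It assumes each model lives in a sub-directory named after the model directly under `ner/` and `ner_er/`; that layout, the helper names and the model name `model1` are assumptions for illustration.

```python
from pathlib import Path

# Root directory mentioned above; the per-model layout below it is an assumption.
MODELS_ROOT = Path("data_and_models/models")


def pipeline_dir(model_name: str, with_entity_ruler: bool = False) -> Path:
    """Return the directory of the pipeline with or without the EntityRuler."""
    subdir = "ner_er" if with_entity_ruler else "ner"
    return MODELS_ROOT / subdir / model_name


def load_pipeline(model_name: str, with_entity_ruler: bool = False):
    """Load the selected pipeline from disk with spacy.load()."""
    # Imported lazily so that pipeline_dir() is usable without spaCy installed.
    import spacy

    return spacy.load(pipeline_dir(model_name, with_entity_ruler))
```

With this, `load_pipeline("model1", with_entity_ruler=True)` would read from the `ner_er/` directory, independently of whether the model is ever packaged.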

Unless at some point we should have registered functions

That's indeed a case where packaging models would be handy.

is there really a strong benefit from having a model that is pip installable?

I can think of 4 benefits:

  1. distribute custom architectures,
  2. distribute custom functions,
  3. distribute custom components,
  4. have to handle only 1 file (i.e. the packaged model) instead of all the directories and files for a model.

Stannislav commented 3 years ago

Just a note: custom architectures can be distributed as python packages that plug into spacy via entrypoints.

Documentation: https://spacy.io/usage/saving-loading#entry-points-components

Example: spacy-transformers, see these lines
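
As a rough sketch of what this looks like in practice, assuming the spaCy v3 entry point groups `spacy_factories` and `spacy_architectures`: a separate Python package declares its custom code in its packaging metadata, and spaCy picks it up once the package is installed. The package name `bbs_custom`, its modules and the registered names below are hypothetical.

```python
from importlib import metadata
from typing import Dict, List

# Would be passed as `entry_points=ENTRY_POINTS` to setuptools.setup() in the
# setup.py of a hypothetical `bbs_custom` package. Each entry has the form
# "registered name = module:function".
ENTRY_POINTS: Dict[str, List[str]] = {
    "spacy_factories": [
        "custom_ruler = bbs_custom.components:make_custom_ruler",
    ],
    "spacy_architectures": [
        "bbs_custom.CustomTagger.v1 = bbs_custom.architectures:build_tagger",
    ],
}


def installed_entry_points(group: str) -> List[str]:
    """List the names of installed entry points in the given group."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry points object.
        selected = eps.select(group=group)
    else:
        # Python 3.8 / 3.9: plain mapping from group name to entry points.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)
```

Running `installed_entry_points("spacy_factories")` in an environment with spacy-transformers installed is one way to check which factories are exposed this way.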