explosion / spacy-streamlit

👑 spaCy building blocks and visualizers for Streamlit apps
https://share.streamlit.io/ines/spacy-streamlit-demo/master/app.py
MIT License
804 stars 115 forks

Be able to load spacy.lang.en.English model (and more) #12

Closed NixBiks closed 3 years ago

NixBiks commented 3 years ago

I have a pipeline that builds on spacy.lang.en.English: I replace the tokenizer and add some custom components. spacy_streamlit uses spacy.load to load models. Is it possible to register my pipeline so that it is loadable via spacy.load?

I am aware that I can call nlp.to_disk on spacy.lang.en.English with my replaced tokenizer and that I can register my components using entry_points, but I'd rather not have to do nlp.to_disk (e.g. I shouldn't keep that in my git repo, and it seems unnecessary!?).

Another alternative is to make spacy.lang.en.English with my replaced tokenizer its own language and add that to entry_points, but it feels kinda wrong, and then I wouldn't be able to get the lexeme normalization table from spacy-lookups-data.
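For reference, the entry_points route described above might look roughly like this. It is only a sketch: spacy_factories and spacy_languages are the entry point groups spaCy documents, while my_package, my_component, create_my_component, custom_en and CustomEnglish are hypothetical names made up for illustration.

```python
# Sketch of the entry point declarations for a setup.py.
# All module, function and class names on the right-hand side are hypothetical.
ENTRY_POINTS = {
    # Makes a custom component factory discoverable by name.
    "spacy_factories": [
        "my_component = my_package.components:create_my_component",
    ],
    # Registers a custom Language subclass under its own language code.
    "spacy_languages": [
        "custom_en = my_package.lang:CustomEnglish",
    ],
}

# In setup.py this dict would be passed along the lines of:
#   setup(name="my_package", entry_points=ENTRY_POINTS, ...)
for group, entries in ENTRY_POINTS.items():
    print(group, entries)
```

The second group is the "own language" alternative, which is the one that feels wrong to me for the reasons given above.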

I hope it makes sense.

NixBiks commented 3 years ago

I just realized that I only have to implement a load function in the root of my package, e.g.

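from typing import Iterable

def load(vocab: bool, disable: Iterable[str], exclude: Iterable[str], config):
    # spacy.load imports an installed package by name and calls its
    # top-level load(), so building the pipeline here avoids nlp.to_disk.
    from spacy.lang.en import English

    return English()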
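Why this works, as far as I can tell: when spacy.load is given the name of an installed package, it imports that package and calls its top-level load() with keyword arguments. Below is a stdlib-only sketch of that dispatch under that assumption. load_from_package, my_pipeline and FakePipeline are hypothetical stand-ins, not spaCy API, and the real loader does more than this.

```python
import importlib
import sys
import types


class FakePipeline:
    """Hypothetical stand-in for the object that English() would return."""

    def __init__(self, disable=()):
        self.disable = list(disable)


def _package_load(**overrides):
    # Plays the role of load() in the package's __init__.py.
    # Accepting **overrides keeps it tolerant of extra keyword arguments.
    return FakePipeline(disable=overrides.get("disable", ()))


# Register a fake importable package called "my_pipeline" (hypothetical name).
fake_package = types.ModuleType("my_pipeline")
fake_package.load = _package_load
sys.modules["my_pipeline"] = fake_package


def load_from_package(name, **overrides):
    # Assumed simplification of the loader: import by name, then call load().
    module = importlib.import_module(name)
    return module.load(**overrides)


nlp = load_from_package(
    "my_pipeline", vocab=True, disable=["ner"], exclude=[], config={}
)
print(type(nlp).__name__, nlp.disable)
```

Taking **overrides instead of a fixed parameter list is a deliberate choice in the sketch: if the loader ever passes an additional keyword, a fixed signature like the one above would raise a TypeError.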