TakeLab / spacy-udpipe

spaCy + UDPipe
MIT License

get_path - allow to load from disk instead #1

Closed: jwijffels closed this issue 4 years ago

jwijffels commented 4 years ago

Would it be possible to allow the language argument to be a path to a model file on disk? Currently, https://github.com/TakeLab/spacy-udpipe/blob/master/spacy_udpipe/language.py#L195 only accepts a language listed in that JSON file. I have some home-brewed UDPipe models (some of them are here: https://github.com/bnosac/udpipe.models.ud), which I use alongside the R wrapper of UDPipe (https://github.com/bnosac/udpipe), and it would be nice to be able to load them directly from their location on disk.

asajatovic commented 4 years ago

A quick-and-dirty way of doing it would be to first load the supported model for the same language (e.g. for English):

nlp = spacy_udpipe.load('en')

and then (re)load the custom underlying UDPipe model from the disk path:

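from ufal.udpipe import Model

# `path` is the location of the custom .udpipe file on disk
my_model = Model.load(path)
if my_model:  # Model.load returns None if loading fails
    nlp.udpipe.model = my_model

jwijffels commented 4 years ago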

That's a bit silly :). It means loading a .udpipe file twice. Can't we just do the following?

    def __init__(self, lang, path=None):
        """Load UDPipe model for given language.
        lang (unicode): ISO 639-1 language code or shorthand UDPipe model name.
        path (unicode): Optional path to a UDPipe model file on disk.
        RETURNS (UDPipeModel): Language specific UDPipeModel.
        """
        # Fall back to the bundled model only when no custom path is given
        if path is None:
            path = get_path(lang)

        self.model = Model.load(path)

asajatovic commented 4 years ago

I added that option in pull request #2, along with a convenience function, load_from_path. It is also available in a new release on PyPI.
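
For readers following along, the fallback rule discussed in this thread (use a custom path when one is given, otherwise look up the default model for the language) can be sketched in plain Python. This is only an illustration under stated assumptions: SUPPORTED_MODELS, resolve_model_path and models_dir are hypothetical names, and the file names are made up. It is not the code from the pull request and does not show the actual signature of load_from_path.

```python
import os

# Hypothetical stand-in for the package's JSON list of supported models.
# The file names are illustrative, not the real entries.
SUPPORTED_MODELS = {
    "en": "english-model.udpipe",
    "hr": "croatian-model.udpipe",
}


def resolve_model_path(lang, path=None, models_dir="models"):
    """Return the model file to load.

    A custom path, when given, always wins; otherwise the default
    model registered for `lang` is used.
    """
    if path is not None:
        return path
    if lang not in SUPPORTED_MODELS:
        raise ValueError(
            f"No bundled model for language '{lang}'; pass a path instead."
        )
    return os.path.join(models_dir, SUPPORTED_MODELS[lang])
```

With this rule, resolve_model_path("en", path="./custom.udpipe") returns the custom path untouched, so the model file only needs to be loaded once, which was the point of the request.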

jwijffels commented 4 years ago

Great! Thank you.