Open balmas opened 3 years ago
@balmas - I spent all day adding these libraries, and here are my results: I was able to add
But I ran into problems with Japanese and Korean
Japanese needs Unidic, Mecab, SudachiPy
I was able to find versions for our environment - Unidic, Mecab
But I couldn't find a version of SudachiPy that works with Cython
And I was not able to install all the requirements:
flake8
flake8-import-order
flake8-builtins
There is a compiled library with SudachiPy and Cython - https://github.com/polm/fugashi. But spaCy requires the sudachipy module (according to the error)
Korean needs mecab-ko, mecab-ko-dic, natto-py
I was able to install natto-py, but mecab-ko and mecab-ko-dic failed to install, each with specific errors
I could continue with it tomorrow - it is really difficult to build the container during my evening/night, as it takes much more time. I hope the load on Docker resources will be lower in my morning
@balmas, how much time do you think it is worth spending on Korean and Japanese support?
@irina060981 let's not worry about those for the moment. Thanks.
Also, Telugu and Sanskrit give a 500 error as well. See attachment
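One way to narrow down which languages fail outside the service is to try building a blank pipeline for each language code and tokenizing a short string. The sketch below is only an illustration: `find_failing_languages` is a hypothetical helper, the codes `te` and `sa` are assumed to be the ones used for Telugu and Sanskrit, and it is not confirmed that the 500 errors come from this step.

```python
from typing import Callable, Dict, Iterable, Optional


def find_failing_languages(
    codes: Iterable[str],
    make_pipeline: Optional[Callable[[str], Callable[[str], object]]] = None,
    sample_text: str = "test",
) -> Dict[str, str]:
    """Return {language code: error message} for every code that fails.

    make_pipeline is a hypothetical hook: it takes a language code and
    returns a callable that tokenizes text. It defaults to spacy.blank.
    """
    if make_pipeline is None:
        import spacy  # imported lazily so the helper can be tested without spaCy

        make_pipeline = spacy.blank

    failures: Dict[str, str] = {}
    for code in codes:
        try:
            nlp = make_pipeline(code)  # may fail on a missing dependency
            nlp(sample_text)           # may fail inside the tokenizer
        except Exception as exc:
            failures[code] = f"{type(exc).__name__}: {exc}"
    return failures
```

For example, `find_failing_languages(["te", "sa", "ja", "ko"])` run inside the container would list the codes that raise, together with the exception text.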
#31 identified a tokenizer error with Chinese due to a missing dependency.
spaCy documentation lists additional dependencies for a number of languages at https://spacy.io/usage/models#languages:
Japanese: Unidic, Mecab, SudachiPy
Russian: pymorphy2
Ukrainian: pymorphy2
Thai: pythainlp
Korean: mecab-ko, mecab-ko-dic, natto-py
Vietnamese: Pyvi
@irina060981 if you can confirm the Chinese fix works (and the Dockerfile fix too), maybe you can add these dependencies too?