eellak / gsoc2018-spacy

[GSOC] Greek language support for spacy.io python NLP software
http://nlpbuddy.io/gsoc
MIT License

Sentence splitter not working properly, affecting part-of-speech tagger #7

Open dkatsiros opened 5 years ago

dkatsiros commented 5 years ago

Problem

I tried to run the sentence splitter submodule (sentence_splitter.py), but it does not work on Greek text for me. I tried loading both el_core_news_sm and el_core_news_md, and I also tried entering the text and encoding it as UTF-8. In every case it does not recognize separate sentences but treats the whole text as one. This in turn affects the part-of-speech tagger. Do you have any idea what the problem might be?

Thanks in advance.

Environment

spaCy version: 2.1.4
Location: /home/dimitris/.local/lib/python3.6/site-packages/spacy
Platform: Linux-4.18.0-17-generic-x86_64-with-Ubuntu-18.04-bionic
Python version: 3.6.7
Models: el, en
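
A minimal way to reproduce the symptom described under Problem might look like the sketch below. It is not the repository's sentence_splitter.py; the helper name `split_sentences` and the sample Greek text are assumptions made for illustration. It only relies on a spaCy pipeline being callable on a string and exposing `doc.sents`.

```python
def split_sentences(nlp, text):
    """Return the sentences that a spaCy-style pipeline finds in `text`.

    `nlp` is any callable returning an object with a `.sents` iterable,
    such as a pipeline loaded with spacy.load().
    """
    doc = nlp(text)
    return [sent.text.strip() for sent in doc.sents]


def main():
    # Assumption: spaCy and the Greek model el_core_news_md are installed.
    try:
        import spacy
        nlp = spacy.load("el_core_news_md")
    except (ImportError, OSError) as exc:
        print("spaCy or the Greek model is not available:", exc)
        return

    # Sentence boundaries are set by a pipeline component, so list them.
    print("pipeline:", nlp.pipe_names)

    # Hypothetical sample text with two sentences.
    text = "Η γλώσσα είναι ωραία. Μου αρέσει πολύ."
    for i, sentence in enumerate(split_sentences(nlp, text), 1):
        print(i, sentence)


if __name__ == "__main__":
    main()
```

If the loop prints a single numbered line for a text that clearly contains two sentences, the problem is reproduced independently of the submodule.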

giannisdaras commented 5 years ago

Hello, thanks for reporting this! Could you please tell me where you downloaded the models from? Are you using spacy-nightly?

dkatsiros commented 5 years ago

No, I downloaded the models from https://spacy.io/models/el. Should I use spacy-nightly?

giannisdaras commented 5 years ago

Could you try uninstalling spacy, installing spacy-nightly, downloading the models through nightly, and then checking again? Sorry for the trouble; I need to check whether it is a version problem.

dkatsiros commented 5 years ago

I tried, but I faced the same problem. To install the models through spacy-nightly I used: python3 -m spacy install el_core_news_md. Is that correct? Any other suggestions about what I may have done wrong?
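
For reference, spaCy's documented CLI subcommand for fetching a model is `download` (for example `python3 -m spacy download el_core_news_md`), and models are installed as ordinary Python packages. One way to check whether the install step actually succeeded is sketched below. The helper name `model_installed` is an assumption made for illustration, and the check uses only the standard library.

```python
import importlib.util


def model_installed(name):
    """Return True if a top-level Python package called `name` is importable."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The two Greek model packages mentioned in this thread.
    for model in ("el_core_news_sm", "el_core_news_md"):
        status = "installed" if model_installed(model) else "missing"
        print(model, status)
```

A model reported as missing here was not installed into the interpreter that ran the check, which would rule out the sentence splitter itself as the cause.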

PanosAntoniadis commented 5 years ago

I am facing the same problem after trying both.