rahonalab closed this issue 4 years ago
@rahonalab did this start happening only in the latest spacy-udpipe
version? I am asking because of the related issue #14.
Honestly I don't know, because I started analysing Portuguese with the latest spacy-udpipe version...
>>> import spacy_udpipe
>>> nlp=spacy_udpipe.load("pt-bosque")
>>> doc=nlp("no ar.")
>>> print(doc)
em o ar
It seems "no" (= "em o") cannot be handled correctly, and the period has gone away. Umm... Well @asajatovic, how do you handle multiword tokens?
# text = no ar.
1-2 no _ _ _ _ _ _ _ _
1 em em ADP _ _ 3 case _ _
2 o o DET _ _ 3 det _ _
3 ar ar NOUN _ _ 0 root _ SpaceAfter=No
4 . . PUNCT _ _ 3 punct _ SpaceAfter=No
But I'm not sure whether PR #17 works well for other languages...
@KoichiYasuoka multiword tokens are treated as single word tokens. Far from an ideal solution, but no issues were raised until now.
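For readers unfamiliar with the CoNLL-U layout, here is a minimal sketch of what "treating a multiword token as a single word token" can look like. It is plain Python, not the actual spacy-udpipe code: the function name `collapse_multiword_tokens` is hypothetical, and it assumes whitespace-separated columns as in the output pasted above (real CoNLL-U uses tabs).

```python
def collapse_multiword_tokens(conllu_lines):
    """Return surface forms, keeping each range line (e.g. "1-2") as one token.

    Hypothetical helper for illustration only; not part of spacy-udpipe.
    """
    tokens = []
    covered_until = 0  # highest word ID covered by the last range line
    for line in conllu_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments such as "# text = ..."
        cols = line.split()
        token_id, form = cols[0], cols[1]
        if "-" in token_id:
            # Range line: a multiword token spanning several syntactic words.
            _, end = token_id.split("-")
            covered_until = int(end)
            tokens.append(form)
        elif "." in token_id:
            continue  # empty nodes (IDs like "8.1") carry no surface form
        elif int(token_id) > covered_until:
            # Ordinary word that is not part of a multiword token.
            tokens.append(form)
    return tokens


sample = [
    "# text = no ar.",
    "1-2 no _ _ _ _ _ _ _ _",
    "1 em em ADP _ _ 3 case _ _",
    "2 o o DET _ _ 3 det _ _",
    "3 ar ar NOUN _ _ 0 root _ SpaceAfter=No",
    "4 . . PUNCT _ _ 3 punct _ SpaceAfter=No",
]
print(collapse_multiword_tokens(sample))  # ['no', 'ar', '.']
```

Note that collapsing tokens this way changes the token count, so any head indices taken from the word lines would have to be remapped as well; the sketch does not attempt that.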
Fixed in #17
Hi, I don't know whether this belongs here or whether I should contact the pt-bosque model developers, but I get the following error when using the pt-bosque model, i.e.
spacy_udpipe.load("pt-bosque")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/spacy/language.py", line 431, in __call__
    doc = self.make_doc(text)
  File "/usr/local/lib/python3.7/site-packages/spacy/language.py", line 457, in make_doc
    return self.tokenizer(text)
  File "/usr/local/lib/python3.7/site-packages/spacy_udpipe/language.py", line 232, in __call__
    raise e
  File "/usr/local/lib/python3.7/site-packages/spacy_udpipe/language.py", line 220, in __call__
    spaces=spaces).from_array(attrs, array)
  File "doc.pyx", line 814, in spacy.tokens.doc.Doc.from_array
ValueError: [E190] Token head out of range in Doc.from_array() for token index '14' with value '27' (equivalent to relative head index: '27'). The head indices should be relative to the current token index rather than absolute indices in the array.
while analyzing the text:
text = "– Não sei. Harry olhou desesperado para os lados. Black e Lupin, os dois tinham se ido... não havia mais nenhum adulto em sua companhia exceto Snape, que ainda flutuava, inconsciente, no ar."
The error is not raised when using the 'default' Portuguese model, pt-gsd, which is loaded as 'pt', or spaCy with its own Portuguese model.
Thank you in advance!