Closed: andyweizhao closed this issue 3 years ago.
Hi!
Please upgrade spacy-udpipe to at least v0.2.1 (preferably the latest version).
Also, note that spacy_udpipe.load returns an instance of the UDPipeLanguage class, which follows the spacy.Language API. Calling that instance on a text returns a single Doc object.
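A minimal sketch of that calling convention, assuming `nlp` is a spaCy-style pipeline such as the object returned by `spacy_udpipe.load("fr")`: the call yields one `Doc`, and per-token values are read from that `Doc` rather than unpacked from the call itself. The helper name `tokens_and_lemmas` is hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def tokens_and_lemmas(nlp: Callable[[str], Iterable], text: str) -> Tuple[List[str], List[str]]:
    """Run a spaCy-style pipeline on `text` and collect token texts and lemmas.

    `nlp(text)` returns a single Doc, so it should not be unpacked directly
    (e.g. `tokens, feats = nlp(text)` fails unless the Doc has exactly two tokens).
    """
    doc = nlp(text)  # one Doc object, iterable over its tokens
    tokens = [token.text for token in doc]
    lemmas = [token.lemma_ for token in doc]
    return tokens, lemmas


# Usage (requires spacy-udpipe and a downloaded French model):
# import spacy_udpipe
# spacy_udpipe.download("fr")
# nlp = spacy_udpipe.load("fr")
# tokens, lemmas = tokens_and_lemmas(nlp, "ou visitez la page mondiale d'accueil Wide Web du GAO à")
```

Passing the pipeline in as a parameter keeps the helper independent of how the model was loaded.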
Hi
I sometimes have the same issue. I have spacy 2.2.4 and spacy-udpipe 0.3.0.
Could it be related to the size of the data?
Cheers, Dimitar
Hi Dimitar,
It would be very helpful if you could provide a code snippet and the input text that cause this issue.
Cheers, Antonio
Hi Antonio,
Here is a snippet:
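import pickle
...
# Parse all sentences; n_process=-1 uses all available CPU cores.
docs = list(nlpD.pipe(sentences, n_process=-1))
# Cache the parsed Docs to disk.
with open(system_name + ".spacy_udpipe.model", "wb") as SpUpM:
    pickle.dump(docs, SpUpM)
print("Model built from scratch")
# Flatten the list of Docs into one list of tokens.
nlps = []
for doc in docs:
    nlps.extend(doc)
lemmas = {}
for token in nlps:
    lemma = token.lemma_
    ...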
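The flattening step in the snippet can also be written without building an intermediate token list. A minimal sketch, assuming (hypothetically, since that part of the code is elided) that the lemmas are being counted; `lemma_counts` is an illustrative name, and `docs` is any iterable of spaCy-style `Doc` objects, each of which iterates over its tokens.

```python
from collections import Counter
from itertools import chain
from typing import Iterable


def lemma_counts(docs: Iterable[Iterable]) -> Counter:
    """Count how often each lemma occurs across all tokens of all docs.

    chain.from_iterable yields the tokens of each Doc in turn, so no
    intermediate flat list of tokens is built.
    """
    return Counter(token.lemma_ for token in chain.from_iterable(docs))
```

With 2M sentences, streaming the tokens this way avoids holding a second full copy of every token reference in memory.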
Unfortunately, I cannot share the text, as it contains around 2M sentences and is more than 250MB (UTF-8 encoded).
Thanks a lot for looking into this. Kind regards, Dimitar
Hi,
Sometimes it's annoying when bugs turn out not to be bugs... but that happens when you combine multiple tools, work with large data sets, and use multiprocessing...
I ran spacy_udpipe without multiprocessing, printing every sentence with its index, and found the problem: an empty line. It causes spaCy to crash.
I should have thought of this earlier, but as I said, too many things were going on and I overlooked it.
Hope this helps. Maybe there should be a try/except somewhere.
I used sentences = [s for s in ifh.readlines() if s.strip()]
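The debugging approach described above (no multiprocessing, one sentence at a time, index visible) combined with the suggested try/except can be sketched as follows. This is a minimal sketch, not part of spacy-udpipe: `find_failing_sentences` is a hypothetical helper, and `nlp` is assumed to be any callable such as the pipeline returned by `spacy_udpipe.load`. A try/except only catches Python exceptions; if native code kills the process outright, the `verbose` output printed just before each call is what identifies the offending sentence.

```python
from typing import Callable, List, Sequence, Tuple


def find_failing_sentences(
    nlp: Callable[[str], object],
    sentences: Sequence[str],
    verbose: bool = False,
) -> List[Tuple[int, str, Exception]]:
    """Run `nlp` on each sentence in a single process and collect failures.

    Returns (index, sentence, exception) for every sentence on which `nlp`
    raised a Python exception.
    """
    failures = []
    for index, sentence in enumerate(sentences):
        if verbose:
            # Flush so the last printed line still identifies the culprit
            # even if the process dies without raising a Python exception.
            print(f"{index}\t{sentence!r}", flush=True)
        try:
            nlp(sentence)
        except Exception as exc:  # broad on purpose: we only want to locate bad inputs
            failures.append((index, sentence, exc))
    return failures
```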
Now it works just fine with multiprocessing too :+1:
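A minimal sketch of loading the input with blank lines removed, assuming one sentence per line in a UTF-8 file; `read_sentences` is a hypothetical helper. Lines read from a file keep their trailing newline, so an empty line arrives as a string containing only a newline, which is truthy; testing `line.strip()` is what actually skips it. Note that this also strips leading and trailing whitespace from the sentences that are kept.

```python
from typing import List


def read_sentences(path: str, encoding: str = "utf-8") -> List[str]:
    """Read one sentence per line, dropping blank or whitespace-only lines.

    Each kept line is returned without its trailing newline and surrounding
    whitespace.
    """
    with open(path, encoding=encoding) as ifh:
        return [line.strip() for line in ifh if line.strip()]
```

The returned list can be passed to `nlpD.pipe(sentences, n_process=-1)` as in the snippet earlier in the thread.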
@andyweizhao can you check if that is the case for you too?
Cheers, Dimitar
Hi, I encountered an error when running the following code:
import spacy_udpipe
s = "ou visitez la page mondiale d'accueil Wide Web du GAO à"
udpipe = spacy_udpipe.load('fr')
tokens, feats = udpipe(s)
My environment is: spacy (2.2.4), spacy-udpipe (0.1.0)
Thanks!