Data4Democracy / internal-displacement

Studying news events and internal displacement.

Spacy model #140

Closed (WanderingStar closed 7 years ago)

WanderingStar commented 7 years ago

Work in Progress

WanderingStar commented 7 years ago

This successfully downloads and installs the model, but when we try to use the model in Jupyter, the kernel crashes:

import spacy
nlp = spacy.load("en_default")

doc = nlp("Husbands ask repeated resolved but laughter debating. She end cordial visitor noisier fat subject general picture.")
for s in doc.sents:
    print(s)

jupyter_1  | WARNING:root:kernel 4b5bc2e9-712a-410a-b684-e2d88cb72e46 restarted

simonb83 commented 7 years ago

I rebuilt this morning with the full model and do not have any issues running the code:

import spacy
nlp = spacy.load("en_default")  # assumed: same load call as in the earlier snippet

doc = nlp("Husbands ask repeated resolved but laughter debating. She end cordial visitor noisier fat subject general picture.")
for s in doc.sents:
    print(s)

Husbands ask repeated resolved but laughter debating.
She end cordial visitor noisier fat subject general picture.

simonb83 commented 7 years ago

Update: I rebuilt after adding another module to requirements.txt, and now I am also getting kernel restarts, although I cannot reproduce them consistently. Sometimes the kernel restarts while running spacy.load("en_default"); other times the model loads, but the kernel restarts while running subsequent cells.
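
Because a kernel restart hides how the process actually died, one way to narrow this down might be to run the same load in a separate Python process and look at its exit status. The following is a minimal sketch using only the standard library; `run_in_child` is a hypothetical helper name, not part of spaCy or Jupyter. On POSIX systems a negative return code from `subprocess.run` means the child was terminated by a signal, and the Linux out-of-memory killer uses SIGKILL (signal 9).

```python
import subprocess
import sys


def run_in_child(snippet, timeout=600):
    """Run a Python snippet in a fresh interpreter and describe how it ended.

    Hypothetical diagnostic helper: it only inspects the exit status,
    it does not know anything about spaCy or Jupyter.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # Negative return code: the child was terminated by that signal number.
        return "killed by signal {}".format(-proc.returncode)
    if proc.returncode != 0:
        # Positive return code: the interpreter exited with an error.
        return "exited with error code {}".format(proc.returncode)
    return "ok"
```

Calling `run_in_child('import spacy; spacy.load("en_default")')` inside the container and getting a "killed by signal 9" result would suggest the process is being killed from outside (for example for using too much memory) rather than failing with a Python exception.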

I also tried importing the model as a module, but the outcome was the same:

import en_core_web_md
nlp = en_core_web_md.load()

jupyter_1  | [I 19:31:23.622 NotebookApp] KernelRestarter: restarting kernel (1/5)

WanderingStar commented 7 years ago

It turned out that the kernel restarts were caused by running out of memory. I increased the memory setting in the Docker preferences to 8 GB and it now works.
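
For anyone hitting the same thing, a quick check of how much memory the container can see before loading the model may help. This is a rough sketch, assuming a Linux container where `/proc/meminfo` reflects the memory given to Docker in its preferences; a per-container cgroup limit would not show up here. The function names `parse_meminfo` and `available_gib` are made up for this example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only "Name:   12345 kB"-style lines with a numeric first field.
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def available_gib(text):
    """Return the MemAvailable figure converted from kB to GiB."""
    info = parse_meminfo(text)
    return info["MemAvailable"] / (1024 ** 2)
```

In a notebook cell this could be used as `available_gib(open("/proc/meminfo").read())`; if the number is well below what the model needs, raising the Docker memory setting as described above is the likely fix.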