explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV) #1986

Closed · damianoporta closed 6 years ago

damianoporta commented 6 years ago

Hello, I am training on 5,000 documents (500-700 tokens each) using minibatch:

import random

from spacy.util import minibatch

# nlp, TRAIN_DATA and n_iter are defined earlier; the pipeline already
# contains an "ner" component with its labels added.
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
with nlp.disable_pipes(*other_pipes):  # only train NER
    optimizer = nlp.begin_training()
    for itn in range(n_iter):
        random.shuffle(TRAIN_DATA)
        losses = {}

        for batch in minibatch(TRAIN_DATA, size=32):
            docs, golds = zip(*batch)
            nlp.update(docs, golds, drop=0.5, sgd=optimizer, losses=losses)
        print(losses)
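
For context, TRAIN_DATA in this simple spaCy 2.x training style is a list of (text, annotations) pairs with character offsets for each entity. A minimal sketch with illustrative Italian text and labels (the real data is not shown in the post):

    import spacy

    # Illustrative example only; texts, offsets and labels are assumptions.
    TRAIN_DATA = [
        ("Damiano vive a Roma.", {"entities": [(0, 7, "PER"), (15, 19, "LOC")]}),
    ]

    nlp = spacy.blank("it")
    ner = nlp.create_pipe("ner")
    nlp.add_pipe(ner)
    for _, annotations in TRAIN_DATA:
        for start, end, label in annotations["entities"]:
            ner.add_label(label)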

After a few epochs I get this error:

Created blank 'it' model
{'ner': 283.00376798654906}
{'ner': 100.42526869625726}
{'ner': 75.52572853059974}
{'ner': 61.201405786312534}
{'ner': 53.64163815265056}
{'ner': 47.689798961899214}
{'ner': 42.32929941194743}
{'ner': 39.0938703037973}
{'ner': 36.360852088015235}
{'ner': 34.037414559089484}
{'ner': 30.103532376321255}
{'ner': 29.008618737076176}
{'ner': 27.112722278790898}
{'ner': 25.40972332964293}
{'ner': 24.01783800058513}
{'ner': 22.882482178725695}
{'ner': 21.548512058101835}
{'ner': 21.084928312423585}
{'ner': 20.061247612083292}
{'ner': 19.41473015532688}

**Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)**

Your Environment

Models             en, it, en_core_web_md, en_core_web_sm
Python version     3.5.2          
Location           /home/damiano/lavoro/python/parser/.env/lib/python3.5/site-packages/spacy
Platform           Linux-4.4.0-112-generic-x86_64-with-Ubuntu-16.04-xenial
spaCy version      2.0.6.dev0 
fiskio commented 6 years ago

Same here. Have you found the cause?

damianoporta commented 6 years ago

Hi @fiskio, not yet. It only works if I decrease the number of sentences. Fortunately the model supports online training, so I train on 4k sentences at a time.
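
A minimal sketch of that chunked workaround, reusing the nlp, optimizer, TRAIN_DATA and n_iter from the training loop above; the chunk size is an assumption based on the "4k sentences" mentioned here:

    import random

    from spacy.util import minibatch

    CHUNK_SIZE = 4000  # assumption: the "4k sentences" mentioned above

    # Train each chunk for n_iter epochs, then continue with the next chunk
    # using the same optimizer, so the model is updated incrementally.
    for start in range(0, len(TRAIN_DATA), CHUNK_SIZE):
        chunk = list(TRAIN_DATA[start:start + CHUNK_SIZE])
        for itn in range(n_iter):
            random.shuffle(chunk)
            losses = {}
            for batch in minibatch(chunk, size=32):
                docs, golds = zip(*batch)
                nlp.update(docs, golds, drop=0.5, sgd=optimizer, losses=losses)
            print(losses)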

wpm commented 6 years ago

This is happening intermittently for me too. I have 1,000 samples, with a mean of 155 tokens, a max of 1,011, and a min of 11.
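
For reference, token-length statistics like these can be computed with a blank pipeline, which is enough for tokenization. A minimal sketch, assuming TRAIN_DATA holds (text, annotations) pairs and an English tokenizer:

    import statistics

    import spacy

    # The language code and the shape of TRAIN_DATA are assumptions.
    nlp = spacy.blank("en")
    lengths = [len(nlp(text)) for text, _ in TRAIN_DATA]
    print("mean:", statistics.mean(lengths))
    print("max:", max(lengths))
    print("min:", min(lengths))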

honnibal commented 6 years ago

This should be fixed now: https://github.com/explosion/spaCy/commit/ad068f51be6a2579c19c35bcd8b5c1767441ffc1

Thanks for your patience with this long-standing bug! It was very difficult to track down, as the out-of-bounds read didn't seem to cause errors on Linux, only on Windows and OSX.

lock[bot] commented 5 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.