Closed: gorqkop closed this issue 5 years ago
Do you have any more specifics about what text it fails on, what model is being used, and what pipeline components are in it?
Because I can't provide any logs, I'm trying to prepare code to share that reproduces the problem.
After some tests I realized that one way to see this problem is to work with two instances of spaCy, for example with a custom lemmatizer that holds its own spaCy instance:
class CustomLemmatizer(object):
    def __init__(self, nlp):
        self.nlp = nlp

    def lemmatize(self, token):
        """
        Apply the custom lemmatization rules.
        :param token: spacy.tokens.Token
        :return: str
        """
        if ' ' in token.text:
            return token.lemma_.lower()
        if token.is_lower:
            return token.lemma_
        right_context = get_nbor(token, 'token', 1)
        right_context = right_context.text if right_context else ''
        # Re-parse the lowercased token together with its right neighbour
        # using the second spaCy instance and take the lemma of the result.
        new_token = self.nlp(' '.join([token.lower_, right_context]))[0]
        return new_token.lemma_

    def __call__(self, doc):
        """
        Set the custom lemma on every token when used as a pipeline component.
        :param doc: spacy.tokens.Doc
        :return: spacy.tokens.Doc
        """
        for token in doc:
            token._.clemma = self.lemmatize(token)
        return doc
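The get_nbor helper used above is not shown in the thread. A minimal sketch of what it might look like, assuming it simply wraps Token.nbor() and returns None at a document boundary (the second argument is kept only to match the call above):

def get_nbor(token, unit, offset):
    """Hypothetical helper: return the neighbouring token at `offset`,
    or None if that position falls outside the document."""
    try:
        return token.nbor(offset)
    except IndexError:
        return None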
Then use it via a custom attribute:
"""
Initialising and manipulating with SpaCy language model
"""
__NM = None
@classmethod
def load(cls):
"""Use it to load nlp model from memory."""
if not cls.__NM:
cls.__NM = cls()
return cls.__NM
def __init__(self):
self.parser = spacy.load('en_core_web_lg')
tmp_parser = spacy.load('en_core_web_lg')
clemmatizer = CustomLemmatizer(tmp_parser)
if not Token.get_extension('clemma'):
Token.set_extension('clemma', getter=lambda token: clemmatizer.lemmatize(token))
self.parser.max_length = 2000000
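A minimal sketch of how this class might be driven (the actual calling code is not shown in the thread; the text and variable names below are only illustrative):

nlp_model = NlpModel.load()
doc = nlp_model.parser('Some Example Text to lemmatize.')
# Reading the custom attribute calls CustomLemmatizer.lemmatize, which in
# turn runs the second spaCy instance (tmp_parser) on a short string.
lemmas = [token._.clemma for token in doc]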
So when using NlpModel.parser, spaCy can occasionally raise errors like:
corrupted double-linked list
python3.6: malloc.c:4023: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed.
free(): invalid next size (fast)
Sorry for the bad code quoting; GitHub only renders part of the code nicely in quotes.
Fixed since spaCy 2.1.8
Happy to hear it, thanks for following up!
spaCy 2.1.1 - 2.1.6, Python 3.6, Ubuntu 18.04.1 LTS
Since spaCy 2.1, memory errors from C/C++ such as "free(): invalid next size (fast)" occur randomly, and no additional information is provided in the logs. Has anybody met the same problem?