explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

C/C++ free(): invalid next size (fast) #3970

Closed gorqkop closed 5 years ago

gorqkop commented 5 years ago

spaCy 2.1.1 - 2.1.6, Python 3.6, Ubuntu 18.04.1 LTS

Since spaCy 2.1, memory errors from C/C++ such as "free(): invalid next size (fast)" occur at random, and the logs provide no additional information. Has anybody run into the same problem?

honnibal commented 5 years ago

Do you have any more specifics about what text it fails on, what model is being used, and what pipeline components are in it?

gorqkop commented 5 years ago

Because I can't provide any logs, I'm trying to prepare code I can share that reproduces the problem.
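For crashes like this, where glibc aborts the process and Python never gets the chance to raise an exception, the standard-library `faulthandler` module can at least dump the Python traceback of each thread when the fatal signal (here most likely SIGABRT) arrives. A minimal sketch, assuming it runs at the top of the script before spaCy is loaded; `enable_crash_tracebacks` is an illustrative name, not part of spaCy:

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=None):
    """Dump Python tracebacks for all threads if the process gets a fatal signal."""
    if stream is None:
        stream = sys.stderr
    # The stream must stay open (and have a file descriptor) for as long as
    # the handler is enabled, because the dump is written from the signal handler.
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


# Call this once, as early as possible, before loading any spaCy models.
enable_crash_tracebacks()
```

The same handler can be switched on without code changes by starting the interpreter with `python -X faulthandler` or by setting `PYTHONFAULTHANDLER=1`. The traceback only shows which Python call was active when the abort happened; heap corruption may have been caused earlier, so it narrows the search rather than pinpointing the bug.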

gorqkop commented 5 years ago

After some tests I realized that one way to trigger this problem is to work with two spaCy instances, for example with a custom lemmatizer built on a second spaCy instance:

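class CustomLemmatizer(object):
    def __init__(self, nlp):
        # nlp: a loaded spaCy pipeline, used to re-parse lowercased tokens
        self.nlp = nlp

    def lemmatize(self, token):
        """
        Apply the custom lemmatization rules to a single token.
        :param token: spacy.tokens.Token
        :return: str
        """
        # Tokens that contain a space: lowercase the stock lemma
        if ' ' in token.text:
            return token.lemma_.lower()
        # Already lowercase: use the stock lemma as-is
        if token.is_lower:
            return token.lemma_
        # get_nbor is a helper from the reporter's code base that is not shown
        # in this thread; it appears to return the token to the right, or None
        right_context = get_nbor(token, 'token', 1)
        right_context = right_context.text if right_context else ''
        # Re-parse the lowercased token together with its right neighbor
        new_token = self.nlp(' '.join([token.lower_, right_context]))[0]
        return new_token.lemma_

    def __call__(self, doc):
        """
        Pipeline-component entry point: set the custom lemma on every token.
        :param doc: spacy.tokens.Doc
        :return: spacy.tokens.Doc
        """
        for token in doc:
            token._.clemma = self.lemmatize(token)
        return doc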

Then use it via a custom attribute:

import spacy
from spacy.tokens import Token


class NlpModel(object):
    """
    Initializes and manages the spaCy language model.
    """
    __NM = None

    @classmethod
    def load(cls):
        """Return the shared instance, loading the nlp model on first use."""
        if not cls.__NM:
            cls.__NM = cls()
        return cls.__NM

    def __init__(self):
        self.parser = spacy.load('en_core_web_lg')
        # Second, separate pipeline instance used only by the custom lemmatizer
        tmp_parser = spacy.load('en_core_web_lg')
        clemmatizer = CustomLemmatizer(tmp_parser)
        # Register the custom attribute only once per process
        if not Token.has_extension('clemma'):
            Token.set_extension('clemma', getter=lambda token: clemmatizer.lemmatize(token))
        self.parser.max_length = 2000000

So when NlpModel.parser is used, spaCy can occasionally crash with errors like:

    corrupted double-linked list
    python3.6: malloc.c:4023: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed.
    free(): invalid next size (fast)

gorqkop commented 5 years ago

Sorry for the poor code formatting, but GitHub only renders part of the code nicely as a code block.

gorqkop commented 5 years ago

Fixed as of spaCy 2.1.8.
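For anyone who wants to guard a deployment against the affected releases, a startup check along these lines might help. This is only a sketch based on what is reported in this thread (crashes on 2.1.1 through 2.1.6, no longer seen on 2.1.8); the names `FIRST_FIXED`, `parse_release` and `has_reported_fix` are made up for illustration and are not part of spaCy:

```python
import re

# First release on which the reporter no longer saw the crash (per this thread).
FIRST_FIXED = (2, 1, 8)


def parse_release(version):
    """Return the leading numeric components of a version string as a 3-tuple.

    Suffixes such as ".dev0" or "a1" are ignored, so pre-releases compare
    equal to the final release with the same numbers.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (version,))
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def has_reported_fix(version):
    """True if the version is at or above the first release reported as fixed."""
    return parse_release(version) >= FIRST_FIXED
```

In an application this could be called as `has_reported_fix(spacy.__version__)` right after `import spacy`, logging a warning or refusing to start when it returns False.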

svlandeg commented 5 years ago

Happy to hear it, thanks for following up!

lock[bot] commented 5 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.