Inconsistent NER predictions from identical inputs while using ThreadPoolExecutor

explosion / spaCy

When I run data through the en_core_web_trf model concurrently, I get different results between runs. I cannot find this behaviour explained anywhere in the documentation or in other GitHub issues.

The code below reproduces the behaviour. If I don't run data through the pipeline concurrently (e.g. by setting max_workers=1), the results are always consistent.

import spacy
from concurrent.futures import ThreadPoolExecutor

nlp = spacy.load("en_core_web_trf")

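def extract_entities(sentences):
    # Submit every sentence to the shared pipeline from up to 4 threads at once.
    with ThreadPoolExecutor(max_workers=4) as e:
        submitted = [e.submit(call_spacy, sent) for sent in sentences]
        # Collect results in submission order.
        resolved = [item.result() for item in submitted]

        return resolved

def call_spacy(sent):
    # All threads share the single module-level nlp object.
    result = nlp(sent)
    return result.ents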

input = [
    "CoCo Town also known as the Collective Commerce District or more simply as the Coco District was a dilapidated industrial area of the planet Coruscant.",
    "It was also the site of Dexs Diner a local eatery owned by Dexter Jettster during the Republic Era.",
    "Hard working laborers visited CoCo Town to congregate at the diner.",
    "During the Galactic Civil War the Galactic Empire and the New Republic fought for control of the region.",
    "Many orphans from the area formed the Anklebiter Brigade and fought alongside the rebels sabotaging the Empire wherever possible."
]

# Run the same batch 10 times; the total entity count should be identical on every run.
for i in range(10):
    result = extract_entities(input)
    print(sum(len(x) for x in result))
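The script above only prints a total entity count per run, so it does not show which sentence changes. A minimal sketch of a per-input check follows. The helper name `find_unstable_inputs` and the stand-in function `fake_extract` are hypothetical and are not part of spaCy; the helper accepts any callable that returns a hashable value.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def find_unstable_inputs(fn, inputs, runs=10, max_workers=4):
    """Call fn on every input concurrently, `runs` times over.

    Returns {input_index: Counter(output -> times seen)} for each input
    that produced more than one distinct output. fn must return a
    hashable value (for example a tuple).
    """
    seen = [Counter() for _ in inputs]
    for _ in range(runs):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map yields results in the same order as the inputs.
            for idx, output in enumerate(pool.map(fn, inputs)):
                seen[idx][output] += 1
    return {idx: counts for idx, counts in enumerate(seen) if len(counts) > 1}


def fake_extract(sentence):
    # Hypothetical, deterministic stand-in for call_spacy:
    # returns the capitalised words of the sentence as a tuple.
    return tuple(word for word in sentence.split() if word[:1].isupper())


if __name__ == "__main__":
    sample = ["CoCo Town is on Coruscant.", "no capitalised words here"]
    # A deterministic function should report no unstable inputs.
    print(find_unstable_inputs(fake_extract, sample))
```

To use this with the real pipeline, one would presumably pass a function that returns something like `tuple((ent.text, ent.label_) for ent in nlp(sent).ents)` instead of `fake_extract`, since the helper needs hashable, comparable outputs.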

Your Environment

Operating System: Amazon Linux 2 Kernel: Linux 4.14.294-220.533.amzn2.x86_64
Python Version Used: python 3.7.10
spaCy Version Used: 3.1.3
Environment Information: en-core-web-trf==3.1.0

import spacy
import torch

torch.set_num_threads(1)
nlp = spacy.load("en_core_web_trf")

input = [
    "CoCo Town also known as the Collective Commerce District or more simply as the Coco District was a dilapidated industrial area of the planet Coruscant.",
    "It was also the site of Dexs Diner a local eatery owned by Dexter Jettster during the Republic Era.",
    "Hard working laborers visited CoCo Town to congregate at the diner.",
    "During the Galactic Civil War the Galactic Empire and the New Republic fought for control of the region.",
    "Many orphans from the area formed the Anklebiter Brigade and fought alongside the rebels sabotaging the Empire wherever possible."
]

for i in range(10):
    print(sum(len(doc.ents) for doc in nlp.pipe(input, n_process=4)))
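If the thread pool has to stay, one option that mirrors the max_workers=1 observation is to serialize calls into the shared object with a lock. The sketch below is only an illustration: `make_serialized`, `fake_model` and `run_all` are hypothetical names, `fake_model` stands in for `nlp(sent)`, and it is an untested assumption that this removes the inconsistency for en_core_web_trf. It also gives up any speed-up from running the model in parallel.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def make_serialized(fn):
    """Wrap fn so that only one thread can be inside it at a time."""
    lock = threading.Lock()

    def wrapper(*args, **kwargs):
        with lock:
            return fn(*args, **kwargs)

    return wrapper


def fake_model(text):
    # Hypothetical stand-in for the shared pipeline call nlp(text).
    return text.upper()


# Every worker thread goes through the same lock before calling the model.
safe_model = make_serialized(fake_model)


def run_all(texts, max_workers=4):
    # Threads still handle submission and result collection,
    # but the model call itself runs one at a time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe_model, texts))


if __name__ == "__main__":
    print(run_all(["coco town", "dexs diner"]))
```

The lock only helps if other work in each task (I/O, post-processing) can overlap; if the model call is the whole task, this behaves like a single worker.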

Inconsistent NER predictions from identical inputs while using ThreadPoolExecutor #11868
