explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

nlp.pipe() with multiple processes on Windows VSCode #13345

Open IliasAarab opened 8 months ago

IliasAarab commented 8 months ago

Running nlp.pipe() with n_process > 1 hangs indefinitely on my Windows machine when the code is executed inside a VSCode interactive session. The same code works fine from a VSCode Jupyter notebook, a traditional Jupyter notebook, a standard Python interpreter, and Google Colab.

How to reproduce the behaviour

import spacy

nlp = spacy.load("en_core_web_sm")
texts = ["one document to process"]
# n_process=-1 uses as many worker processes as there are CPU cores
results = list(nlp.pipe(texts, n_process=-1))

Your Environment
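Since the behaviour differs between sessions on the same machine, it may help to run a small helper like the one below in each session and compare the output. This is only a sketch using the standard library: the function name `describe_session` and the choice of fields are assumptions made here, not something spaCy provides, and it is not known which of these values (if any) actually differ in the VSCode interactive session.

```python
import multiprocessing as mp
import platform
import sys


def describe_session():
    """Collect a few facts about the current Python session (hypothetical helper)."""
    main_module = sys.modules.get("__main__")
    return {
        "platform": platform.platform(),
        "python": platform.python_version(),
        "executable": sys.executable,
        # How multiprocessing creates child processes: "spawn" is the default on Windows.
        "start_method": mp.get_start_method(),
        # Whether __main__ is backed by a file on disk; it is reported for comparison only.
        "main_has_file": hasattr(main_module, "__file__"),
        "argv": list(sys.argv),
    }


if __name__ == "__main__":
    for key, value in describe_session().items():
        print(f"{key}: {value}")
```

In an interactive session, calling `describe_session()` directly in a cell gives the same dictionary.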

svlandeg commented 8 months ago

Hi! Thanks for the report.

That's weird, I wonder what the difference could be in the VSCode interactive session 🤔

IliasAarab commented 8 months ago

@svlandeg I don't know how nlp.pipe() works under the hood, but I tried running some basic code concurrently with ThreadPoolExecutor, and that works fine within the interactive session. Let me know if I can provide more information. It would be great if someone could confirm the same issue on their own machine.
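One caveat on the ThreadPoolExecutor test: it runs threads inside the current process, whereas n_process > 1 starts separate worker processes, so it may not exercise the part that hangs. A closer check is whether a child process can start and exit at all in the interactive session. The following is a minimal stdlib-only sketch. The helper name `child_process_finishes`, its `start_method` parameter, and the 30-second timeout are assumptions chosen for illustration and are not part of spaCy.

```python
import multiprocessing as mp
import time
from typing import Optional


def child_process_finishes(timeout: float = 30.0,
                           start_method: Optional[str] = None) -> bool:
    """Return True if a trivial child process starts and exits cleanly in time.

    start_method=None uses the platform default ("spawn" on Windows).
    """
    ctx = mp.get_context(start_method)
    # time.sleep lives in an importable module, so the child can locate the
    # target even when the parent is an interactive session.
    proc = ctx.Process(target=time.sleep, args=(0.1,))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        # The child did not finish in time: stop it instead of hanging forever.
        proc.terminate()
        proc.join()
        return False
    return proc.exitcode == 0


if __name__ == "__main__":
    print("child process finished:", child_process_finishes())
```

If this returns False in the VSCode interactive session but True in the other environments, that would suggest the hang is related to process startup in that session rather than to spaCy's pipeline code, though it would not prove it.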