explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Is there an equivalent of nlp.pipe supporting multiple languages/models? #5084

Closed thomasthiebaud closed 4 years ago

thomasthiebaud commented 4 years ago

Is it possible to support multiple languages with nlp.pipe?

For example, if I receive a list of texts and want to process each element with a different model, I have to do something like this:

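import spacy

texts = [...]
# Split the texts by language (the filter conditions are elided here)
en_texts = [text for text in texts if ...]
fr_texts = [text for text in texts if ...]

# One pipeline per language
en_nlp = spacy.blank('en')
en_nlp.add_pipe(...)
fr_nlp = spacy.blank('fr')

for doc in en_nlp.pipe(en_texts):
    ...  # Do something

for doc in fr_nlp.pipe(fr_texts):
    ...  # Do the same thing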

Is there a more idiomatic way to write that? Something like spacy.pipes? Or is this the way to go? Could it be supported by spacy?

svlandeg commented 4 years ago

In general, an nlp object is a processing pipeline specific to one language. Its components (syntactic parsing or named entity recognition, for instance) are also language-dependent. So I don't see a straightforward way to accomplish what you want. nlp.pipe can't support multiple languages at the same time, because an nlp object inherently supports only one language.

I think preprocessing and filtering the texts, as in your example snippet, would be the best way to go.
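
The filter-then-pipe approach can be wrapped in a small helper. The following is only a sketch, not a spaCy API: the name pipe_by_language is hypothetical, and it assumes that each text already comes with a language code (how the language is detected is outside the scope of this thread) and that every pipeline object exposes a pipe method, as a spaCy nlp object does.

```python
from collections import defaultdict


def pipe_by_language(tagged_texts, pipelines):
    """Process (lang, text) pairs with one pipeline per language.

    Hypothetical helper, not part of spaCy. `pipelines` maps a language
    code to any object with a .pipe(texts) method. Results are returned
    in the same order as the input.
    """
    tagged_texts = list(tagged_texts)

    # Group texts by language, remembering each text's original position.
    groups = defaultdict(list)
    for index, (lang, text) in enumerate(tagged_texts):
        groups[lang].append((index, text))

    results = [None] * len(tagged_texts)
    for lang, indexed_texts in groups.items():
        nlp = pipelines[lang]  # KeyError if no pipeline exists for this language
        indices = [index for index, _ in indexed_texts]
        texts = [text for _, text in indexed_texts]
        # One batched .pipe() call per language, then restore input order.
        for index, doc in zip(indices, nlp.pipe(texts)):
            results[index] = doc
    return results
```

With spaCy, pipelines could be built as {"en": spacy.blank("en"), "fr": spacy.blank("fr")}, so each language still gets its own nlp object and its own batched nlp.pipe call; the helper only hides the grouping and puts the docs back in input order.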
