Abhijit-2592 / spacy-langdetect

A fully customisable language detection pipeline for spaCy
MIT License

Spacy V3 decorator string name #6

Open rennanvoa2 opened 3 years ago

rennanvoa2 commented 3 years ago

Hello guys, with the v3 update, when I run the example code it complains:

ValueError: [E966] `nlp.add_pipe` now takes the string name of the registered component factory, not a callable component. Expected string, but got <spacy_cld.spacy_cld.LanguageDetector object at 0x7fb8d9051ed0> (name: 'None').

- If you created your component with `nlp.create_pipe('name')`: remove nlp.create_pipe and call `nlp.add_pipe('name')` instead.

- If you passed in a component like `TextCategorizer()`: call `nlp.add_pipe` with the string name instead, e.g. `nlp.add_pipe('textcat')`.

- If you're using a custom component: Add the decorator `@Language.component` (for function components) or `@Language.factory` (for class components / factories) to your custom component and assign it a name, e.g. `@Language.component('your_name')`. You can then run `nlp.add_pipe('your_name')` to add it to the pipeline.

I figured out that now we have to pass the string name, to nlp.add_pipe but how?

I've tried `nlp.add_pipe("langdetect")`, `nlp.add_pipe("LanguageDetector")`, and `nlp.add_pipe("languagedetector")`, and none of them seem to work.

Can you help me with this?

Cusard commented 3 years ago

Hi,

Since I'm new to spaCy and Python, I'm not sure if this is the correct way to implement it, but for Python 3.9 with spaCy 3.0.3 the following worked for me:

import spacy
from spacy.language import Language
from spacy_langdetect import LanguageDetector

# Add LanguageDetector and assign it a string name
@Language.factory("language_detector")
def create_language_detector(nlp, name):
    return LanguageDetector(language_detection_function=None)

# Use a blank pipeline; a full model also works, e.g. nlp = spacy.load("en_core_web_sm")
nlp = spacy.blank("en")

# Add sentencizer for longer text
nlp.add_pipe('sentencizer')

# Add components using their string names
nlp.add_pipe("language_detector")

# Analyze components and their attributes
text = "This is an English text."
doc = nlp(text)

# Document level language detection.
print(doc._.language)

# See what happened to the pipes
nlp.analyze_pipes(pretty=True)

I got on this track with: Language-specific pipeline

Is this the right way to use it with spaCy 3?

How do I use the result for language-specific processing? Do I have to load language-specific models, e.g. nlp_en = spacy.load("en_core_web_sm") and nlp_de = spacy.load("de_core_news_sm")?

Many thanks and best regards,

Cusard
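Regarding the follow-up question: a common pattern is to run the detector first and then hand the text to a model for the detected language. Below is a minimal sketch of that routing logic. The helper name and the stub callables are hypothetical (not part of spacy-langdetect); in practice the dictionary values would be models loaded with spacy.load.

```python
def route_by_language(doc_language, pipelines, default="en"):
    """Pick the pipeline registered for doc_language, falling back to default."""
    return pipelines.get(doc_language, pipelines[default])

# In a real setup the values would be loaded spaCy models, e.g.
#   pipelines = {"en": spacy.load("en_core_web_sm"),
#                "de": spacy.load("de_core_news_sm")}
# Plain callables stand in here so the routing logic runs on its own.
pipelines = {
    "en": lambda text: f"english pipeline saw: {text}",
    "de": lambda text: f"german pipeline saw: {text}",
}

nlp_for_doc = route_by_language("de", pipelines)
print(nlp_for_doc("Guten Tag"))  # german pipeline saw: Guten Tag
```

The detected code from doc._.language["language"] would be passed as doc_language, and unknown languages fall back to the default model.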

renatojmsantos commented 3 years ago

same problem

FelixSiegfriedRiedel commented 3 years ago

Hello everybody! Thanks to @Cusard I got the example code to work with the current spacy version.

import spacy
from spacy.language import Language
from spacy_langdetect import LanguageDetector

@Language.factory("language_detector")
def create_language_detector(nlp, name):
    return LanguageDetector(language_detection_function=None)

nlp = spacy.load("en_core_web_sm")

nlp.add_pipe('language_detector')
text = 'This is an english text.'
doc = nlp(text)
# document level language detection. Think of it like average language of the document!
print(doc._.language)
# sentence level language detection
for sent in doc.sents:
    print(sent, sent._.language)

The output looks like this:

{'language': 'en', 'score': 0.9999983570159962}
This is an english text. {'language': 'en', 'score': 0.9999956329695125}
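A note on the factory above: it passes language_detection_function=None, which makes the component fall back to the package's built-in detector. The parameter suggests you can plug in your own function instead. The sketch below is an assumption based on the project README (the function appears to be called with the Doc or Span and its return value stored on ._.language); the heuristic inside is a toy stand-in for a real detector such as langdetect or pycld2.

```python
# Hypothetical custom detection function for spacy-langdetect: it receives
# the spaCy Doc/Span and returns the value to store on ._.language.
def custom_detection_function(spacy_object):
    # Toy heuristic standing in for a real detector -- purely illustrative.
    text = spacy_object.text if hasattr(spacy_object, "text") else str(spacy_object)
    umlauts = set("äöüß")
    lang = "de" if any(ch in umlauts for ch in text.lower()) else "en"
    return {"language": lang, "score": 1.0}

# It would then replace None in the factory:
# @Language.factory("language_detector")
# def create_language_detector(nlp, name):
#     return LanguageDetector(language_detection_function=custom_detection_function)
```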
luis-possatti commented 3 years ago

Thanks for sharing the solution. It worked for me too.

It would be nice if the project home page had the updated example: https://spacy.io/universe/project/spacy-langdetect

benjlis commented 2 years ago

The example provided by @FelixSiegfriedRiedel works for me with v3.3.

I've also raised an issue about updating the documentation: https://github.com/explosion/spaCy/issues/11038