davgargar opened this issue 5 months ago
The issue might be due to these reasons:
Model Training Data: spaCy's pre-trained models are trained on specific datasets. If certain entities or terms were not sufficiently represented in the training data, the model might not recognize them as entities.
Model Limitations: Statistical NER models make mistakes. Even a large pre-trained pipeline will miss some entities, especially ambiguous ones or ones that appear in contexts unlike its training data.
Language Model: The performance of entity recognition can vary between different language models. For example, the es_core_news_lg and it_core_news_lg models are specifically trained for Spanish and Italian, respectively. If the entities you are trying to extract are domain-specific or less common, these models might not perform well.
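To find out which terms a model actually misses in your own articles, a small check can help. `missing_terms` below is a hypothetical helper (not part of spaCy), written against the spaCy v3 API: it runs the pipeline and reports which expected terms occur in the text but are not inside any entity span. Only the first occurrence of each term is checked. With a real model you would pass `spacy.load("es_core_news_lg")`; the demo uses a blank pipeline with one rule only so that it runs without downloading a model. It can also be worth printing `nlp.get_pipe("ner").labels`, because a model cannot predict a type that is not in its label set.

```python
import spacy
from spacy.language import Language


def missing_terms(nlp: Language, text: str, terms: list[str]) -> list[str]:
    """Hypothetical debugging helper: return terms found in `text`
    that are not covered by any entity the pipeline predicts."""
    doc = nlp(text)
    ent_spans = [(ent.start_char, ent.end_char) for ent in doc.ents]
    missing = []
    for term in terms:
        start = text.find(term)  # first occurrence only
        if start == -1:
            continue  # the term does not appear in this text at all
        end = start + len(term)
        # "Covered" means the term lies entirely inside some entity span.
        if not any(s <= start and end <= e for s, e in ent_spans):
            missing.append(term)
    return missing


# Demo with a blank pipeline plus one rule, so it runs without a model download.
demo_nlp = spacy.blank("es")
demo_nlp.add_pipe("entity_ruler").add_patterns([{"label": "ORG", "pattern": "OpenAI"}])
print(missing_terms(demo_nlp, "OpenAI ha desarrollado ChatGPT.", ["OpenAI", "ChatGPT"]))
# → ['ChatGPT']
```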
To work around the issue, you can add an EntityRuler with patterns for the terms the model misses. Try these steps and let me know if it works:
import spacy

nlp = spacy.load("es_core_news_lg")  # or "it_core_news_lg"
# spaCy v3: add the built-in ruler by its factory name. Placed before "ner",
# the statistical NER keeps the rule-based entities and predicts around them.
ruler = nlp.add_pipe("entity_ruler", before="ner")
patterns = [
    {"label": "ORG", "pattern": "OpenAI"},
    {"label": "PRODUCT", "pattern": "ChatGPT"},
]
ruler.add_patterns(patterns)
doc = nlp("OpenAI has developed ChatGPT.")
for ent in doc.ents:
    print(ent.text, ent.label_)
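String patterns in the ruler match the exact text by default, so a pattern for "OpenAI" will not match "OPENAI" in an all-caps headline. A minimal sketch, assuming spaCy v3: setting the ruler's `phrase_matcher_attr` config option to `"LOWER"` makes string patterns match case-insensitively. A blank Spanish pipeline is used only so the snippet runs without a model; with es_core_news_lg you would load the model and add the ruler with `before="ner"` as in the code above.

```python
import spacy

# Blank Spanish pipeline so the example runs without downloading a model.
# With a trained model, load it instead and pass before="ner" to add_pipe.
nlp = spacy.blank("es")

# phrase_matcher_attr="LOWER" compares the lowercase form of tokens,
# so string patterns match regardless of capitalization.
ruler = nlp.add_pipe("entity_ruler", config={"phrase_matcher_attr": "LOWER"})
ruler.add_patterns([
    {"label": "ORG", "pattern": "OpenAI"},
    {"label": "PRODUCT", "pattern": "ChatGPT"},
])

doc = nlp("OPENAI lanza una nueva versión de chatgpt.")
for ent in doc.ents:
    print(ent.text, ent.label_)
```

The trade-off is that lowercase matching can also produce false positives for terms that are ordinary words in lowercase, so it suits distinctive names better than common words.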
Hope this helps. Thanks!
While extracting entities from news articles, I noticed the following behavior:
These words are present in articles but are not extracted by the models.
Does anyone know the reason?
Info about spaCy