egerber / spaCy-entity-linker

spaCy module for linking text to Wikidata items
MIT License
215 stars, 32 forks

downloaded wikidataset not connected or library not found #14

Closed defreeze closed 1 year ago

defreeze commented 1 year ago

When I use the following code:

# pip install spacy-entity-linker
# python -m spacy_entity_linker "download_knowledge_base"

import spacy

nlp = spacy.load("en_core_web_md")
nlp.add_pipe("entity_linker", last=True)
doc = nlp("I watched the Pirates of the Caribbean last silvester")
all_linkedentities = doc._.linkedEntities

for sent in doc.sents:
    sent._.linkedEntities.pretty_print()

I get: 'ValueError: [E139] Knowledge base for component 'entity_linker' is empty. Use the methods kb.add_entity and kb.add_alias to add entries.' I might need to add the downloaded KG somewhere, but this is not stated anywhere.

The original code states that add_pipe should be: nlp.add_pipe("entity_linker", last=True)

But then I get the error: 'ValueError: [E002] Can't find factory for 'entityLinker' for language English (en). This usually happens when spaCy calls nlp.create_pipe with a custom component name that's not registered on the current language class'

Where are things going wrong?

MartinoMensio commented 1 year ago

Hi @defreeze , Thank you for your interest in this package. You are doing everything right, except for one small detail:

When you add the pipeline component, you are invoking entity_linker instead of entityLinker. The call should be: nlp.add_pipe("entityLinker", last=True)

Maybe the name of this pipeline component should be made more distinguishable to avoid confusion.

Let me know if this solves your issue!

Martino

defreeze commented 1 year ago

Thanks for the response! I made a typo in my question here:

The original code states that add_pipe should be: nlp.add_pipe("entityLinker", last=True)

But then I get the error: ValueError: [E002] Can't find factory for 'entityLinker' for language English (en). This usually happens when spaCy calls nlp.create_pipe with a custom component name that's not registered on the current language class

So I did try the correct name and still got the above error. This went on for 2 days. Now, after your comment, I tried again and it actually works!? No idea why: I made no changes, but I no longer get the error.

I'll see if the error happens again and, if so, update this thread. Thanks for the answer, it does seem to work correctly now.

MartinoMensio commented 1 year ago

@defreeze , You're welcome! Maybe you were in the wrong python environment. For that line to work, the package needs to be installed and updated in the current environment. Let me know if you have further issues :)

Martino
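
A quick way to test the "wrong Python environment" explanation is to ask the running interpreter what it can actually see. The sketch below uses only the standard library; the helper name `installed_version` is made up for illustration, and the distribution name `spacy-entity-linker` is taken from the `pip install` line earlier in the thread.

```python
import sys
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if this
    interpreter cannot see it (hypothetical helper, not part of spaCy)."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# The interpreter running this code; compare it with the one pip installed into.
print("interpreter:", sys.executable)

for name in ("spacy", "spacy-entity-linker"):
    print(name, "->", installed_version(name) or "not installed in this environment")
```

If `spacy-entity-linker` shows up as not installed here while `pip install` reported success, pip most likely installed into a different environment than the one running the code.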

MartinoMensio commented 1 year ago

With the updated version v1.0.3, PR #17 was merged. As a result, the database will be automatically downloaded if it is not found.

I am now closing this issue, as it is fixed.

Best, Martino

dersuchendee commented 1 year ago

I actually still have this problem!

I'm running on colab:

import spacy
!pip install spacy-entity-linker
!python -m spacy_entity_linker "download_knowledge_base"
!python -m spacy download en_core_web_md 

import spacy  # version 3.5

# initialize language model
nlp = spacy.load("en_core_web_md")

# add pipeline (declared through entry_points in setup.py)
nlp.add_pipe("entityLinker", last=True)

And get:

ValueError: [E002] Can't find factory for 'entityLinker' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator `@Language.component` (for function components) or `@Language.factory` (for class components).

Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, doc_cleaner, parser, beam_parser, lemmatizer, trainable_lemmatizer, entity_linker, ner, beam_ner, entity_ruler, tagger, morphologizer, senter, sentencizer, textcat, spancat, spancat_singlelabel, future_entity_ruler, span_ruler, textcat_multilabel, en.lemmatizer
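
Note that `entityLinker` is missing from the list of available factories above. Since the component is declared through entry points, one way to narrow this down is to check whether the entry point is visible to the current interpreter at all. The sketch below assumes the package registers itself under spaCy's `spacy_factories` entry-point group; the helper name `registered_factories` is hypothetical, and only the standard library is used.

```python
from importlib import metadata


def registered_factories(group: str = "spacy_factories"):
    """List entry-point names installed under `group` in this environment
    (hypothetical helper; the group name is an assumption about the package)."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are selectable by group
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group
        selected = eps.get(group, [])
    return sorted({ep.name for ep in selected})


names = registered_factories()
print("third-party spaCy factories:", names)
print("entityLinker visible:", "entityLinker" in names)
```

If `entityLinker` is listed here but `nlp.add_pipe("entityLinker")` still raises E002, the package is installed and the running session is probably working from stale state.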

namratanwani commented 6 months ago

@dersuchendee restarting the kernel helps
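
A plausible reason the restart helps, though nobody in the thread confirms it: in the Colab snippet above, `import spacy` runs before `pip install`, so the session may have looked up the available components before the package existed. The sketch below is only a guard for notebooks; `spacy_already_imported` is a hypothetical helper, not a spaCy function.

```python
import sys


def spacy_already_imported() -> bool:
    """True if spaCy was imported earlier in this interpreter session."""
    return "spacy" in sys.modules


if spacy_already_imported():
    # Components installed after this point may not be picked up
    # until the kernel is restarted.
    print("spaCy is already loaded: restart the kernel after installing new components.")
else:
    print("spaCy not imported yet: install packages first, then import spacy.")
```

In a notebook, this suggests running the install cells first, restarting the runtime, and only then importing spaCy and calling `nlp.add_pipe("entityLinker", last=True)`.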