egerber / spaCy-entity-linker

spaCy module for linking text to Wikidata items
MIT License
215 stars 32 forks

Changing the dataset #29

Open neel-forwardedge opened 1 year ago

neel-forwardedge commented 1 year ago

I'm trying to swap in my own data for a use case, but the model somehow keeps pointing to the original dataset. Do I have to clone the repo and upload the model to PyPI?

MartinoMensio commented 1 year ago

Hi @neel-forwardedge, have you tried changing the DB_DEFAULT_PATH here? https://github.com/egerber/spaCy-entity-linker/blob/d2f24731248c261648d955b9f48123589b5257eb/spacy_entity_linker/DatabaseConnection.py#L16

At the moment, the download_knowledge_base method downloads and unzips the default file_url at https://github.com/egerber/spaCy-entity-linker/blob/d2f24731248c261648d955b9f48123589b5257eb/spacy_entity_linker/__main__.py#L19

You can upload your model anywhere that allows big files. In #12 we opted for Hugging Face LFS because it is free even for huge files. Alternatively, you can link to LFS hosted elsewhere (e.g. GitHub) or to an online bucket, but I think uploading such a huge file to PyPI would be inconvenient.

Best, Martino
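
For anyone hosting their own knowledge base, here is a rough sketch of the download-and-unpack flow described above. It is not the library's actual download_knowledge_base code: the function name fetch_and_unpack and its parameters are made up for illustration, and it assumes the archive is in a format that shutil.unpack_archive recognises from the file extension (such as .zip or .tar.gz).

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch_and_unpack(file_url, target_dir):
    """Download the archive at file_url and unpack it into target_dir.

    Hypothetical helper, not part of spaCy-entity-linker.
    Returns the sorted names of the entries found in target_dir afterwards.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Keep the original file name so the archive format can be
    # inferred from its extension.
    archive_name = Path(urlparse(file_url).path).name

    with tempfile.TemporaryDirectory() as tmp:
        archive_path = Path(tmp) / archive_name
        urllib.request.urlretrieve(file_url, archive_path)
        shutil.unpack_archive(str(archive_path), str(target))

    return sorted(p.name for p in target.iterdir())
```

You would call it with the URL of your own hosted archive and the folder where the database file should end up. The path that DB_DEFAULT_PATH points to then has to match the unpacked file.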

neel-forwardedge commented 1 year ago

I changed DB_DEFAULT_PATH and set it to the new file that I created, but it's somehow still using the old file.
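
Two general Python behaviours can produce this symptom. Whether either one applies here is an assumption, since the thread does not say. First, the copy of the package that Python imports may not be the cloned copy that was edited. Second, a function that uses a module constant as a default argument keeps the value the constant had when the function was defined. The sketch below illustrates both. The names module_file, connect_bound and connect_lookup are hypothetical stand-ins, not the library's code.

```python
import importlib.util


def module_file(name):
    """Return the file Python would load for top-level module `name`, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None


# Hypothetical stand-in for a module-level constant.
DB_DEFAULT_PATH = "original.db"


def connect_bound(path=DB_DEFAULT_PATH):
    # The default is evaluated once, when the function is defined.
    return path


def connect_lookup(path=None):
    # The constant is looked up at call time instead.
    return DB_DEFAULT_PATH if path is None else path


# Reassigning the constant later does not change the bound default.
DB_DEFAULT_PATH = "custom.db"

bound_result = connect_bound()                # still "original.db"
lookup_result = connect_lookup()              # "custom.db"
explicit_result = connect_bound("custom.db")  # "custom.db"
```

Running module_file("spacy_entity_linker") shows whether the import resolves to site-packages or to the cloned repo. If it resolves to site-packages, edits in the clone are not picked up until the package is reinstalled from the clone, for example with pip install -e . in the repo folder.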