egerber / spaCy-entity-linker

spaCy module for linking text to Wikidata items
MIT License

version of wiki data and how to update database? #2

Closed therealhieu closed 3 years ago

therealhieu commented 3 years ago

I would like to know which version of Wikidata was used to build the database, and how to keep it up to date. Can you guide me?

aihamburg commented 3 years ago

I would also be interested in a how-to for that, with an additional question: what would I have to do to make the entity linker work with the German or French version of Wikipedia/Wikidata? Cheers, Werner

ryandury commented 3 years ago

I would also be interested in how the data is processed. I am trying to work out how the SQLite database is structured. It looks like you are using specific property IDs to designate children and parents by referencing statements (i.e. 31: "instance of" or 279: "subclass of"). I take it you have left out the other statement relationships to keep the data light?

It would be great to get a script that processes the raw Wikidata dumps into this structure.

Thanks for sharing, this is great!

egerber commented 3 years ago

Apologies for leaving your questions unanswered for so long. I had not worked on the repo for a while.

In case you are still interested, I added a link to the source dataset I used to the project description (https://www.kaggle.com/kenshoresearch/kensho-derived-wikimedia-data).

I applied a number of post-processing steps to this dataset, mainly to filter out entities that you most likely won't need for general-purpose applications (villages in China, stars in the galaxy, train stations in Russia, etc.). I did this on the fly, so I am afraid I don't have a script for it.
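
For readers who want to reproduce something similar, here is a minimal sketch of what such a filtering step could look like. It is not the procedure actually used for this database: the function name `filter_items`, the `(source_item_id, edge_property_id, target_item_id)` tuple layout, and all item and class IDs below are assumptions made up for illustration. The idea is simply to drop every item that is an "instance of" (property 31) some class on a blocklist.

```python
# Hypothetical sketch: drop items whose "instance of" (property 31) target
# is in a blocklist of unwanted classes. The tuple layout and every ID
# below are toy assumptions, not values taken from the real database.

INSTANCE_OF = 31


def filter_items(items, statements, blocked_classes):
    """Return only the items that are not an instance of a blocked class."""
    blocked_items = {
        source
        for source, prop, target in statements
        if prop == INSTANCE_OF and target in blocked_classes
    }
    return {
        item_id: label
        for item_id, label in items.items()
        if item_id not in blocked_items
    }


# Toy data: item_id -> label, plus (source, property, target) statements.
items = {1: "Berlin", 2: "Some village", 3: "Some star"}
statements = [
    (1, 31, 100),  # Berlin is an instance of toy class 100 ("city")
    (2, 31, 200),  # toy class 200 stands in for "village"
    (3, 31, 300),  # toy class 300 stands in for "star"
]

kept = filter_items(items, statements, blocked_classes={200, 300})
```

With the toy data above, only the first item survives the filter. A real blocklist would have to be chosen by inspecting which classes dominate the dataset.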

egerber commented 3 years ago

@ryandury I added a short description of the database schema. I decided to keep 3 statement types from the original knowledge base (31 = instance of, 279 = subclass of, 361 = part of) in order to cover the hierarchical relationships between entities. Hope that helps, in case you are still interested...
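
To illustrate how the three retained statement types could be stored and queried, here is a small sketch using Python's built-in `sqlite3` module. The table name `statements`, its column names, and the helpers `build_db` and `parents` are assumptions for this example and may not match the actual schema described in the project README; the item IDs are toy values.

```python
import sqlite3

# The three hierarchical properties mentioned above.
HIERARCHY_PROPERTIES = {31: "instance of", 279: "subclass of", 361: "part of"}


def build_db(rows):
    """Create an in-memory database holding only hierarchical statements.

    The table and column names are hypothetical, chosen for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE statements ("
        "source_item_id INTEGER, "
        "edge_property_id INTEGER, "
        "target_item_id INTEGER)"
    )
    # Keep only statements whose property is one of the three above.
    kept = [row for row in rows if row[1] in HIERARCHY_PROPERTIES]
    conn.executemany("INSERT INTO statements VALUES (?, ?, ?)", kept)
    conn.commit()
    return conn


def parents(conn, item_id):
    """Return (relation name, parent item id) pairs for one item."""
    cursor = conn.execute(
        "SELECT edge_property_id, target_item_id FROM statements "
        "WHERE source_item_id = ? "
        "ORDER BY edge_property_id, target_item_id",
        (item_id,),
    )
    return [(HIERARCHY_PROPERTIES[prop], target) for prop, target in cursor]


# Toy statements: (source item, property, target item).
rows = [
    (1, 31, 10),    # item 1 is an instance of item 10
    (1, 17, 99),    # some other property, dropped by build_db
    (10, 279, 20),  # item 10 is a subclass of item 20
    (1, 361, 30),   # item 1 is part of item 30
]
conn = build_db(rows)
```

Calling `parents(conn, 1)` on this toy data returns the "instance of" and "part of" parents of item 1, while the statement with the unrelated property is never stored.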