egerber / spaCy-entity-linker

spaCy module for linking text to Wikidata items
MIT License
215 stars 32 forks

Translating the database? #24

Open supersambo opened 1 year ago

supersambo commented 1 year ago

First of all, thanks for this great library!

As the title suggests, I'm wondering whether it would be possible to port this to other natural languages by translating the database using Wikidata requests. I had a look at the database, and from my very limited understanding of it, I would just translate en_label and en_description (in joined) and rebuild the aliases table based on the "also known as" field in Wikidata.

While this seems technically feasible, issuing that many requests is of course quite time-consuming. Fortunately, however, the Wikidata API returns all the available languages for each request. More importantly, in my particular case I'm only interested in a very limited set of entity types.
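To make the idea concrete, here is a rough sketch of how the translated fields could be pulled from the Wikidata API. It assumes the `wbgetentities` action with `props=labels|descriptions|aliases`; the function names (`build_query`, `extract_translations`, `fetch_translations`), the User-Agent string, and the batch size of 50 are my own choices and not part of this library. Writing the results back into the joined and aliases tables is left out, since that depends on the actual schema.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"


def build_query(item_ids, lang):
    """Build a wbgetentities URL for a batch of item IDs (e.g. ["Q1", "Q2"])."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(item_ids),
        "props": "labels|descriptions|aliases",
        "languages": lang,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_translations(response, lang):
    """Map each item ID to its label, description and aliases in `lang`.

    Missing labels or descriptions come back as None, missing aliases as [].
    """
    result = {}
    for item_id, entity in response.get("entities", {}).items():
        label = entity.get("labels", {}).get(lang, {}).get("value")
        description = entity.get("descriptions", {}).get(lang, {}).get("value")
        aliases = [a["value"] for a in entity.get("aliases", {}).get(lang, [])]
        result[item_id] = {
            "label": label,
            "description": description,
            "aliases": aliases,
        }
    return result


def fetch_translations(item_ids, lang, batch_size=50):
    """Fetch translations in batches; this function performs network requests.

    The batch size of 50 is an assumption about the per-request ID limit.
    """
    translations = {}
    for start in range(0, len(item_ids), batch_size):
        batch = item_ids[start:start + batch_size]
        request = urllib.request.Request(
            build_query(batch, lang),
            # Hypothetical User-Agent; replace with one identifying your project.
            headers={"User-Agent": "entity-linker-translation-sketch/0.1"},
        )
        with urllib.request.urlopen(request) as handle:
            translations.update(extract_translations(json.load(handle), lang))
    return translations
```

Calling `fetch_translations(ids, "de")` with the item IDs of the entity types I care about would then give one dictionary per item, from which the label, description, and alias rows could be rebuilt.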

My question is: Am I oversimplifying this and missing important details, which would make this more complicated than the idea sketched above?

dennlinger commented 1 year ago

Depending on what kinds of entities you are interested in, it might even be sufficient to reuse the en_label for other languages, especially for person entities. For "Donald Duck", for example, Wikidata has synonymous expressions only for English, and other languages often have a shorter synonym list, if they have one at all. I do like the idea of making it more applicable to other languages in general, though!
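A minimal sketch of that fallback, assuming you already have the English labels and whatever translated labels Wikidata offers as plain dictionaries keyed by item ID. The function name `apply_fallback` and the input shapes are hypothetical and not part of this library.

```python
def apply_fallback(english_labels, translated_labels):
    """Prefer the translated label; keep the English one when none exists.

    english_labels:    {item_id: en_label}
    translated_labels: {item_id: label or None}, possibly missing some IDs
    """
    merged = {}
    for item_id, en_label in english_labels.items():
        # A missing key, None, or an empty string all fall back to English.
        merged[item_id] = translated_labels.get(item_id) or en_label
    return merged
```

The same pattern could be applied to descriptions, although an English description in an otherwise translated database may be less useful than an English name.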

supersambo commented 1 year ago

Thanks for your feedback. I will try it out then.