Ironholds / WikidataR

An R package for the Wikidata API

wikidata title and wikipedia title #9

Closed Ruthygg closed 8 years ago

Ruthygg commented 9 years ago

In most cases, the title of a Wikidata item is the same as the title of the corresponding English Wikipedia article. However, sometimes this is not the case. This causes problems if you have a list of Wikipedia pages and want to extract semantic data for them from Wikidata. I was wondering if there is a way to get the corresponding Wikidata item given a Wikipedia page.

Thanks in advance for any suggestion. Ruth

Ironholds commented 9 years ago

That's a really good question I don't know the answer to off the top of my head! I'm going to do some research tomorrow and dig into our API - I suspect the answer might be that it's stored as page metadata in some way and can be extracted like that (although I would not be shocked to discover that, in actual fact, it is not extractable and that is a one-way process. Our API is hell.)

Ruthygg commented 9 years ago

Thanks for answering, Olivier. Indeed, this is a limitation of Wikidata. I hope there is some way to access the metadata. Yesterday I had around 1000 items to crawl and about 20 of them failed because the title did not match. I manually changed those names to match the Wikipedia page lol.

Will be following any update. Thanks again Ruth

Ironholds commented 8 years ago

Okay, the answer, after a long time out (sorry!), is: yeah, they're sometimes different. You can sort of solve for it by looking at the sitelinks in the Wikidata entries, but it's not perfect :(
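
For anyone landing here later, the sitelinks idea can be sketched roughly as follows. This is not WikidataR code and not necessarily what the maintainer had in mind. It is a minimal Python sketch that assumes the Wikidata `wbgetentities` API module, which accepts a site ID plus page titles and returns the matching items with their sitelinks. The helper names (`build_lookup_url`, `titles_to_ids`, `lookup`) and the User-Agent string are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_lookup_url(titles, site="enwiki"):
    """Build a wbgetentities URL that looks items up by Wikipedia page title."""
    params = {
        "action": "wbgetentities",
        "sites": site,                # which wiki the titles belong to
        "titles": "|".join(titles),   # several titles are pipe-separated
        "props": "sitelinks",         # only the sitelinks are needed here
        "sitefilter": site,           # drop sitelinks to other wikis
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def titles_to_ids(response, site="enwiki"):
    """Map each Wikipedia title found in the response to its Wikidata ID."""
    mapping = {}
    for entity_id, entity in response.get("entities", {}).items():
        if "missing" in entity:
            # No item is linked to this title; skip it.
            continue
        link = entity.get("sitelinks", {}).get(site)
        if link:
            mapping[link["title"]] = entity_id
    return mapping


def lookup(titles, site="enwiki"):
    """Query the live API (needs network access) and return title -> ID."""
    request = Request(
        build_lookup_url(titles, site),
        headers={"User-Agent": "sitelink-lookup-example/0.1"},  # hypothetical
    )
    with urlopen(request) as resp:
        return titles_to_ids(json.load(resp), site)
```

Calling `lookup(["Douglas Adams"])` should yield a dictionary keyed by the page title as Wikidata stores it, so titles that only match via a redirect or a different spelling may still come back missing. The API also limits how many titles one request may carry, so a list of around 1000 pages would need to be sent in batches.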