Ironholds / WikidataR

An R package for the Wikidata API

Retrieve Wikidata info based on Wikipedia link #34

Open bshor opened 4 years ago

bshor commented 4 years ago

Hello, I would like to access a Wikidata item based on a Wikipedia link.

For example, say I'm interested in looking up the entry for this person:

https://en.wikipedia.org/wiki/Sarah_Davis_(Texas_politician)

Her Wikidata page is here:

https://www.wikidata.org/wiki/Q16215746

Note that in the Wikipedia section of the Web page, there's a link back to the Wikipedia entry.

What I would like to do is, given an arbitrary Wikipedia page link, look up the Wikidata item ID (and other entries) using this R package. Is this possible?

TS404 commented 3 years ago

There should be a way to do that. I might write a function for it, but in the meantime, the code would be:

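install.packages(c("httr", "xml2"))  # package names are case-sensitive: "xml2", not "XML2"
library(WikidataR)                   # provides get_item()

page.url  <- "https://en.wikipedia.org/wiki/Sarah_Davis_(Texas_politician)"
# Strip everything up to and including "wiki/" to leave the page title
page.name <- gsub(".*wiki/", "", page.url)
# Ask the Wikipedia API for the page properties, which include the Wikidata item ID
api.url   <- paste0("https://en.wikipedia.org/w/api.php?action=query&format=xml&redirects=1&prop=pageprops&titles=",
                    page.name)
# The QID is stored in the "wikibase_item" attribute of the <pageprops> element
item.qid  <- attr(xml2::as_list(xml2::read_xml(httr::GET(api.url)$content))$api$query$pages$page$pageprops, "wikibase_item")
item.data <- get_item(item.qid)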
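
Since wrapping this in a function was mentioned, here is a minimal sketch of what that could look like. It assumes the httr and jsonlite packages are installed. The helper names wiki_title_from_url and qid_from_wikipedia_url are hypothetical and not part of WikidataR. Unlike the snippet above, it requests JSON instead of XML and lets httr build and encode the query string.

```r
# Hypothetical helpers for illustration; not part of WikidataR.

# Extract the page title from a Wikipedia URL, dropping any "#section" anchor
# and decoding percent-escapes such as %28 for "(".
wiki_title_from_url <- function(page.url) {
  title <- sub("^.*/wiki/", "", page.url)
  title <- sub("#.*$", "", title)
  utils::URLdecode(title)
}

# Look up the Wikidata QID for an English Wikipedia page.
# Needs network access; returns NULL if the page has no linked item.
qid_from_wikipedia_url <- function(page.url) {
  response <- httr::GET(
    "https://en.wikipedia.org/w/api.php",
    query = list(action = "query", format = "json", redirects = 1,
                 prop = "pageprops", titles = wiki_title_from_url(page.url))
  )
  httr::stop_for_status(response)
  parsed <- jsonlite::fromJSON(httr::content(response, as = "text", encoding = "UTF-8"))
  # "pages" is a list keyed by page ID; a single title yields a single entry
  parsed$query$pages[[1]]$pageprops$wikibase_item
}
```

Calling qid_from_wikipedia_url() on the Sarah Davis link should give the QID from the Wikidata page mentioned earlier in the thread, provided the article is still linked to that item. The result can then be passed to get_item() as before.
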

bshor commented 3 years ago

This works! Thanks for responding.

How would I generalize this? For example, say I know the Ballotpedia ID for Roland Gutierrez, just as I knew the Wikipedia entry for Sarah Davis.

I want to find the Wikidata item for this person based on the Ballotpedia ID (which happens to be Roland_Gutierrez). How would I adapt your code?

For lookup purposes, his Wikidata page is here:

https://www.wikidata.org/wiki/Q16729525

As background, I have unique identifiers in my data, and I want to use Wikidata to get other information on this person, matching on arbitrary keys like Wikipedia entry, various IDs, etc.

salgo60 commented 3 years ago

There is also https://hub.toolforge.org/, which does this kind of lookup; its code is at https://github.com/maxlath/hub

bshor commented 3 years ago

Wow ... this is awesome! But, of course, it doesn't use the package at all. I guess that's ok, but I'd like to know how to do it within the confines of the package.

TS404 commented 3 years ago

Ironically, it's actually easier to get Wikidata IDs from identifiers other than the Wikipedia page! I've added a function for this, qid_from_identifier, to the extended package here: TS404/WikidataR/.../Queries.R. It should work even if you don't know the property's PID!

To use:

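library(devtools)
devtools::install_github("TS404/WikidataR")  # install the extended version from GitHub
library(WikidataR)                           # load it so qid_from_identifier() is available
qid_from_identifier('Ballotpedia ID','Roland_Gutierrez')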
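
If you already know the property's PID, the same kind of lookup can be sketched without the extended package by querying the Wikidata Query Service directly. This is an illustration under assumptions, not a description of how qid_from_identifier is implemented: the helper names are hypothetical, httr and jsonlite are assumed to be installed, and P2390 is assumed to be the PID for Ballotpedia ID (worth verifying on Wikidata).

```r
# Hypothetical helpers for illustration; not part of WikidataR.

# Build a SPARQL query that finds items whose property `pid` equals `value`.
sparql_for_identifier <- function(pid, value) {
  if (!grepl("^P[0-9]+$", pid)) stop("pid must look like 'P123'")
  # Reject characters that would break out of the SPARQL string literal
  if (grepl("[\"\\\\]", value)) stop("value must not contain quotes or backslashes")
  sprintf('SELECT ?item WHERE { ?item wdt:%s "%s" }', pid, value)
}

# Run the query; needs network access. Returns a character vector of QIDs
# (possibly empty, or longer than one if several items share the identifier).
qids_from_property_value <- function(pid, value) {
  response <- httr::GET(
    "https://query.wikidata.org/sparql",
    query = list(query = sparql_for_identifier(pid, value), format = "json")
  )
  httr::stop_for_status(response)
  parsed <- jsonlite::fromJSON(httr::content(response, as = "text", encoding = "UTF-8"))
  # Each binding holds a full entity URI; keep only the trailing QID
  sub("^.*/entity/", "", parsed$results$bindings$item$value)
}

# Assumed PID for Ballotpedia ID; check it on Wikidata before relying on it.
# qids_from_property_value("P2390", "Roland_Gutierrez")
```

The returned QIDs can be passed to get_item() to fetch the full records, which covers the "match on arbitrary keys" use case as long as each key corresponds to a Wikidata property.
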