Open juba opened 4 months ago
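The following SPARQL query returns the Wikipedia pages that are about either: the Wikidata item carrying the given NCBI taxid (P685), an item that the taxid item points to through P366, or an item that has "Homo neanderthalensis"@en as a value (search by scientific name):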
SELECT DISTINCT * WHERE {
  {
    SELECT DISTINCT ?article ?lang WHERE {
      ?taxid ps:P685 "63221" .
      ?speciesId p:P685 ?taxid .
      ?article schema:about ?speciesId .
      ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
      ?article schema:inLanguage ?lang .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
    }
  }
  UNION {
    SELECT DISTINCT ?article ?lang WHERE {
      ?taxid ps:P685 "63221" .
      ?speciesId p:P685 ?taxid .
      ?species ^wdt:P366 ?speciesId .
      ?article schema:about ?species .
      ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
      ?article schema:inLanguage ?lang .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
    }
  }
  UNION {
    SELECT DISTINCT ?article ?lang WHERE {
      ?item ?label "Homo neanderthalensis"@en .
      ?article schema:about ?item .
      ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
      ?article schema:inLanguage ?lang .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
    }
  }
}
This query can return duplicates (several pages for the same language), but keeping only one page per language should be fine?
Note that when a species is missing its taxid in Wikidata, it can be added directly by editing the item's page.
Problem in this case: querying on the sciname can return articles that are not about the same species, because different species can share the same name (homonyms).
Should we keep searching on sciname?
It seems better not to query on scinames, in order to avoid irrelevant results. So the query should be the following (replace 63221 with the taxid of interest):
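One way to keep a single page per language is to post-process the results on the client side. The sketch below assumes the rows arrive in the standard SPARQL JSON results format (a list of bindings with `article` and `lang` keys). The function name `one_article_per_language`, the "first page seen wins" rule, and the placeholder URLs are illustrative assumptions, not something decided in this issue.

```python
from typing import Dict, Iterable, Mapping


def one_article_per_language(
    bindings: Iterable[Mapping[str, Mapping[str, str]]]
) -> Dict[str, str]:
    """Return {language code: article URL}, keeping the first article seen per language."""
    pages: Dict[str, str] = {}
    for row in bindings:
        lang = row["lang"]["value"]
        article = row["article"]["value"]
        # setdefault leaves an existing entry untouched, so later duplicates are dropped.
        pages.setdefault(lang, article)
    return pages


# Hypothetical rows shaped like the "results.bindings" list of a SPARQL JSON response.
# The URLs are placeholders, not real query output.
example_bindings = [
    {"article": {"value": "https://en.wikipedia.org/wiki/Page_A"}, "lang": {"value": "en"}},
    {"article": {"value": "https://en.wikipedia.org/wiki/Page_B"}, "lang": {"value": "en"}},
    {"article": {"value": "https://fr.wikipedia.org/wiki/Page_C"}, "lang": {"value": "fr"}},
]
pages = one_article_per_language(example_bindings)
```

Which duplicate to keep is arbitrary here; if one of the subqueries should take priority, the rows would need to be ordered accordingly before this step.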
SELECT DISTINCT * WHERE {
  {
    SELECT DISTINCT ?article ?lang WHERE {
      ?taxid ps:P685 "63221" .
      ?speciesId p:P685 ?taxid .
      ?article schema:about ?speciesId .
      ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
      ?article schema:inLanguage ?lang .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
    }
  }
  UNION {
    SELECT DISTINCT ?article ?lang WHERE {
      ?taxid ps:P685 "63221" .
      ?speciesId p:P685 ?taxid .
      ?species ^wdt:P366 ?speciesId .
      ?article schema:about ?species .
      ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
      ?article schema:inLanguage ?lang .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
    }
  }
}
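To make the "replace 63221" step mechanical, the taxid can be substituted into a template of the query. This is a minimal sketch, assuming the query is sent to the public endpoint at https://query.wikidata.org/sparql with `format=json`. `build_query` and `build_request_url` are hypothetical helper names; the query text itself is the one given above.

```python
from string import Template
from urllib.parse import urlencode

# Assumed public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Same query as above, with the taxid literal turned into a $taxid placeholder.
QUERY_TEMPLATE = Template("""\
SELECT DISTINCT * WHERE {
  {
    SELECT DISTINCT ?article ?lang WHERE {
      ?taxid ps:P685 "$taxid" .
      ?speciesId p:P685 ?taxid .
      ?article schema:about ?speciesId .
      ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
      ?article schema:inLanguage ?lang .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
    }
  }
  UNION {
    SELECT DISTINCT ?article ?lang WHERE {
      ?taxid ps:P685 "$taxid" .
      ?speciesId p:P685 ?taxid .
      ?species ^wdt:P366 ?speciesId .
      ?article schema:about ?species .
      ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
      ?article schema:inLanguage ?lang .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
    }
  }
}
""")


def build_query(taxid) -> str:
    """Return the SPARQL query for one NCBI taxid, rejecting non-numeric input."""
    taxid_str = str(taxid)
    # Accept ASCII digits only, so nothing can break out of the quoted literal.
    if not (taxid_str.isascii() and taxid_str.isdigit()):
        raise ValueError(f"NCBI taxid must be numeric, got {taxid!r}")
    return QUERY_TEMPLATE.substitute(taxid=taxid_str)


def build_request_url(query: str) -> str:
    """Return a GET URL for the endpoint that asks for JSON results."""
    return f"{WDQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


query = build_query(9615)  # taxid of Canis lupus familiaris, per this issue
url = build_request_url(query)
```

The sketch only builds the URL and does not send the request, so it stays independent of whichever HTTP client the frontend ends up using.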
For the new frontend, rework the Wikidata queries to get metadata plus the list of available Wikipedia pages in different languages, starting from the NCBI taxid.
For the moment the reference query is:
NCBI ids to test: 9615 for Canis lupus familiaris and 63221 for Neanderthal (Homo neanderthalensis)
https://query.wikidata.org