Lifemap-ToL / lifemap-back

Lifemap infrastructure and builder.
GNU General Public License v3.0

Wikidata query for wikipedia articles #6

Open juba opened 2 months ago

juba commented 2 months ago

For the new frontend, rework the Wikidata queries to get the metadata and the list of available Wikipedia pages in different languages, starting from an NCBI taxid.

For the moment the reference query is:

SELECT * 
WHERE { ?item p:P685 ?statement0.
       OPTIONAL{?item wdt:P627 ?iucn.} 
       OPTIONAL{?item wdt:P846 ?gbif.} 
       OPTIONAL{?item wdt:P3151 ?inaturalist.} 
       OPTIONAL{?item wdt:P9157 ?openTreeOfLife.} 
       OPTIONAL{?item wdt:P10585 ?catalogueOfLife.} 
       OPTIONAL{?item wdt:P141 ?iucnStatus.} 
       ?item p:P685 ?ncbi.
       ?statement0 (ps:P685) "9615". 
       ?article schema:about ?item . 
       ?article schema:isPartOf <https://en.wikipedia.org/> . 
       SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } }

NCBI taxids to test: 9615 for Canis lupus familiaris, and Neanderthal

https://query.wikidata.org
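
The query service at this address can also be called from code rather than through the web form. As a minimal sketch in Python (standard library only), the following prepares a GET request against the service's SPARQL endpoint, `https://query.wikidata.org/sparql`. The helper names and the User-Agent string are made up for illustration and are not part of lifemap-back.

```python
import json
import urllib.parse
import urllib.request

# SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query, user_agent="lifemap-example/0.1 (illustrative)"):
    """Prepare a GET request for a SPARQL query, asking for JSON results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        # Hypothetical value: replace with a string identifying the real client.
        "User-Agent": user_agent,
    }
    return urllib.request.Request(url, headers=headers)


def run_query(query):
    """Send the query and return the result rows (requires network access)."""
    with urllib.request.urlopen(build_request(query), timeout=60) as response:
        return json.load(response)["results"]["bindings"]
```

Each returned row is a dict mapping a variable name to an object whose `"value"` key holds the bound value, which is the standard SPARQL JSON results layout.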

juba commented 1 month ago

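The following three-branch query returns Wikipedia pages that are about either the item carrying the taxid, an item that this item links to through P366, or an item that has the scientific name as an English-language value: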

SELECT DISTINCT * WHERE {
  {
    SELECT DISTINCT ?article ?lang WHERE {
      ?taxid ps:P685 "63221". 
      ?speciesId p:P685 ?taxid.
      ?article schema:about ?speciesId.
      ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] . 
      ?article schema:inLanguage ?lang .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
    }
  }
  UNION {
    SELECT DISTINCT ?article ?lang WHERE {
      ?taxid ps:P685 "63221". 
      ?speciesId p:P685 ?taxid.
      ?species ^wdt:P366 ?speciesId .
      ?article schema:about ?species.
      ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] . 
      ?article schema:inLanguage ?lang .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
    }
  }
  UNION {
    SELECT DISTINCT ?article ?lang WHERE {
      ?item ?label "Homo neanderthalensis"@en.  
      ?article schema:about ?item .
      ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] . 
      ?article schema:inLanguage ?lang .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
    }
  }
}

This query can yield duplicates (several pages for one language), but keeping only one of them per language should be fine?
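
One possible way to keep a single page per language, sketched in Python. It assumes the rows come in the standard SPARQL JSON results layout, with the `article` and `lang` variables used in the query. Keeping the alphabetically first URL is an arbitrary tie-break chosen only to make the result deterministic, and the function name is hypothetical.

```python
def one_article_per_language(bindings):
    """Map each language code to a single article URL.

    `bindings` is the "results" -> "bindings" list of a SPARQL JSON response.
    """
    by_lang = {}
    for row in bindings:
        lang = row["lang"]["value"]
        url = row["article"]["value"]
        # Assumption: when a language has several pages, keep the
        # alphabetically first URL so repeated runs give the same answer.
        if lang not in by_lang or url < by_lang[lang]:
            by_lang[lang] = url
    return by_lang


# Made-up sample rows (not real query output), with two pages for "en".
sample = [
    {"article": {"value": "https://example.org/en/page-b"}, "lang": {"value": "en"}},
    {"article": {"value": "https://example.org/fr/page-a"}, "lang": {"value": "fr"}},
    {"article": {"value": "https://example.org/en/page-a"}, "lang": {"value": "en"}},
]
pages = one_article_per_language(sample)
```

Another tie-break, such as preferring the page attached to the item that carries the taxid, would need the query to also return which branch each row came from.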

Note that when a species lacks a taxid, it can be added directly in Wikidata by editing the item's page.

Example for Homo neanderthalensis

juba commented 1 month ago

Problem in this case: querying on the sciname can return articles that are not about the same species, since different taxa can share the same name.

Example for Stigmatella

Should we keep searching on sciname?

juba commented 1 month ago

It seems better not to query on scinames, to avoid irrelevant results. So the query should be the following (replace 63221 with the taxid of interest):

SELECT DISTINCT * WHERE {
  {
    SELECT DISTINCT ?article ?lang WHERE {
      ?taxid ps:P685 "63221". 
      ?speciesId p:P685 ?taxid.
      ?article schema:about ?speciesId.
      ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] . 
      ?article schema:inLanguage ?lang .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
    }
  }
  UNION {
    SELECT DISTINCT ?article ?lang WHERE {
      ?taxid ps:P685 "63221". 
      ?speciesId p:P685 ?taxid.
      ?species ^wdt:P366 ?speciesId .
      ?article schema:about ?species.
      ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] . 
      ?article schema:inLanguage ?lang .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
    }
  }
}

Example for Neanderthal
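
The "replace 63221" step could be automated along these lines. This is only a sketch: the query text is the one above with the taxid turned into a placeholder, and the function name and the digits-only check are assumptions added for illustration.

```python
from string import Template

# The taxid-only query from this thread, with "$taxid" in place of 63221.
ARTICLES_QUERY = Template("""\
SELECT DISTINCT * WHERE {
  {
    SELECT DISTINCT ?article ?lang WHERE {
      ?taxid ps:P685 "$taxid".
      ?speciesId p:P685 ?taxid.
      ?article schema:about ?speciesId.
      ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
      ?article schema:inLanguage ?lang .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
    }
  }
  UNION {
    SELECT DISTINCT ?article ?lang WHERE {
      ?taxid ps:P685 "$taxid".
      ?speciesId p:P685 ?taxid.
      ?species ^wdt:P366 ?speciesId .
      ?article schema:about ?species.
      ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
      ?article schema:inLanguage ?lang .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
    }
  }
}
""")


def articles_query(taxid):
    """Return the query text for one NCBI taxid (int or numeric string)."""
    taxid = str(taxid).strip()
    # Accept digits only, so nothing else can be injected into the query.
    if not (taxid.isascii() and taxid.isdigit()):
        raise ValueError(f"not a numeric NCBI taxid: {taxid!r}")
    return ARTICLES_QUERY.substitute(taxid=taxid)
```

For instance, `articles_query(9615)` gives the same query for the dog taxid used earlier in this thread.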