WDscholia / scholia

Wikidata-based scholarly profiles
https://scholia.toolforge.org

sort parent taxons in order, from highest to lowest #239

Open egonw opened 6 years ago

egonw commented 6 years ago

... but need to figure out first how to get this data from Wikidata...

rossmounce commented 6 years ago

For Homo sapiens (Q15978631) it gives 30 ranks in total.

Ideally these should be displayed in rank order, either highest to lowest or lowest to highest. From lowest to highest, the order should be:

Homo              genus
Hominina          subtribe
Hominin           tribe
Homininae         subfamily
Hominidae         family
Hominoidea        superfamily
Catarrhini        parvorder
Simiiformes       infraorder
Haplorrhini       suborder
Primates          order
Primatomorpha     mirorder
Euarchontoglires  superorder, grandorder
Holotheria        infraclass
Placentalia       infraclass, cohorte
Eutheria          subclass
Theria            subclass, supercohort
Boreosphenida     infraclass
Cladotheria       legion
Trechnotheria     superlegion
Theriiformes      subclass
mammal            class
Tetrapoda         superclass
Gnathostomata     infraphylum
Vertebrata        subphylum
Chordata          phylum
deuterostome      infrakingdom
Bilateria         subkingdom
animal            kingdom
Eukaryote         superkingdom
biota             superdomain

Clearly there are multiple taxa of equivalent rank, e.g. several at infraclass. One needs to examine the phylogeny to determine the exact order between them. But even a rough ordering (domain, kingdom, phylum, class, and so on) would be nice, even if the sort among names of the same rank is not perfectly correct.

Source for infraclass and legion determination: https://en.wikipedia.org/wiki/Tribosphenida#phylogeny

fnielsen commented 6 years ago

Yes, this is a mess. I suppose one way to handle it would be to make an explicit translation table in SPARQL between the taxon and a numerical value.
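A minimal sketch of what such a translation table could look like, assuming the table is keyed on the taxon rank (P105) and written as an inline VALUES block. The numeric values are arbitrary and hand-assigned, only a handful of main ranks are covered, and the rank item IDs are quoted from memory and should be checked against Wikidata before use:

```sparql
# Hypothetical rank-to-number table; ancestors whose rank is not listed are dropped.
SELECT ?parent ?parent_name ?rank ?rank_order WHERE {
  VALUES (?rank ?rank_order) {
    (wd:Q36732 1)   # kingdom
    (wd:Q38348 2)   # phylum
    (wd:Q37517 3)   # class
    (wd:Q36602 4)   # order
    (wd:Q35409 5)   # family
    (wd:Q34740 6)   # genus
  }
  wd:Q15978631 wdt:P171+ ?parent .   # all ancestors of Homo sapiens via parent taxon
  ?parent wdt:P105 ?rank ;           # taxon rank of the ancestor
          wdt:P225 ?parent_name .    # taxon name of the ancestor
}
ORDER BY ?rank_order
```

The obvious drawback is that every intermediate rank (subtribe, parvorder, legion, and so on) needs its own hand-maintained row, and taxa sharing a rank still come out in no particular order.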

egonw commented 1 year ago

Today on the Wikidata Telegram a similar question came up and Andrew posted a query he had worked on for MPs. This led me to this query we can use to solve this issue:

# ancestor taxa of a given taxon, ordered by their distance in parent taxon (P171) steps

SELECT DISTINCT ?taxon ?taxonLabel ?relative ?relativeLabel ?distance
WITH {
  # find taxon, ancestor, and count the generations between them
  SELECT DISTINCT ?taxon ?relative (COUNT(DISTINCT ?rel) AS ?distance)
  WHERE {
    VALUES ?taxon { wd:Q133128 }

    # ?rel is the taxon itself or any of its ancestors that sits below ?relative,
    # so the number of distinct ?rel values is the distance to ?relative
    ?taxon wdt:P171* ?rel . ?rel wdt:P171+ ?relative .
    ?taxon wdt:P31 wd:Q16521 .      # instance of taxon
    ?rel wdt:P31 wd:Q16521 .
    ?relative wdt:P31 wd:Q16521 .
  } GROUP BY ?taxon ?relative
} AS %ancestors WHERE {
  INCLUDE %ancestors
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} ORDER BY DESC(?distance)

cc @Adafede

Adafede commented 1 year ago

@egonw I prefer wdt:P225 (taxon name) to labels, see:

SELECT DISTINCT ?taxon ?taxon_name ?relative ?relative_name (COUNT(DISTINCT ?rel) AS ?distance) WHERE {
  VALUES ?taxon {
    wd:Q133128
  }
  ?taxon (wdt:P171*) ?rel;
    wdt:P225 ?taxon_name.
  ?rel (wdt:P171+) ?relative.
  ?relative wdt:P225 ?relative_name.
}
GROUP BY ?taxon ?taxon_name ?relative ?relative_name
ORDER BY DESC (?distance)