factsmission / synospecies

Using Plazi Data to find currently accepted scientific names
https://synospecies.plazi.org/
MIT License

diagnostic tool to find out contradicting authorities #179

Open myrmoteras opened 5 days ago

myrmoteras commented 5 days ago

@nleanba, can you please add the SPARQL query for finding conflicting authorities to the canned queries under "advanced"? See the one you did for T. rex.

nleanba commented 5 days ago

I can easily put the query for nanotyrannus into the advanced tab, but I'd prefer to make it a bit more generally useful first.

There is no query for all synonyms yet; for Tyrannosaurus I just manually ran the query for all synonyms and removed all entries without conflicts by hand.

nleanba commented 5 days ago

Here is a more general query:

################################################################################
#                                                                              #
# Note: This query ONLY works with the treatment.ld.plazi.org sparql endpoint! #
#                                                                              #
################################################################################
PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
PREFIX dwcFP: <http://filteredpush.org/ontologies/oa/dwcFP#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX treat: <http://plazi.org/vocab/treatment#>
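# DISTINCT inside GROUP_CONCAT avoids listing a treatment once per conflicting ?authority1
SELECT DISTINCT ("--" AS ?simple) ?name ?authority (GROUP_CONCAT(DISTINCT ?treatment; separator=",") AS ?treatments) WHERE {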
  ?tc treat:hasTaxonName ?name .
  ?tc dwc:scientificNameAuthorship ?authority1 .

  GRAPH ?treatment {
    ?tc dwc:scientificNameAuthorship ?authority .
  }
  FILTER(?authority1 != ?authority)
}
GROUP BY ?name ?authority
ORDER BY ?name
LIMIT 100

nleanba commented 5 days ago

Running a similar query reveals that 26467 names in the data have multiple authorities, so manual fixup would be quite the effort.
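As a rough illustration of what such a count involves, here is a minimal Python sketch. It assumes the (name, authority) pairs have already been exported from the endpoint into a list of tuples; the function name `names_with_conflicts` and the sample rows are hypothetical and not taken from Plazi data.

```python
from collections import defaultdict


def names_with_conflicts(rows):
    """Return {name: set of authorities} for names with more than one authority.

    `rows` is an iterable of (name, authority) string pairs, e.g. exported
    query results. This is an illustrative helper, not part of Synospecies.
    """
    authorities = defaultdict(set)
    for name, authority in rows:
        # Surrounding whitespace is ignored so it does not count as a conflict.
        authorities[name].add(authority.strip())
    return {name: auths for name, auths in authorities.items() if len(auths) > 1}


# Hypothetical sample rows, invented for illustration only.
rows = [
    ("Aus bus", "Smith, 1900"),
    ("Aus bus", "(Smith, 1900)"),
    ("Cus dus", "Jones, 1950"),
    ("Cus dus", "Jones, 1950"),
]
conflicts = names_with_conflicts(rows)
print(len(conflicts), "name(s) with multiple authorities")
```

Exact duplicates collapse in the set, so only genuinely different authority strings count as a conflict.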

nleanba commented 5 days ago

For a given taxon name, the following query lists all treatments for it, along with their authority and some useful metadata to help decide which one is correct:

################################################################################
#                                                                              #
# Note: This query ONLY works with the treatment.ld.plazi.org sparql endpoint! #
#                                                                              #
################################################################################
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX treat: <http://plazi.org/vocab/treatment#>
SELECT DISTINCT ?name ?authority ?year ?treatment ?authors ?title WHERE {

  # Replace Name here as relevant
  BIND(<http://taxon-name.plazi.org/id/Animalia/Laelaps_incrassatus> AS ?name)

  ?tc treat:hasTaxonName ?name .
  GRAPH ?treatment {
    ?tc dwc:scientificNameAuthorship ?authority .
  }

  # Anchor the pattern so only the URI scheme is rewritten
  BIND(IRI(REPLACE(STR(?treatment), "^https:", "http:")) AS ?treatment_http)

  ?treatment_http dc:creator ?authors ;
             dc:title ?title ;
             treat:publishedIn/dc:date ?year .
}
ORDER BY ?year

For example, for Laelaps incrassatus, it gives [image: query results], which to me indicates that the latter two treatments are probably wrong and should be fixed with

nleanba commented 5 days ago

A quick glance at the list provided by the first query above shows that most "disagreements" are (Name, 1234) vs Name, 1234 (i.e. Name as baseAuthority vs as authority).

These cannot be fixed easily "after-the-fact" and require a human to check if it is supposed to be base- or non-base-authority.
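To separate this kind of disagreement from the others before a human looks at them, something like the following Python sketch could be used. It assumes authorities are available as plain strings; the helper names `strip_parens` and `differs_only_by_parens` and the sample values are hypothetical. It only flags the pairs and does not decide which form is correct.

```python
import re


def strip_parens(authority):
    """Collapse whitespace and remove one pair of enclosing parentheses, if any."""
    text = " ".join(authority.split())
    match = re.fullmatch(r"\((.*)\)", text)
    return match.group(1).strip() if match else text


def differs_only_by_parens(a, b):
    """True if the two authority strings differ only in enclosing parentheses."""
    a_norm = " ".join(a.split())
    b_norm = " ".join(b.split())
    return a_norm != b_norm and strip_parens(a) == strip_parens(b)


# Hypothetical examples, invented for illustration only.
print(differs_only_by_parens("(Smith, 1900)", "Smith, 1900"))
print(differs_only_by_parens("Smith, 1900", "Jones, 1900"))
```

Pairs flagged this way would still need the manual base- vs. non-base-authority check described above.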

However, I have found a handful of cases that could be "fixed" as such:

In other cases, some variants are redundant shorter versions of others, so these could be hidden in Synospecies by hiding some names (and putting them into a small (i)-popup with an "Authority also given as:" notice):
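As a rough sketch of how such redundant variants might be picked out, the Python below treats an authority as "redundant" when it is a case-insensitive substring of a longer variant. That definition is an assumption made here for illustration and may be too loose for real data. The function name `split_redundant` and the sample strings are hypothetical.

```python
def split_redundant(authorities):
    """Split authority strings into (shown, hidden).

    A variant goes into `hidden` when it is a case-insensitive substring of a
    longer variant that is already shown. This is an assumed heuristic, not
    the Synospecies implementation.
    """
    # Longest first, so shorter variants are compared against longer ones.
    unique = sorted({a.strip() for a in authorities}, key=lambda a: (-len(a), a))
    shown, hidden = [], []
    for auth in unique:
        if any(auth.lower() in longer.lower() for longer in shown):
            hidden.append(auth)
        else:
            shown.append(auth)
    return shown, hidden


# Hypothetical authority variants, invented for illustration only.
shown, hidden = split_redundant(["Smith & Jones", "Smith & Jones, 1900", "Brown, 1950"])
print("shown:", shown)
print("Authority also given as:", hidden)
```

The `hidden` list corresponds to what would go into the (i)-popup, while `shown` stays visible.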