Closed dhimmel closed 3 years ago
Thanks for finding this! We'll look into a way to increase our checks and limit duplicate labels.
Notes for self: CL duplications coming from CL.
I have manually edited all of the terms that I could. It seems many of the terms in this list have both an rdfs:label and a 'preferred label', which shows up as a duplicate in your query. I wasn't able to replicate the results of the query, but I have run a ROBOT report and our QC tests, neither of which highlighted any further duplications.
Thanks @zoependlington for the work in https://github.com/EBISPOT/efo/commit/dedd1a0146fd0eff8f099b3195d2b51d2e4485d6 and https://github.com/obophenotype/cell-ontology/issues/841.
> It seems many of the terms in this list have an rdfs:label and a 'preferred label' which is coming up as a duplicate from your query.
Interesting. The query matches rdfs:label. So does that predicate also match "preferred label" triples? How do you tell with SPARQL whether a label is an rdfs:label or a "preferred label"?
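One way to see which property each label comes from is to keep the predicate in the results rather than fixing it. In SPARQL that means putting a variable in the predicate position and restricting it to the properties of interest. The same idea as a minimal Python sketch over plain tuples follows. The triples and the `PREFERRED_LABEL` IRI are made-up placeholders, since the thread does not say which property the 'preferred label' actually is.

```python
from collections import defaultdict

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Hypothetical IRI standing in for the "preferred label" property;
# the real property used by the ontology is not given in this thread.
PREFERRED_LABEL = "http://example.org/preferred_label"

# Made-up (subject, predicate, value) triples for illustration only.
triples = [
    ("http://example.org/term/1", RDFS_LABEL, "alpha"),
    ("http://example.org/term/1", PREFERRED_LABEL, "alpha"),
    ("http://example.org/term/2", RDFS_LABEL, "beta"),
    ("http://example.org/term/2", RDFS_LABEL, "Beta"),
]


def labels_by_predicate(triples, predicates):
    """Map each subject to {predicate: [values]} for the chosen predicates."""
    out = defaultdict(lambda: defaultdict(list))
    for subject, predicate, value in triples:
        if predicate in predicates:
            out[subject][predicate].append(value)
    return {s: dict(by_p) for s, by_p in out.items()}


result = labels_by_predicate(triples, {RDFS_LABEL, PREFERRED_LABEL})
for subject, by_predicate in result.items():
    for predicate, values in by_predicate.items():
        print(subject, predicate, values)
```

In this toy data, term 1 has one value per property, so it only looks duplicated if both properties are matched, whereas term 2 has two values for rdfs:label itself.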
A small number of EFO terms have multiple rdfs:label values:
Query
Query output on EFO v3.22.0
Problem?
As a result, SPARQL queries that show a label for each EFO term are prone to returning duplicate rows per term. Although perhaps users should always account for this possibility? From https://www.w3.org/2004/12/q/doc/rdf-labels.html:
As seen in the output above, some duplicates go away once the language is considered along with the label. But for http://www.ebi.ac.uk/efo/EFO_1001870 and some others, there are two labels, both without a language specified.
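To make the label-plus-language distinction concrete, here is a rough Python sketch that counts distinct labels per term, first ignoring the language tag and then grouping by it. The rows are invented placeholders, not actual EFO data, and `duplicate_labels` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up (term, label, language tag or None) rows for illustration only.
labels = [
    ("term:A", "liver disease", "en"),
    ("term:A", "maladie du foie", "fr"),
    ("term:B", "foo", None),
    ("term:B", "bar", None),
    ("term:C", "baz", None),
]


def duplicate_labels(rows, by_language):
    """Return groups that carry more than one distinct label.

    Groups are keyed by (term,) or, when by_language is true, by (term, lang).
    """
    groups = defaultdict(set)
    for term, label, lang in rows:
        key = (term, lang) if by_language else (term,)
        groups[key].add(label)
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


without_lang = duplicate_labels(labels, by_language=False)
with_lang = duplicate_labels(labels, by_language=True)
print(without_lang)
print(with_lang)
```

Here term:A stops being flagged once language is part of the key, while term:B is still flagged because both of its labels lack a language tag, which is the kind of case described above.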