Open jhpoelen opened 1 year ago
I'd like to find a way to structure a nanopublication to say the following:
In a version of content associated with https://doi.org/10.3897/BDJ.11.e107914, the author makes the claim that:
"For each database, a new entry was created in ChecklistBank. A data release in the Catalogue of Life Data Package (CoLDP) format (Döring and Ower 2019) was created and uploaded to Zenodo where it received a DOI. The same data are then uploaded to ChecklistBank."
To verify this claim for the IPNI database (table 2. second row), I found that:
A data release for IPNI was associated with https://doi.org/10.5281/zenodo.7208699, checklist bank id 164203, and github repository https://github.com/rdmpage/ipni-coldp .
In order to verify the author claim, I made the following (machine readable) assumptions:
<https://doi.org/10.5281/zenodo.7208699> <http://www.w3.org/ns/prov#hadMember> <https://zenodo.org/record/7974720> .
<https://zenodo.org/record/7974720> <http://www.w3.org/ns/prov#hadMember> <https://zenodo.org/record/7974720/files/rdmpage/ipni-coldp-2023-05-26.zip> .
<https://checklistbank.org/dataset/164203> <http://www.w3.org/ns/prov#hadMember> <https://api.checklistbank.org/dataset/164203/archive.zip> .
Next, the content was retrieved and the associated content ids were calculated, as described in the provenance graph with content id hash://md5/d77aab98b6c323d285350d5401b0230b (see attached: d77aab98b6c323d285350d5401b0230b.nq.txt),
with the following version statements generated via
preston ls \
  --algo md5 \
  --anchor hash://md5/d77aab98b6c323d285350d5401b0230b \
  --remote https://linker.bio \
  | grep hasVersion
<https://zenodo.org/record/7974720/files/rdmpage/ipni-coldp-2023-05-26.zip> <http://purl.org/pav/hasVersion> <hash://md5/759e54ff69803f9759ca4464b7a5d4bd> <urn:uuid:2b7730e8-b742-4e2f-bd4b-99cf96e4bf0b> .
<https://api.checklistbank.org/dataset/164203/archive.zip> <http://purl.org/pav/hasVersion> <hash://md5/57a32540fa4db2fefdf461bc6b353e68> <urn:uuid:4be23b8c-c94c-451e-b641-d43adb6773e9> .
This shows that the content associated with https://zenodo.org/record/7974720/files/rdmpage/ipni-coldp-2023-05-26.zip and https://api.checklistbank.org/dataset/164203/archive.zip, as retrieved on 2023-09-27, differs, because the associated content identifiers (i.e., hash://md5/759e54ff69803f9759ca4464b7a5d4bd , hash://md5/57a32540fa4db2fefdf461bc6b353e68 ) are not the same.
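The comparison of version statements can also be scripted. The following is a minimal sketch, assuming the `preston ls` output consists of IRI-only N-Quads lines like the two shown; the regular expression and the `version_map` helper are illustrative names of my own and not part of preston.

```python
import re

HAS_VERSION = "http://purl.org/pav/hasVersion"

# Simplification: matches statements whose terms are all IRIs (no literals, no blank nodes).
QUAD = re.compile(r"^<([^>]+)> <([^>]+)> <([^>]+)>(?: <([^>]+)>)? \.$")

STATEMENTS = """\
<https://zenodo.org/record/7974720/files/rdmpage/ipni-coldp-2023-05-26.zip> <http://purl.org/pav/hasVersion> <hash://md5/759e54ff69803f9759ca4464b7a5d4bd> <urn:uuid:2b7730e8-b742-4e2f-bd4b-99cf96e4bf0b> .
<https://api.checklistbank.org/dataset/164203/archive.zip> <http://purl.org/pav/hasVersion> <hash://md5/57a32540fa4db2fefdf461bc6b353e68> <urn:uuid:4be23b8c-c94c-451e-b641-d43adb6773e9> .
"""


def version_map(nquads: str) -> dict[str, str]:
    """Map each location to the content id named in its hasVersion statement."""
    versions = {}
    for line in nquads.splitlines():
        match = QUAD.match(line.strip())
        if match and match.group(2) == HAS_VERSION:
            versions[match.group(1)] = match.group(3)
    return versions


if __name__ == "__main__":
    versions = version_map(STATEMENTS)
    for location, content_id in versions.items():
        print(location, "->", content_id)
    # More than one distinct content id means the retrieved contents differ.
    print("identical:", len(set(versions.values())) == 1)
```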
Note that some of the entries in the respective zip archives did appear to contain the same content. Even so, the archives themselves were not the same, contrary to what the paper states.
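A per-entry comparison of two archives might be sketched as follows. This is not how preston computes its identifiers internally; it only illustrates the idea of hashing each uncompressed zip entry and listing the entries whose content matches. The archive contents and entry names below are made-up test data, not the actual IPNI files.

```python
import hashlib
import io
import zipfile


def content_id(data: bytes) -> str:
    """Return a hash URI of the form hash://md5/<hex digest> for the given bytes."""
    return "hash://md5/" + hashlib.md5(data).hexdigest()


def entry_ids(archive: bytes) -> dict[str, str]:
    """Map each file entry in a zip archive to the content id of its uncompressed bytes."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {
            info.filename: content_id(zf.read(info))
            for info in zf.infolist()
            if not info.is_dir()
        }


def shared_entries(a: bytes, b: bytes) -> set[str]:
    """Names of entries that occur in both archives with identical content."""
    ids_a, ids_b = entry_ids(a), entry_ids(b)
    return {name for name in ids_a.keys() & ids_b.keys() if ids_a[name] == ids_b[name]}


def make_zip(entries: dict[str, bytes]) -> bytes:
    """Build a small in-memory zip archive (used here only to create test data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, payload in entries.items():
            zf.writestr(name, payload)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical stand-ins for the Zenodo and ChecklistBank archives.
    first = make_zip({"names.tsv": b"id\tname\n1\tPoa annua\n", "metadata.yaml": b"title: A\n"})
    second = make_zip({"names.tsv": b"id\tname\n1\tPoa annua\n", "metadata.yaml": b"title: B\n"})
    print("archives identical:", content_id(first) == content_id(second))
    print("entries with identical content:", sorted(shared_entries(first, second)))
```

Hashing the uncompressed entries rather than the archive bytes sidesteps differences in compression settings and zip timestamps, which can make two archives differ even when every entry matches.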
Suggest to alter text:
"[...] The same data are then uploaded to ChecklistBank. [...]"
to something along the lines of:
"[...] A copy of this data was then slightly altered and uploaded to ChecklistBank. [...]"
Thank you for considering my observations and suggestion.
@tkuhn - how would you recommend to go about making a claim such as above in the form of a nanopublication?
Interesting case!
I think the first question is: how can this be represented in RDF? (or some other type of logic) That's independent of nanopublications.
It seems to me that quite some modeling effort is required for this. There might be existing vocabularies, but I am not aware of any that would cover this.
We'd need statements like these:
:va a x:VerificationAttempt .
:va x:target :claim .
:claim x:expressedIn :text .
:text oa:exact "The same data are then uploaded to ChecklistBank." .
:text x:partOf <https://doi.org/10.3897/BDJ.11.e107914> .
:claim x:hasInterpretation :i .
:i x:by <https://github.com/jhpoelen> .
:i {
<https://doi.org/10.5281/zenodo.7208699> <http://www.w3.org/ns/prov#hadMember> <https://zenodo.org/record/7974720> .
<https://zenodo.org/record/7974720> <http://www.w3.org/ns/prov#hadMember> <https://zenodo.org/record/7974720/files/rdmpage/ipni-coldp-2023-05-26.zip> .
<https://checklistbank.org/dataset/164203> <http://www.w3.org/ns/prov#hadMember> <https://api.checklistbank.org/dataset/164203/archive.zip> .
}
...
This is just an unfinished sketch off the top of my head of how this could be modeled. But once we have a more solid representation in RDF, it's relatively straightforward to package it in nanopublications.
I hope that answer makes sense?
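To make the sketch above a bit more concrete, here is one way the statements could be written out as N-Quads using only the Python standard library. The `x:` vocabulary does not exist; the `https://example.org/...` namespaces below are placeholders I chose for illustration, and the interpretation `:i` is treated as a named graph, which is only one possible reading of the `:i { ... }` notation.

```python
# Hypothetical namespaces: the thread leaves x: and the default prefix undefined.
X = "https://example.org/x#"
EX = "https://example.org/verification#"
OA = "http://www.w3.org/ns/oa#"
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Format an IRI as an N-Quads term."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Format a plain string literal, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_quads() -> list[str]:
    """Return the sketch as N-Quads lines; interpretation triples go in graph :i."""
    paper = iri("https://doi.org/10.3897/BDJ.11.e107914")
    va, claim, text, interp = (iri(EX + name) for name in ("va", "claim", "text", "i"))
    default_graph = [
        (va, iri(RDF_TYPE), iri(X + "VerificationAttempt")),
        (va, iri(X + "target"), claim),
        (claim, iri(X + "expressedIn"), text),
        (text, iri(OA + "exact"), literal("The same data are then uploaded to ChecklistBank.")),
        (text, iri(X + "partOf"), paper),
        (claim, iri(X + "hasInterpretation"), interp),
        (interp, iri(X + "by"), iri("https://github.com/jhpoelen")),
    ]
    had_member = iri(PROV + "hadMember")
    zenodo_record = iri("https://zenodo.org/record/7974720")
    interpretation = [
        (iri("https://doi.org/10.5281/zenodo.7208699"), had_member, zenodo_record),
        (zenodo_record, had_member,
         iri("https://zenodo.org/record/7974720/files/rdmpage/ipni-coldp-2023-05-26.zip")),
        (iri("https://checklistbank.org/dataset/164203"), had_member,
         iri("https://api.checklistbank.org/dataset/164203/archive.zip")),
    ]
    lines = [" ".join(triple) + " ." for triple in default_graph]
    lines += [" ".join(triple + (interp,)) + " ." for triple in interpretation]
    return lines


if __name__ == "__main__":
    print("\n".join(build_quads()))
```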
@tkuhn thanks for your rdf draft of a comment that refutes / questions a claim. I imagine that refuting a claim is a common annotation. Please do share any related examples as I am scraping up the courage to speak clearly articulated, well-formed rdf ; )
Now, I wonder - who'd be interested in this kind of nanopublication? Are there any plans to feed this kind of structured statement into a "letter to the editor" or "request revision" workflow?
Nevertheless, I do feel the urge to present my example in some kind of structured form.
I noticed that your collaborator Michel Dumontier wrote on the "Semanticscience Integrated Ontology (SIO)".
Do you @tkuhn , or @micheldumontier, think SIO would be a good pick to help describe the refutation of a statement made in a published paper (like the one presented above)?
Dumontier, M., Baker, C.J., Baran, J. et al. The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. J Biomed Semant 5, 14 (2014). https://doi.org/10.1186/2041-1480-5-14
I am not aware of anything in SIO that would be directly covering this. There seems to be 'is refuted by' as a type of (document-to-document) reference. Similarly, you have 'refutes' in CiTO. That gives you a way to point to something that you'd like to refute, but doesn't really cover the modeling of the content and reason of the refutation.
there is some support for argumentation, e.g. references / cites / is supported by / is disputed by / is refuted by: http://semanticscience.org/resource/SIO_000772
SIO might represent this as:
:statement a sio:Statement .
:statement sio:hasValue "The same data are then uploaded to ChecklistBank." .
:statement sio:isPartOf <https://doi.org/10.3897/BDJ.11.e107914> .
:statement sio:isDisputedBy :claim .
:claim sio:hasCreator <https://github.com/jhpoelen> .
:claim sio:hasValue "The data at Zenodo and ChecklistBank are not the same as evidenced by a different md5 hash of the respective files" .
:claim sio:hasEvidence :evidence .
:evidence {
<https://zenodo.org/record/7974720/files/rdmpage/ipni-coldp-2023-05-26.zip> sio:hasAttribute <hash://md5/759e54ff69803f9759ca4464b7a5d4bd> .
<https://api.checklistbank.org/dataset/164203/archive.zip> sio:hasAttribute <hash://md5/57a32540fa4db2fefdf461bc6b353e68> .
}
i was a bit lazy on the evidence part. honestly, we would instead use some kind of object (e.g. :Comparison/) with two inputs and one output ("not equal") i'd be happy to further develop if needed.
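The comparison object mentioned above (two inputs, one output) could look roughly like this. It is a plain data-structure sketch, not an SIO mapping: the class and field names are hypothetical, and only the locations and content ids are taken from the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Input:
    location: str    # where the content was retrieved from
    content_id: str  # hash URI of the retrieved bytes


@dataclass(frozen=True)
class Comparison:
    first: Input
    second: Input

    @property
    def output(self) -> str:
        """The single output of the comparison: "equal" or "not equal"."""
        return "equal" if self.first.content_id == self.second.content_id else "not equal"


comparison = Comparison(
    Input(
        "https://zenodo.org/record/7974720/files/rdmpage/ipni-coldp-2023-05-26.zip",
        "hash://md5/759e54ff69803f9759ca4464b7a5d4bd",
    ),
    Input(
        "https://api.checklistbank.org/dataset/164203/archive.zip",
        "hash://md5/57a32540fa4db2fefdf461bc6b353e68",
    ),
)

if __name__ == "__main__":
    print(comparison.output)
```

Keeping the output derived from the inputs, rather than stored as a separate value, means the evidence cannot say "equal" while pointing at two different content ids.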
m.
For sake of exercise, I was hoping to figure out a way to dispute a claim in a recent publication:
Page R (2023) Ten years and a million links: building a global taxonomic library connecting persistent identifiers for names, publications and people. Biodiversity Data Journal 11: e107914. https://doi.org/10.3897/BDJ.11.e107914
with the claim being (emphasis mine)
For additional context, see attached screenshot.
see also https://github.com/bio-guoda/preston/issues/259