bio-guoda / preston

a biodiversity dataset tracker

formulate nanopublication that disputes a claim in existing publication (e.g. Page 2023) #260

Open jhpoelen opened 1 year ago

jhpoelen commented 1 year ago

For the sake of exercise, I was hoping to figure out a way to dispute a claim in a recent publication:

Page R (2023) Ten years and a million links: building a global taxonomic library connecting persistent identifiers for names, publications and people. Biodiversity Data Journal 11: e107914. https://doi.org/10.3897/BDJ.11.e107914

with the claim being (emphasis mine)

[...] For each database, a new entry was created in ChecklistBank. A data release in the Catalogue of Life Data Package (CoLDP) format (Döring and Ower 2019) was created and uploaded to Zenodo where it received a DOI. The same data are then uploaded to ChecklistBank. [...]

For additional context, see attached screenshot.

image

see also https://github.com/bio-guoda/preston/issues/259

jhpoelen commented 1 year ago

I'd like to find a way to structure a nanopublication to say the following:

In a version of content associated with https://doi.org/10.3897/BDJ.11.e107914, the author makes the claim that:

"For each database, a new entry was created in ChecklistBank. A data release in the Catalogue of Life Data Package (CoLDP) format (Döring and Ower 2019) was created and uploaded to Zenodo where it received a DOI. The same data are then uploaded to ChecklistBank."

To verify this claim for the IPNI database (Table 2, second row), I found that:

A data release for IPNI was associated with https://doi.org/10.5281/zenodo.7208699, ChecklistBank id 164203, and GitHub repository https://github.com/rdmpage/ipni-coldp .

In order to verify the author claim, I made the following (machine readable) assumptions:

<https://doi.org/10.5281/zenodo.7208699> <http://www.w3.org/ns/prov#hadMember> <https://zenodo.org/record/7974720> .

<https://zenodo.org/record/7974720> <http://www.w3.org/ns/prov#hadMember> <https://zenodo.org/record/7974720/files/rdmpage/ipni-coldp-2023-05-26.zip> .

<https://checklistbank.org/dataset/164203> <http://www.w3.org/ns/prov#hadMember> <https://api.checklistbank.org/dataset/164203/archive.zip> .

Next, the content was retrieved and the associated content ids were calculated as described in the provenance graph with content id hash://md5/d77aab98b6c323d285350d5401b0230b (see attached: d77aab98b6c323d285350d5401b0230b.nq.txt )

with the following version statements generated via

preston ls \
  --algo md5 \
  --anchor hash://md5/d77aab98b6c323d285350d5401b0230b \
  --remote https://linker.bio \
  | grep hasVersion
<https://zenodo.org/record/7974720/files/rdmpage/ipni-coldp-2023-05-26.zip> <http://purl.org/pav/hasVersion> <hash://md5/759e54ff69803f9759ca4464b7a5d4bd> <urn:uuid:2b7730e8-b742-4e2f-bd4b-99cf96e4bf0b> .
<https://api.checklistbank.org/dataset/164203/archive.zip> <http://purl.org/pav/hasVersion> <hash://md5/57a32540fa4db2fefdf461bc6b353e68> <urn:uuid:4be23b8c-c94c-451e-b641-d43adb6773e9> .

This shows that the content associated with https://zenodo.org/record/7974720/files/rdmpage/ipni-coldp-2023-05-26.zip and https://api.checklistbank.org/dataset/164203/archive.zip , as retrieved on 2023-09-27, is different, because the associated content identifiers (i.e., hash://md5/759e54ff69803f9759ca4464b7a5d4bd , hash://md5/57a32540fa4db2fefdf461bc6b353e68 ) are not the same.
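The comparison of content identifiers can also be done mechanically. The following is a minimal Python sketch, assuming the two version statements are available as N-Quads text exactly as quoted above; the helper name `content_ids` and the simplified regular expression are illustrative assumptions and are not part of preston.

```python
import re

# The two version statements quoted above, verbatim.
STATEMENTS = """\
<https://zenodo.org/record/7974720/files/rdmpage/ipni-coldp-2023-05-26.zip> <http://purl.org/pav/hasVersion> <hash://md5/759e54ff69803f9759ca4464b7a5d4bd> <urn:uuid:2b7730e8-b742-4e2f-bd4b-99cf96e4bf0b> .
<https://api.checklistbank.org/dataset/164203/archive.zip> <http://purl.org/pav/hasVersion> <hash://md5/57a32540fa4db2fefdf461bc6b353e68> <urn:uuid:4be23b8c-c94c-451e-b641-d43adb6773e9> .
"""

HAS_VERSION = "http://purl.org/pav/hasVersion"

# Simplified pattern (assumption): subject, predicate, object and graph label
# are all IRIs, which holds for the two statements above but not for N-Quads
# in general.
QUAD = re.compile(r"^<([^>]+)> <([^>]+)> <([^>]+)> <([^>]+)> \.$")


def content_ids(nquads: str) -> dict[str, str]:
    """Map each location to the content id recorded in its hasVersion statement."""
    ids = {}
    for line in nquads.splitlines():
        match = QUAD.match(line.strip())
        if match and match.group(2) == HAS_VERSION:
            ids[match.group(1)] = match.group(3)
    return ids


versions = content_ids(STATEMENTS)
# The locations hold the same content only if all content ids coincide.
same_content = len(set(versions.values())) == 1
print(same_content)
```

For the two statements above the check reports that the content ids differ, which is the observation the dispute rests on.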

Note that some of the entries in the respective zip archives did appear to contain the same content. The archives themselves, however, were not the same, contrary to what was stated.
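An entry-level comparison of this kind could be sketched as follows in Python. This is an illustration only: the two archives are small made-up stand-ins built in memory (the entry names and contents are hypothetical, not taken from the IPNI archives), and `make_zip` and `entry_ids` are helper names introduced here.

```python
import hashlib
import io
import zipfile


def make_zip(entries: dict[str, bytes]) -> bytes:
    """Build a small in-memory zip archive (stand-in for a downloaded file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in entries.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def entry_ids(archive_bytes: bytes) -> dict[str, str]:
    """Map each file entry in the archive to an md5 content id."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {
            info.filename: "hash://md5/" + hashlib.md5(archive.read(info)).hexdigest()
            for info in archive.infolist()
            if not info.is_dir()
        }


# Made-up example archives: one shared entry, one entry that differs.
first = make_zip({"names.tsv": b"ID\tname\n1\tAus bus\n", "metadata.yaml": b"title: A\n"})
second = make_zip({"names.tsv": b"ID\tname\n1\tAus bus\n", "metadata.yaml": b"title: B\n"})

a, b = entry_ids(first), entry_ids(second)
# Entries present in both archives with identical content ids.
identical = sorted(name for name in a.keys() & b.keys() if a[name] == b[name])
# Entries that are missing on one side or whose content ids differ.
differing = sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
print(identical, differing)
```

Hashing the entries rather than the archive as a whole separates "the archives differ" from "every file inside differs", which is the distinction drawn above.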

I suggest altering the text:

"[...] The same data are then uploaded to ChecklistBank. [...]"

to something along the lines of:

"[...] A copy of this data was then slightly altered and uploaded to ChecklistBank. [...]"

Thank you for considering my observations and suggestion.

jhpoelen commented 1 year ago

@tkuhn - how would you recommend going about making a claim such as the one above in the form of a nanopublication?

tkuhn commented 1 year ago

Interesting case!

I think the first question is: how can this be represented in RDF? (or some other type of logic) That's independent of nanopublications.

It seems to me that quite some modeling effort is required for this. There might be existing vocabularies, but I am not aware of any that would cover this.

We'd need statements like these:

:va a x:VerificationAttempt .
:va x:target :claim .
:claim x:expressedIn :text .
:text oa:exact "The same data are then uploaded to ChecklistBank." .
:text x:partOf <https://doi.org/10.3897/BDJ.11.e107914> .
:claim x:hasInterpretation :i .
:i x:by <https://github.com/jhpoelen> .
:i {
   <https://doi.org/10.5281/zenodo.7208699> <http://www.w3.org/ns/prov#hadMember> <https://zenodo.org/record/7974720> .
   <https://zenodo.org/record/7974720> <http://www.w3.org/ns/prov#hadMember> <https://zenodo.org/record/7974720/files/rdmpage/ipni-coldp-2023-05-26.zip> .
   <https://checklistbank.org/dataset/164203> <http://www.w3.org/ns/prov#hadMember> <https://api.checklistbank.org/dataset/164203/archive.zip> .
}
...

This is just an unfinished sketch off the top of my head of how this could be modeled. But once we have a more solid representation in RDF, it's then relatively straightforward to package it in nanopublications.

I hope that answer makes sense?
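To make the packaging step a little more concrete, here is a hedged Python sketch that wraps a few of the statements above in the four named graphs a nanopublication uses (head, assertion, provenance, publication info) and emits TriG text. The `x:` namespace stays the placeholder vocabulary from the draft, `http://example.org/np1` is a made-up identifier, and `build_nanopub` and `escape_literal` are names introduced here. The output has not been checked against a nanopublication server, and signing and trusty URIs are left out.

```python
def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle/TriG string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_nanopub(base: str, quote: str, source: str, author: str) -> str:
    """Return TriG text with head, assertion, provenance and pubinfo graphs."""
    lines = [
        f"@prefix this: <{base}> .",
        f"@prefix sub: <{base}#> .",
        "@prefix np: <http://www.nanopub.org/nschema#> .",
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix oa: <http://www.w3.org/ns/oa#> .",
        # x: is the placeholder vocabulary from the draft, not a real ontology.
        "@prefix x: <http://example.org/x#> .",
        "",
        "sub:Head {",
        "  this: a np:Nanopublication ;",
        "    np:hasAssertion sub:assertion ;",
        "    np:hasProvenance sub:provenance ;",
        "    np:hasPublicationInfo sub:pubinfo .",
        "}",
        "",
        "sub:assertion {",
        "  sub:va a x:VerificationAttempt ;",
        "    x:target sub:claim .",
        "  sub:claim x:expressedIn sub:text .",
        f'  sub:text oa:exact "{escape_literal(quote)}" ;',
        f"    x:partOf <{source}> .",
        "}",
        "",
        "sub:provenance {",
        f"  sub:assertion prov:wasAttributedTo <{author}> .",
        "}",
        "",
        "sub:pubinfo {",
        f"  this: dcterms:creator <{author}> .",
        "}",
    ]
    return "\n".join(lines) + "\n"


doc = build_nanopub(
    base="http://example.org/np1",  # made-up identifier
    quote="The same data are then uploaded to ChecklistBank.",
    source="https://doi.org/10.3897/BDJ.11.e107914",
    author="https://github.com/jhpoelen",
)
print(doc)
```

The interpretation graph (`:i { ... }`) from the draft is left out because a graph cannot be nested inside another graph in TriG; it would need its own modeling decision.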

jhpoelen commented 1 year ago

@tkuhn thanks for your rdf draft of a comment that refutes / questions a claim. I imagine that refuting a claim is a common annotation. Please do share any related examples as I am scraping up the courage to speak clearly articulated, well-formed rdf ; )

jhpoelen commented 1 year ago

Now, I wonder - who'd be interested in this kind of nanopublication? Are there any plans to feed this kind of structured statement into a "letter to the editor" or "request revision" workflow?

jhpoelen commented 1 year ago

Nevertheless, I do feel the urge to present my example in some kind of structured form.

I noticed that your collaborator Michel Dumontier wrote on the "Semanticscience Integrated Ontology (SIO)" .

Do you @tkuhn , or @micheldumontier, think SIO would be a good pick to help describe the refutation of a statement made in a published paper (like the one presented above)?

Dumontier, M., Baker, C.J., Baran, J. et al. The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. J Biomed Semant 5, 14 (2014). https://doi.org/10.1186/2041-1480-5-14

tkuhn commented 1 year ago

I am not aware of anything in SIO that would be directly covering this. There seems to be 'is refuted by' as a type of (document-to-document) reference. Similarly, you have 'refutes' in CiTO. That gives you a way to point to something that you'd like to refute, but doesn't really cover the modeling of the content and reason of the refutation.

micheldumontier commented 1 year ago

there is some support for argumentation, e.g. references / cites / is supported by / is disputed by / is refuted by http://semanticscience.org/resource/SIO_000772

SIO might represent this as:

:statement a sio:Statement .
:statement sio:hasValue "The same data are then uploaded to ChecklistBank." .
:statement sio:isPartOf <https://doi.org/10.3897/BDJ.11.e107914> .
:statement sio:isDisputedBy :claim .
:claim sio:hasCreator <https://github.com/jhpoelen> .
:claim sio:hasValue "The data at BioDiversity Journal and ChecklistBank are not the same as evidenced by a different md5 hash of the respective files" .
:claim sio:hasEvidence :evidence .
:evidence {
   <https://zenodo.org/record/7974720/files/rdmpage/ipni-coldp-2023-05-26.zip> sio:hasAttribute <hash://md5/759e54ff69803f9759ca4464b7a5d4bd> .
   <https://api.checklistbank.org/dataset/164203/archive.zip> sio:hasAttribute <hash://md5/57a32540fa4db2fefdf461bc6b353e68> .
}

i was a bit lazy on the evidence part. honestly, we would instead use some kind of object (e.g. :Comparison/) with two inputs and one output ("not equal"). i'd be happy to further develop if needed.

m.
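The comparison-object idea (two inputs, one output) could look roughly like the Python sketch below. This is an assumption-laden illustration: the `Comparison` class and the `ex:` predicates are hypothetical placeholders, not SIO terms, and the two inputs are the content ids reported earlier in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """Hypothetical record: two content ids in, one verdict out."""

    left: str
    right: str

    @property
    def output(self) -> str:
        # Content ids are compared as plain strings.
        return "equal" if self.left == self.right else "not equal"

    def triples(self, node: str = "_:comparison") -> list[str]:
        # ex: predicates are placeholders for whatever vocabulary is chosen.
        return [
            f"{node} a ex:Comparison .",
            f"{node} ex:hasInput <{self.left}> .",
            f"{node} ex:hasInput <{self.right}> .",
            f'{node} ex:hasOutput "{self.output}" .',
        ]


comparison = Comparison(
    "hash://md5/759e54ff69803f9759ca4464b7a5d4bd",
    "hash://md5/57a32540fa4db2fefdf461bc6b353e68",
)
print("\n".join(comparison.triples()))
```

Keeping the verdict as a derived property means the recorded output cannot drift from the two inputs it was computed from.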