bionomia / bionomia

Sinatra app to parse people's names from biodiversity occurrence data, apply basic regular expressions and heuristics to disambiguate them, and make these occurrence records into entities that can be claimed by people via ORCID.
https://bionomia.net
MIT License

suspicious header [Object,Predicate,Subject] for what appears to be [Subject,Predicate,Object] #268

Closed: jhpoelen closed this issue 1 year ago

jhpoelen commented 1 year ago

In working with Bionomia and GBIF to link type specimens to their associated humans (e.g., collectors), I used a versioned copy of a resource you share via https://bionomia.net/downloads, as recorded in:

Bionomia Community, & Poelen, Jorrit. (2023). Bionomia: a versioned archive of associations between people and their associated biodiversity data records they worked on. hash://sha256/22afc7a3e4e1c3bc289ce39573463331d3b594a11512c1233e39436973aea974 hash://md5/5e3ad5beb5df6041d20fa89a1d2b49fe (0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7808962
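The archive above is referenced by content hash URIs such as hash://sha256/…. A minimal sketch of how such an identifier relates to a file's bytes, assuming the URI is simply the lowercase hex SHA-256 digest of the exact bytes and that GNU coreutils' `sha256sum` is available. `hash_uri` is a hypothetical helper, not part of Preston or Bionomia, and verifying a whole versioned archive is what `preston` itself is for.

```shell
# Hypothetical helper: print a hash://sha256/<hex> URI for the bytes on stdin.
# Assumes GNU coreutils sha256sum, which prints "<hex>  -" when reading stdin.
hash_uri() {
  # Keep only the hex digest (first space-separated field).
  printf 'hash://sha256/%s\n' "$(sha256sum | cut -d' ' -f1)"
}
```

Usage: `hash_uri < bionomia-public-claims.csv.gz` prints an identifier that can be compared against a recorded hash URI for that same file.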

In working with the data, I noticed that the first 5 lines (the header and four records) of https://bionomia.net/data/bionomia-public-claims.csv.gz were:

Object Predicate Subject
https://gbif.org/occurrence/1839364365 http://rs.tdwg.org/dwc/iri/identifiedBy https://orcid.org/0000-0001-9008-0611
https://gbif.org/occurrence/657804907 http://rs.tdwg.org/dwc/iri/identifiedBy https://orcid.org/0000-0001-9008-0611
https://gbif.org/occurrence/657804727 http://rs.tdwg.org/dwc/iri/identifiedBy https://orcid.org/0000-0001-9008-0611
https://gbif.org/occurrence/657804529 http://rs.tdwg.org/dwc/iri/identifiedBy https://orcid.org/0000-0001-9008-0611

as obtained via

preston cat \
 --anchor hash://sha256/22afc7a3e4e1c3bc289ce39573463331d3b594a11512c1233e39436973aea974 \
 --remote https://zenodo.org/record/7808962/files,https://linker.bio \
 https://bionomia.net/data/bionomia-public-claims.csv.gz \
 | gunzip \
 | head -n5

The first data row reads:

https://gbif.org/occurrence/1839364365 http://rs.tdwg.org/dwc/iri/identifiedBy https://orcid.org/0000-0001-9008-0611

which I read as: subject https://gbif.org/occurrence/1839364365, predicate identifiedBy, object https://orcid.org/0000-0001-9008-0611 (that is, the occurrence was identified by the person with that ORCID).

However, the header labels https://gbif.org/occurrence/1839364365 as the object and https://orcid.org/0000-0001-9008-0611 as the subject.
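Writing each row out as an RDF N-Triples statement makes the intended subject/predicate/object reading explicit, independent of the header labels. A minimal awk sketch, assuming comma-separated rows with no quoted fields or embedded commas (true of the IRIs shown here) and treating column 1 as the subject and column 3 as the object. `to_ntriples` is a hypothetical helper name.

```shell
# Hypothetical helper: convert claims CSV rows on stdin into N-Triples.
# Skips the header line and ignores its labels; assumes column order
# subject,predicate,object and no quoted or comma-containing fields.
to_ntriples() {
  awk -F',' 'NR > 1 {
    sub(/\r$/, "", $3)   # tolerate CRLF line endings
    printf "<%s> <%s> <%s> .\n", $1, $2, $3
  }'
}
```

Applied to the first data row above, this yields `<https://gbif.org/occurrence/1839364365> <http://rs.tdwg.org/dwc/iri/identifiedBy> <https://orcid.org/0000-0001-9008-0611> .`, i.e. the occurrence is the subject and the ORCID is the object.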

Can someone help clear up my confusion about which is the subject and which is the object in the claims recorded above?

jhpoelen commented 1 year ago

@dshorthouse Thanks!

dshorthouse commented 1 year ago

Jorrit, if it weren't for you and all enthusiasts, Bionomia would be relegated to obscurity and inutility. Thanks are all mine.

dshorthouse commented 1 year ago

The header in the download has been updated on https://bionomia.net/downloads.
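Copies downloaded before this change (such as the 0.1 archive cited above) still carry the old labels. Because the data rows were already in subject, predicate, object order, only the first line needs relabeling. A minimal sed sketch, assuming the old header line begins with exactly `Object,Predicate,Subject`; `fix_header` is a hypothetical helper, not something Bionomia provides.

```shell
# Hypothetical helper: relabel the header of an old claims CSV read on stdin.
# Only line 1 is edited; data rows pass through unchanged. No trailing "$"
# anchor, so a CRLF line ending on the header is preserved.
fix_header() {
  sed '1s/^Object,Predicate,Subject/Subject,Predicate,Object/'
}
```

Usage: append `| fix_header` after `gunzip` in the `preston cat` pipeline above.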

jhpoelen commented 1 year ago

Using an updated version of:

Bionomia Community, & Poelen, Jorrit. (2023). Bionomia: a versioned archive of associations between people and their associated biodiversity data records they worked on. hash://sha256/4b192ed16cfe8577c2e275ada76bfdc19fe5a5381547139c8f8f4079e704b6f2 hash://md5/a69075f7a9a19f069c6d0c6d8f312259 (0.2) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7810635

I was able to confirm that the headers are now subject/predicate/object as expected:

Subject Predicate Object
https://gbif.org/occurrence/1839364365 http://rs.tdwg.org/dwc/iri/identifiedBy https://orcid.org/0000-0001-9008-0611
https://gbif.org/occurrence/657804907 http://rs.tdwg.org/dwc/iri/identifiedBy https://orcid.org/0000-0001-9008-0611
https://gbif.org/occurrence/657804727 http://rs.tdwg.org/dwc/iri/identifiedBy https://orcid.org/0000-0001-9008-0611
https://gbif.org/occurrence/657804529 http://rs.tdwg.org/dwc/iri/identifiedBy https://orcid.org/0000-0001-9008-0611

as produced via:

preston cat \
 --remote https://zenodo.org/record/7810635/files,https://linker.bio \
 --anchor hash://sha256/4b192ed16cfe8577c2e275ada76bfdc19fe5a5381547139c8f8f4079e704b6f2 \
 https://bionomia.net/data/bionomia-public-claims.csv.gz \
 | gunzip \
 | head -n5
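To guard a downstream workflow against a future header regression, the check done by eye above can be automated. A minimal sketch, assuming the header is exactly the comma-separated line `Subject,Predicate,Object` (adjust if the file differs); `has_spo_header` is a hypothetical helper name.

```shell
# Hypothetical check: succeed only if the first line on stdin is the
# expected Subject,Predicate,Object header.
has_spo_header() {
  # Read one line; also accept a final line that lacks a trailing newline.
  IFS= read -r first || [ -n "$first" ] || return 1
  # Drop a carriage return in case the file uses CRLF line endings.
  first=$(printf '%s' "$first" | tr -d '\r')
  [ "$first" = "Subject,Predicate,Object" ]
}
```

Usage: replace `| head -n5` in the pipeline above with `| has_spo_header && echo "header OK"`.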