beehind / beehind.github.io

Beehind: pilot workflows to capture prominent bee specimen and their historic and ecological associates
https://beehind.org
Creative Commons Zero v1.0 Universal

method to link CASTYPE1652 to Wikidata via GBIF and Bionomia #7

Open · jhpoelen opened this issue 1 year ago

jhpoelen commented 1 year ago

With #5 and the publication of

Poelen, Jorrit. (2023). Global Biodiversity Informatics Facility (GBIF): an exhaustive list of gbif record ids, dataset keys, and their associated Occurrence IDs, Institution Code, Collection Codes and Catalog Numbers. hash://sha256/ea88f03a7bfd1ba853fdbea3203d54ab81ac3cdc8e8da7c96bbbba9c4b05d933 hash://md5/c49fe34785354847b37ea4509261e130 (0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7789866

See https://github.com/beehind/beehind.github.io/issues/5#issuecomment-1492543646

we can now find the gbifID for CASTYPE1652 in a well-defined, verifiable digital resource of known provenance/origin, using a regex or other (more elegant) search methods.
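A minimal sketch of the regex lookup in Python, assuming the id list is read line by line, is tab-separated, and has the gbifID in its first column. That column layout is an assumption; check the actual layout of the Zenodo data set first. Apart from the gbifID 2238760764 taken from this thread, the sample lines are hypothetical.

```python
import re
from typing import Iterable, List

# Match CASTYPE1652 as a whole token so that, e.g., CASTYPE16520 is not a hit.
CATALOG_PATTERN = re.compile(r"(?<![A-Za-z0-9])CASTYPE1652(?![0-9])")


def find_gbif_ids(lines: Iterable[str], pattern: re.Pattern = CATALOG_PATTERN) -> List[str]:
    """Return the first tab-separated field (assumed gbifID) of every matching line."""
    ids = []
    for line in lines:
        if pattern.search(line):
            ids.append(line.split("\t", 1)[0])
    return ids


# Hypothetical sample rows; only the gbifID of the first row comes from this thread.
sample = [
    "2238760764\tdataset-key-placeholder\toccurrence-id-placeholder\tCAS\tTYPE\tCASTYPE1652",
    "1234567890\tdataset-key-placeholder\tother-occurrence-id\tCAS\tTYPE\tCASTYPE16520",
]
print(find_gbif_ids(sample))  # ['2238760764']
```

Anchoring the pattern on token boundaries matters here because catalog numbers often share prefixes.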

With this gbifID, we can look up known associations with people via Bionomia.

In this case, we have:

curl 'https://linker.bio/line:gz:hash://sha256/cca558f470657a3c3fb99be70907d5705e7b5c20d12412073307fc9146e94394!/L1,L10359801'

yielding

Object,Predicate,Subject
https://gbif.org/occurrence/2238760764,http://rs.tdwg.org/dwc/iri/recordedBy,http://www.wikidata.org/entity/Q23813352

or

Object    https://gbif.org/occurrence/2238760764
Predicate http://rs.tdwg.org/dwc/iri/recordedBy
Subject   http://www.wikidata.org/entity/Q23813352

or line 10359801 of the gunzipped version of hash://sha256/cca558f470657a3c3fb99be70907d5705e7b5c20d12412073307fc9146e94394 , which happens to be a Bionomia snapshot of https://bionomia.net/data/bionomia-public-claims.csv.gz . The header (line 1) was included to show what each column means.
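The same selection can be reproduced against a local copy of the snapshot. A minimal Python sketch follows; it assumes the `.csv.gz` has already been downloaded. The selector parser handles only the comma-separated `L<n>` form used in the URL above, not any other syntax linker.bio may support. The function names are illustrative, not part of linker.bio or Preston.

```python
import gzip
import re
from typing import Dict, List


def parse_line_selector(selector: str) -> List[int]:
    """Turn a selector such as 'L1,L10359801' into 1-based line numbers."""
    return [int(n) for n in re.findall(r"L(\d+)", selector)]


def read_lines(gz_path: str, line_numbers: List[int]) -> Dict[int, str]:
    """Stream the gunzipped file and return the requested lines, keyed by number."""
    wanted = set(line_numbers)
    if not wanted:
        return {}
    last = max(wanted)
    found = {}
    with gzip.open(gz_path, "rt", encoding="utf-8") as fh:
        for number, line in enumerate(fh, start=1):
            if number in wanted:
                found[number] = line.rstrip("\n")
            if number >= last:
                break  # no need to decompress the rest of the file
    return found
```

For example, `read_lines("bionomia-public-claims.csv.gz", parse_line_selector("L1,L10359801"))` should return the header and the claim shown above, provided the local file is the same version as hash://sha256/cca558f470657a3c3fb99be70907d5705e7b5c20d12412073307fc9146e94394 .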

$ preston alias hash://sha256/cca558f470657a3c3fb99be70907d5705e7b5c20d12412073307fc9146e94394
<https://bionomia.net/data/bionomia-public-claims.csv.gz> <http://purl.org/pav/hasVersion> <hash://sha256/cca558f470657a3c3fb99be70907d5705e7b5c20d12412073307fc9146e94394> <urn:uuid:fa6c6542-57bc-405a-b518-01c225f474b1> .

See screenshot:

[screenshot]
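The `preston alias` output above is a single N-Quad. It states that the Bionomia URL has a version whose content id is the SHA-256 of the retrieved bytes (here, the compressed `.csv.gz`). A minimal Python sketch for reading that statement and checking a local copy against it follows. The regex handles only a quad made of four IRIs like this one and is not a general N-Quads parser. The function names are illustrative.

```python
import hashlib
import re
from typing import Tuple

# Four <IRI> terms followed by the terminating dot; not a full N-Quads grammar.
NQUAD_IRIS = re.compile(r"<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.")
HAS_VERSION = "http://purl.org/pav/hasVersion"


def parse_has_version(nquad: str) -> Tuple[str, str]:
    """Return (url, content id) from a pav:hasVersion quad."""
    m = NQUAD_IRIS.match(nquad.strip())
    if m is None or m.group(2) != HAS_VERSION:
        raise ValueError("not a pav:hasVersion statement")
    return m.group(1), m.group(3)


def sha256_uri(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a local file's raw bytes and format the digest as a hash://sha256/ URI."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return "hash://sha256/" + digest.hexdigest()
```

If `sha256_uri` of a local `bionomia-public-claims.csv.gz` equals the content id returned by `parse_has_version`, the local file is byte-identical to the version Preston recorded.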

Et voilà, we have a path from CASTYPE1652 to Wikidata via GBIF and Bionomia, facilitated by Preston:

 CASTYPE1652 
  -[:indexedAs] -> 
      https://gbif.org/occurrence/2238760764 
          -[:recordedBy] -> 
              http://www.wikidata.org/entity/Q23813352 
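The path above chains two lookups: catalog number to GBIF occurrence, then occurrence to the people recorded in Bionomia claims. A minimal Python sketch of that chaining follows. The two in-memory dictionaries are hypothetical stand-ins for indexes built from the GBIF id list and the Bionomia snapshot, and `link_path` is an illustrative name.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def link_path(catalog_number: str,
              catalog_to_gbif_id: Dict[str, str],
              recorded_by: Dict[str, List[str]]) -> List[Triple]:
    """Build indexedAs/recordedBy triples for one catalog number; empty if unknown."""
    gbif_id = catalog_to_gbif_id.get(catalog_number)
    if gbif_id is None:
        return []
    occurrence = f"https://gbif.org/occurrence/{gbif_id}"
    path = [(catalog_number, "indexedAs", occurrence)]
    for person in recorded_by.get(occurrence, []):
        path.append((occurrence, "recordedBy", person))
    return path


# Hypothetical indexes populated with the single record discussed in this issue.
gbif_index = {"CASTYPE1652": "2238760764"}
bionomia_index = {
    "https://gbif.org/occurrence/2238760764": ["http://www.wikidata.org/entity/Q23813352"],
}
for triple in link_path("CASTYPE1652", gbif_index, bionomia_index):
    print(*triple)
```

Keeping the two indexes separate mirrors the two verifiable resources used above, so each hop can be traced back to its own content hash.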

fyi @dshorthouse @Daniel-Mietchen

jhpoelen commented 1 year ago

Also see related thread at https://discourse.gbif.org/t/type-specimen-castype1652-found-via-filtered-query-https-doi-org-10-15468-dl-xf6ahb-but-not-in-open-access-gbif-data-product-https-doi-org-10-15468-dl-pk3trq/3884 .

jhpoelen commented 1 year ago

See also https://github.com/beehind/beehind.github.io/commit/810cc5d126294eb835b7376a589a0a6aad2017a9

jhpoelen commented 1 year ago

For methods to access published Wikidata archives, see https://github.com/Hydriz/Balchivist , https://meta.wikimedia.org/wiki/Grants:Project/Hydriz/Balchivist_2.0 and https://wikitech.wikimedia.org/wiki/Nova_Resource:Dumps#2022-02-12 .