bio-guoda / preston

a biodiversity dataset tracker
MIT License

finding a more recent version hash for a known version #306

Open jhpoelen opened 1 month ago

jhpoelen commented 1 month ago

Last week, during a meeting with @seltmann, @zedomel and @kristi-sara, @seltmann asked how to discover a more recent version of a Preston archive with a known provenance hash.

It turns out that Preston supports this via the preston head --anchor [...] command.

For instance, to discover a more recent copy of the GIB (GBIF iDigBio BioCase) dataset corpus mentioned on https://linker.bio#use-case-3-studying-pine-pests-caused-by-weevils-curculionoidea (see attached screenshot), with anchor hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2, you can run

image

preston head \
  --anchor hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2 \
  --remote https://linker.bio

producing, at the time of writing (2024-10-01),

hash://sha256/9af4013118def8ed1a3fca62e17c6992d62dc6d305f5e797e7b9999bb0352abe

Inspecting this version with

preston cat hash://sha256/9af4013118def8ed1a3fca62e17c6992d62dc6d305f5e797e7b9999bb0352abe \
  | grep -E "2024-[0-9]{2}-[0-9]{2}" \
  | head -1

shows that the first line carrying a 2024 date is

<urn:uuid:25f383f7-3589-4a1e-96cb-37419f77f9a3> <http://www.w3.org/ns/prov#startedAtTime> "2024-09-01T21:43:32.002Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> <urn:uuid:25f383f7-3589-4a1e-96cb-37419f77f9a3> .

suggesting that this version was created on 2024-09-01 (1 Sept 2024).
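
The two steps above can be chained into one small helper. This is only a sketch under stated assumptions: the function names first_iso_date and newer_version_date are hypothetical and not part of Preston, the preston executable is assumed to be on the PATH, and preston cat is assumed to be able to resolve the newer version just as it did in the example above. It uses only the preston head --anchor ... --remote ... and preston cat invocations shown in this issue. Unlike the grep above, it filters on startedAtTime rather than on a hard-coded year.

```shell
#!/bin/sh
# Sketch only: helper names below are hypothetical, not part of Preston.

# Print the first YYYY-MM-DD date found on stdin.
first_iso_date() {
  grep -o -E '[0-9]{4}-[0-9]{2}-[0-9]{2}' | head -n 1
}

# Print the start date of the most recent version reachable from an anchor.
#   $1: anchor hash (e.g., hash://sha256/...)
#   $2: remote to consult (e.g., https://linker.bio)
newer_version_date() {
  # Ask for the most recent version hash, starting from the known anchor.
  newer=$(preston head --anchor "$1" --remote "$2") || return 1
  # Read that version and keep the date of its first startedAtTime statement.
  preston cat "$newer" | grep 'startedAtTime' | first_iso_date
}

# Example call (commented out, so sourcing this file has no side effects):
# newer_version_date \
#   hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2 \
#   https://linker.bio
```

Because the result depends on what the remote knows when it is asked, the printed date can change between runs.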

Note, however, that this result relies on a non-verifiable traversal into the future: from the perspective of the computer program, the future does not exist until questions are asked of the provenance graph, either locally or remotely via https://linker.bio.

See https://github.com/bio-guoda/preston/blob/main/docs/architecture.md#simplified-hexastore for how Preston implements this traveling-into-the-future feature.