jhpoelen opened this issue 3 years ago
On closer inspection, the content returned by the configured data URL changed around April 2021 for the related resource:
Pocock, Michael J. O.; Evans, Darren M.; Memmott, Jane (2012), Data from: The robustness and restoration of a network of ecological networks, Dryad, Dataset, https://doi.org/10.5061/dryad.3s36r118
See the attached pocock2012data.zip for the content referenced by hash in the table below.
date/time | content hash (sha256) | data url |
---|---|---|
2021-04-03T02:09:47.214Z | 49be776718febf8e73bd5f456244e215978d1e0860dbc0c0f3d484d3cc81f709 | https://datadryad.org/stash/downloads/file_stream/40842 |
2021-04-10T01:15:38.495Z | 49be776718febf8e73bd5f456244e215978d1e0860dbc0c0f3d484d3cc81f709 | https://datadryad.org/stash/downloads/file_stream/40842 |
2021-04-17T02:05:06.846Z | 8784b5d81674fc30ff7a1774416476075ceb9d0b051add58612b7215a4e52296 | https://datadryad.org/stash/downloads/file_stream/40842 |
2021-04-24T01:43:03.361Z | 8784b5d81674fc30ff7a1774416476075ceb9d0b051add58612b7215a4e52296 | https://datadryad.org/stash/downloads/file_stream/40842 |
... | ... | ... |
2021-08-28T00:48:27.715Z | 8784b5d81674fc30ff7a1774416476075ceb9d0b051add58612b7215a4e52296 | https://datadryad.org/stash/downloads/file_stream/40842 |
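One way to spot this kind of change automatically is to scan a crawl log, laid out like the table above, for URLs whose digest changes between consecutive retrievals. The following is a minimal sketch, assuming a POSIX shell with `awk` and rows in chronological order; the function name `detect_drift` is hypothetical and not part of any GloBI or Elton tooling.

```shell
#!/bin/sh
# Hypothetical sketch: report content drift in a log whose rows look like
#   <date/time> | <sha256> | <data url> |
# Assumes rows are sorted chronologically.
set -eu

detect_drift() {
  # Reads log rows on stdin; prints one line each time the digest for a URL changes.
  awk -F ' [|] ' '
    /^[0-9][0-9][0-9][0-9]-/ {               # only rows that start with a date
      ts = $1; digest = $2; url = $3
      sub(/ *[|]? *$/, "", url)            # drop the trailing cell delimiter
      if ((url in prev) && prev[url] != digest)
        printf "%s changed at %s: %s -> %s\n", url, ts, substr(prev[url], 1, 8), substr(digest, 1, 8)
      prev[url] = digest
    }'
}

# Demo on the concrete rows from the table above.
detect_drift <<'EOF'
2021-04-03T02:09:47.214Z | 49be776718febf8e73bd5f456244e215978d1e0860dbc0c0f3d484d3cc81f709 | https://datadryad.org/stash/downloads/file_stream/40842 |
2021-04-10T01:15:38.495Z | 49be776718febf8e73bd5f456244e215978d1e0860dbc0c0f3d484d3cc81f709 | https://datadryad.org/stash/downloads/file_stream/40842 |
2021-04-17T02:05:06.846Z | 8784b5d81674fc30ff7a1774416476075ceb9d0b051add58612b7215a4e52296 | https://datadryad.org/stash/downloads/file_stream/40842 |
2021-04-24T01:43:03.361Z | 8784b5d81674fc30ff7a1774416476075ceb9d0b051add58612b7215a4e52296 | https://datadryad.org/stash/downloads/file_stream/40842 |
2021-08-28T00:48:27.715Z | 8784b5d81674fc30ff7a1774416476075ceb9d0b051add58612b7215a4e52296 | https://datadryad.org/stash/downloads/file_stream/40842 |
EOF
# prints: https://datadryad.org/stash/downloads/file_stream/40842 changed at 2021-04-17T02:05:06.846Z: 49be7767 -> 8784b5d8
```

Only the concrete rows are fed in; header and elided `...` rows are skipped because they do not start with a date. Keying on the digest rather than on the URL or file name is what makes the change visible at all.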
Originally, at least as of 10 April 2021, https://datadryad.org/stash/downloads/file_stream/40842 returned content whose first 10 lines were:
$ cat 49be776718febf8e73bd5f456244e215978d1e0860dbc0c0f3d484d3cc81f709 | head
lower guild,upper guild,lower taxon,upper taxon,estimated interaction strength,direct interaction
plant,flower visitor,Ajuga reptans,Rhingia campestris,1.492E+02,1
plant,flower visitor,Alliaria petiolata,Simulium sp,4.971E+01,1
plant,flower visitor,Anthriscus sylvestris,Agriotes pallidulus,1.969E+01,1
plant,flower visitor,Anthriscus sylvestris,Anaspis maculata,1.969E+01,1
plant,flower visitor,Anthriscus sylvestris,Anthocomus fasciatus,3.939E+01,1
plant,flower visitor,Anthriscus sylvestris,Bicellaria vana,1.969E+01,1
plant,flower visitor,Anthriscus sylvestris,Botanophila striolata,1.969E+01,1
plant,flower visitor,Anthriscus sylvestris,Bracon sp,3.939E+01,1
plant,flower visitor,Anthriscus sylvestris,Cantharis lateralis,3.939E+01,1
...
Later, on 17 April 2021, however, https://datadryad.org/stash/downloads/file_stream/40842 returned:
$ cat 8784b5d81674fc30ff7a1774416476075ceb9d0b051add58612b7215a4e52296 | head
### Simulations used in Chevin, Visser, Tufto (Evolution) to test the statistical method.
## 3 sets of simulations below: (1) fluctuating optimum; (2) exponential fitness function with fluctuating rate; (3) constant optimum
#install.packages("INLA",repos="http://cran.r-project.org/")
library(INLA)
## to have more info on what the model is parameterized etc, do: inla.doc("ar1")
setwd(getwd());
################ Simulations with fluctuating optimum #############################
#parameters
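The two outputs already differ in their first line, so even a cheap header check would have flagged the swap. A minimal sketch, assuming a POSIX shell; `has_expected_header` and `NORWOOD_HEADER` are hypothetical names, and the expected header is copied from the 10 April 2021 output above. A header check is weaker than a hash comparison: it catches a wholesale replacement like this one, but not edits further down the file.

```shell
#!/bin/sh
# Hypothetical sketch: check that retrieved content starts with the expected header row.
set -eu

NORWOOD_HEADER='lower guild,upper guild,lower taxon,upper taxon,estimated interaction strength,direct interaction'

has_expected_header() {
  # Usage: <producer> | has_expected_header "<expected first line>"
  local expected="$1" first="" cr
  IFS= read -r first || true       # first line (may lack a trailing newline)
  cat > /dev/null                  # drain the rest so the producer is not cut off early
  cr="$(printf '\r')"
  first="${first%"$cr"}"           # tolerate CRLF line endings
  [ "$first" = "$expected" ]       # exit status 0 only on an exact match
}

# Example (requires network access):
# if curl --silent --location "https://datadryad.org/stash/downloads/file_stream/40842" \
#      | has_expected_header "$NORWOOD_HEADER"; then echo "header OK"; else echo "unexpected content"; fi
```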
The Dryad landing page for the related data download, accessed via https://doi.org/10.5061/dryad.3s36r118 and resolved to https://datadryad.org/stash/dataset/doi:10.5061/dryad.3s36r118, shows that norwood.csv is now associated with https://datadryad.org/stash/downloads/file_stream/40321, not https://datadryad.org/stash/downloads/file_stream/40842.
Moreover:
$ curl --silent -L "https://datadryad.org/stash/downloads/file_stream/40321" | sha256sum
49be776718febf8e73bd5f456244e215978d1e0860dbc0c0f3d484d3cc81f709 -
In other words, this content is identical to the old, pre-17 April 2021 version of the Pocock et al. 2012 data, including its first 10 lines:
$ curl --silent -L "https://datadryad.org/stash/downloads/file_stream/40321" | head
lower guild,upper guild,lower taxon,upper taxon,estimated interaction strength,direct interaction
plant,flower visitor,Ajuga reptans,Rhingia campestris,1.492E+02,1
plant,flower visitor,Alliaria petiolata,Simulium sp,4.971E+01,1
plant,flower visitor,Anthriscus sylvestris,Agriotes pallidulus,1.969E+01,1
plant,flower visitor,Anthriscus sylvestris,Anaspis maculata,1.969E+01,1
plant,flower visitor,Anthriscus sylvestris,Anthocomus fasciatus,3.939E+01,1
plant,flower visitor,Anthriscus sylvestris,Bicellaria vana,1.969E+01,1
plant,flower visitor,Anthriscus sylvestris,Botanophila striolata,1.969E+01,1
plant,flower visitor,Anthriscus sylvestris,Bracon sp,3.939E+01,1
plant,flower visitor,Anthriscus sylvestris,Cantharis lateralis,3.939E+01,1
This is an example of content drift: the original URL no longer returns the original content, which has been replaced with other, unrelated data. In this case, the replacement appears to be an R script from a completely different Dryad data publication.
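A stricter guard against drift is to compare every retrieval with the digest recorded earlier, which is what the `curl ... | sha256sum` comparison above does by hand. A minimal sketch, assuming GNU coreutils `sha256sum` is available (on macOS, `shasum -a 256` is the usual substitute); `verify_sha256` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical sketch of a fixity check: hash stdin and compare with a recorded digest.
set -eu

verify_sha256() {
  # Usage: <producer> | verify_sha256 <expected-hex-digest>
  local expected="$1" actual
  actual="$(sha256sum | cut -d ' ' -f 1)"   # sha256sum prints "<digest>  -" for stdin
  if [ "$actual" = "$expected" ]; then
    printf 'OK %s\n' "$actual"
  else
    printf 'MISMATCH: expected %s, got %s\n' "$expected" "$actual" >&2
    return 1
  fi
}

# Example (requires network access; digest as recorded for norwood.csv in this issue):
# curl --silent --location "https://datadryad.org/stash/downloads/file_stream/40321" \
#   | verify_sha256 49be776718febf8e73bd5f456244e215978d1e0860dbc0c0f3d484d3cc81f709
```

If the download fails or is truncated, the digest of whatever arrived will not match, so the check also fails in that case.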
After updating the data URL to https://datadryad.org/stash/downloads/file_stream/40321 in commit https://github.com/globalbioticinteractions/pocock2012/commit/be57f1d26a208fb4b2adfad72e17a3b88abe5bf9, the Elton review looked more favorable, indicating that the interaction data were successfully indexed:
GloBI Review by Elton
Miller 3.4.0
s3cmd version 2.1.0
openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
elton not found... installing from [https://github.com/globalbioticinteractions/elton/releases/download/0.11.1/elton.jar]
elton version 0.11.1
Review of [local] started at [2021-09-02T19:00:07+00:00].
updating [local]... done.
creating review [local]... done.
listing interactions [local]... done.
listing taxa [local]... done.
listing nanopubs [local]... done.
Review of [globalbioticinteractions/pocock2012] included:
- 1148 interaction(s)
- 0 note(s)
- 1734 info(s)
Also see https://twitter.com/GlobalBiotic/status/1433517595475382278 and the attached screenshot.
After digging through Dryad's back end, I see that this was a side effect of some restructuring we did to accommodate different classes of files. Apologies for the confusion. Although we take great pains to maintain the persistence of Dryad's DOIs, the intermediate URLs returned by our system occasionally change.
@ryscher, thanks for explaining how the (intermediate) data URLs associated with Dryad DOIs came to be re-assigned to different, unrelated Dryad DOIs, and for taking the time to respond. I also appreciate the great effort that you and your colleagues put into keeping Dryad up and running.
Previously, I (incorrectly) assumed that the data URL used to access the contents of the file "norwood.csv" (i.e., https://datadryad.org/stash/downloads/file_stream/40842) would remain unchanged as part of the "persistent" DOI https://doi.org/10.5061/dryad.3s36r118.
Now that I have learned that the URLs associated with a specific DOI may change, I am looking for more reliable methods to reference and retrieve specific, unaltered digital data from Dryad. Is there, by any chance, a way to refer to, or retrieve, data by their MD5 or SHA-256 hash instead of by a URL or name that may change?
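For illustration only, here is what hash-based reference and retrieval can look like locally, mirroring how pocock2012data.zip names its files by SHA-256 digest. This is a hypothetical sketch, not a Dryad feature or API: `cas_put`, `cas_get`, and the one-file-per-digest directory layout are assumptions, and GNU coreutils `sha256sum` is assumed to be available.

```shell
#!/bin/sh
# Hypothetical local content-addressed store: each file is saved under its
# SHA-256 digest, so a reference can never silently point at different bytes.
set -eu

cas_put() {
  # Usage: <producer> | cas_put <store-dir>; prints the digest used as the key.
  local store="$1" tmp digest
  mkdir -p "$store"
  tmp="$(mktemp "$store/.incoming.XXXXXX")"
  cat > "$tmp"
  digest="$(sha256sum "$tmp" | cut -d ' ' -f 1)"
  mv -f "$tmp" "$store/$digest"            # same digest implies same bytes
  printf '%s\n' "$digest"
}

cas_get() {
  # Usage: cas_get <store-dir> <sha256>; writes the content to stdout only
  # after re-checking that the stored bytes still hash to the requested digest.
  local store="$1" digest="$2" path actual
  path="$store/$digest"
  [ -f "$path" ] || { printf 'not found: %s\n' "$digest" >&2; return 1; }
  actual="$(sha256sum "$path" | cut -d ' ' -f 1)"
  [ "$actual" = "$digest" ] || { printf 'corrupt: %s\n' "$digest" >&2; return 1; }
  cat "$path"
}
```

For example, `curl --silent -L "https://datadryad.org/stash/downloads/file_stream/40321" | cas_put ./store` would store the download under its digest, which, per the output above, was 49be776718febf8e73bd5f456244e215978d1e0860dbc0c0f3d484d3cc81f709 at the time. Because the key is derived from the bytes themselves, a later `cas_get ./store <digest>` either returns exactly those bytes or fails loudly.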
Also, I am trying to understand what a "persistent" DOI means. With a better understanding, I'd hopefully be able to implement a test to verify that a DOI is, in fact, persistent.
I'd be curious to hear suggestions on how to better resolve and retrieve specific data associated with a Dryad DOI.
via https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/pocock2012/README.txt