globalbioticinteractions / pocock2012

GloBI configuration to help index Pocock, Michael J. O.; Evans, Darren M.; Memmott, Jane (2012), Data from: The robustness and restoration of a network of ecological networks, Dryad, Dataset, https://doi.org/10.5061/dryad.3s36r118

review suggests malformed datadryad data file for The robustness and restoration of a network of ecological networks, Dryad, Dataset, https://doi.org/10.5061/dryad.3s36r118 #1

Open jhpoelen opened 3 years ago

jhpoelen commented 3 years ago

via https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/pocock2012/README.txt

  _____ _       ____ _____   _____            _                
 / ____| |     |  _ \_   _| |  __ \          (_)               
 | |  __| | ___ | |_) || |   | |__) |_____   ___  _____      __ 
 | | |_ | |/ _ \|  _ < | |   |  _  // _ \ \ / / |/ _ \ \ /\ / / 
 | |__| | | (_) | |_) || |_  | | \ \  __/\ V /| |  __/\ V  V /  
  \_____|_|\___/|____/_____| |_|  \_\___| \_/ |_|\___| \_/\_/   
 | |           |  ____| | |                                     
 | |__  _   _  | |__  | | |_ ___  _ __                          
 | '_ \| | | | |  __| | | __/ _ \| '_ \                         
 | |_) | |_| | | |____| | || (_) | | | |                        
 |_.__/ \__, | |______|_|\__\___/|_| |_|                        
         __/ |                                                  
        |___/                                                   

Review of [globalbioticinteractions/pocock2012] started at [2021-08-30T08:05:06+02:00].

Review of [globalbioticinteractions/pocock2012] included:
  - 0 interaction(s)
  - 654 note(s)
  - 0 info(s)

[globalbioticinteractions/pocock2012] has 654 reviewer note(s):
    255 source taxon name missing
    182 found [6] column definitions, but only [1] values: assuming undefined values are empty.
     42 found [6] column definitions, but only [2] values: assuming undefined values are empty.
     37 found [6] column definitions, but only [4] values: assuming undefined values are empty.
     31 found [6] column definitions, but only [3] values: assuming undefined values are empty.
     25 found [6] column definitions, but only [5] values: assuming undefined values are empty.
     15 found unsupported interaction type with name: [data=data]
      9 found unsupported interaction type with name: [z]
      6 missing interaction type
      6 found unsupported interaction type with name: [0.025)]
      6 found [7] columns, but only [6] columns are defined: ignoring remaining undefined columns.
      3 found unsupported interaction type with name: [whitenoise]
      3 found unsupported interaction type with name: []<- unlist(lapply(modelsquad]
      3 found unsupported interaction type with name: [t=t]
      3 found unsupported interaction type with name: [smpmeantheta=smpmeantheta]
      3 found unsupported interaction type with name: [grep("tfactor\\."]
      3 found unsupported interaction type with name: [eomegasq=eomegasq]
      3 found unsupported interaction type with name: [c(nrep]
      3 found unsupported interaction type with name: [2]
      3 found unsupported interaction type with name: [0.05]
      2 found unsupported interaction type with name: [inla.emarginal]
      2 found unsupported interaction type with name: [1/(omega^2+sigmaz)]
      2 found unsupported interaction type with name: [0.5]
      2 found unsupported interaction type with name: [0.25]
      1 no interactions found
      1 found unsupported interaction type with name: [SDbetamax]
      1 found unsupported interaction type with name: [round(sigma_beta]
      1 found [9] columns, but only [6] columns are defined: ignoring remaining undefined columns.
      1 found [11] columns, but only [6] columns are defined: ignoring remaining undefined columns.

If you'd like, you can generate your own review notes by:
  - installing GloBI's Elton via https://github.com/globalbioticinteractions/elton
  - running "elton update globalbioticinteractions/pocock2012 && elton review --type note,summary globalbioticinteractions/pocock2012 > review.tsv"
  - inspecting review.tsv

Please email info@globalbioticinteractions.org for questions/ comments.

This review generated the following resources:
https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/pocock2012/review.svg
https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/pocock2012/review.tsv.gz
https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/pocock2012/review-sample.tsv
https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/pocock2012/review-sample.csv
https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/pocock2012/indexed-interactions.tsv.gz
https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/pocock2012/indexed-interactions.csv.gz
https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/pocock2012/indexed-interactions-sample.tsv
https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/pocock2012/indexed-interactions-sample.csv
https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/pocock2012/indexed-names.tsv.gz
https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/pocock2012/indexed-names.csv.gz
https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/pocock2012/indexed-names-sample.tsv
https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/pocock2012/indexed-names-sample.csv
https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/pocock2012/indexed-citations.tsv.gz
https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/pocock2012/indexed-citations.csv.gz
https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/pocock2012/nanopub.ttl.gz
https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/pocock2012/nanopub-sample.ttl
https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/pocock2012/review.zip

jhpoelen commented 3 years ago

On closer inspection, the content produced by the configured URL changed around April 2021 for the related resource:

Pocock, Michael J. O.; Evans, Darren M.; Memmott, Jane (2012), Data from: The robustness and restoration of a network of ecological networks, Dryad, Dataset, https://doi.org/10.5061/dryad.3s36r118

See attached pocock2012data.zip for content referenced by their hashes in the table below.

pocock2012data.zip

date/time content hash (sha256) data url
2021-04-03T02:09:47.214Z 49be776718febf8e73bd5f456244e215978d1e0860dbc0c0f3d484d3cc81f709 https://datadryad.org/stash/downloads/file_stream/40842
2021-04-10T01:15:38.495Z 49be776718febf8e73bd5f456244e215978d1e0860dbc0c0f3d484d3cc81f709 https://datadryad.org/stash/downloads/file_stream/40842
2021-04-17T02:05:06.846Z 8784b5d81674fc30ff7a1774416476075ceb9d0b051add58612b7215a4e52296 https://datadryad.org/stash/downloads/file_stream/40842
2021-04-24T01:43:03.361Z 8784b5d81674fc30ff7a1774416476075ceb9d0b051add58612b7215a4e52296 https://datadryad.org/stash/downloads/file_stream/40842
... ... ...
2021-08-28T00:48:27.715Z 8784b5d81674fc30ff7a1774416476075ceb9d0b051add58612b7215a4e52296 https://datadryad.org/stash/downloads/file_stream/40842
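
A minimal sketch of how a switch like the one in the table above could be spotted automatically, assuming a plain-text log with one whitespace-separated `timestamp hash url` record per line. The function name `find_hash_changes` is hypothetical and is not part of Elton or Dryad.

```shell
# Hypothetical helper, not part of Elton or Dryad.
# Reads whitespace-separated records "timestamp sha256 url" on stdin and
# prints one line for every record whose hash differs from the previous
# hash seen for the same url.
find_hash_changes() {
  awk '
    ($3 in last) && last[$3] != $2 { print $1, last[$3], "->", $2, $3 }
    { last[$3] = $2 }
  '
}
```

Fed the four dated rows at the top of the table, this would report a single change, at 2021-04-17T02:05:06.846Z.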

where originally, at least until 10 April 2021, https://datadryad.org/stash/downloads/file_stream/40842 produced content whose first 10 lines were:

$ cat 49be776718febf8e73bd5f456244e215978d1e0860dbc0c0f3d484d3cc81f709 | head
lower guild,upper guild,lower taxon,upper taxon,estimated interaction strength,direct interaction
plant,flower visitor,Ajuga reptans,Rhingia campestris,1.492E+02,1
plant,flower visitor,Alliaria petiolata,Simulium sp,4.971E+01,1
plant,flower visitor,Anthriscus sylvestris,Agriotes pallidulus,1.969E+01,1
plant,flower visitor,Anthriscus sylvestris,Anaspis maculata,1.969E+01,1
plant,flower visitor,Anthriscus sylvestris,Anthocomus fasciatus,3.939E+01,1
plant,flower visitor,Anthriscus sylvestris,Bicellaria vana,1.969E+01,1
plant,flower visitor,Anthriscus sylvestris,Botanophila striolata,1.969E+01,1
plant,flower visitor,Anthriscus sylvestris,Bracon sp,3.939E+01,1
plant,flower visitor,Anthriscus sylvestris,Cantharis lateralis,3.939E+01,1
...

whereas later, on 17 April 2021, https://datadryad.org/stash/downloads/file_stream/40842 produced:

$ cat 8784b5d81674fc30ff7a1774416476075ceb9d0b051add58612b7215a4e52296 | head
### Simulations used in Chevin, Visser, Tufto (Evolution) to test the statistical method.
## 3 sets of simulations below: (1) fluctuating optimum; (2) exponential fitness function with fluctuating rate; (3) constant optimum

#install.packages("INLA",repos="http://cran.r-project.org/")
library(INLA)
## to have more info on what the model is parameterized etc, do:  inla.doc("ar1")
setwd(getwd());

################ Simulations with fluctuating optimum  #############################
#parameters

The related Dryad download page, accessed via https://doi.org/10.5061/dryad.3s36r118 and resolved to https://datadryad.org/stash/dataset/doi:10.5061/dryad.3s36r118, is shown in this screenshot:

Screenshot from 2021-09-02 11-44-50

On that page, norwood.csv is now associated with https://datadryad.org/stash/downloads/file_stream/40321 , not https://datadryad.org/stash/downloads/file_stream/40842 .

The new URL yields the original content hash:

$ curl --silent -L "https://datadryad.org/stash/downloads/file_stream/40321" | sha256sum
49be776718febf8e73bd5f456244e215978d1e0860dbc0c0f3d484d3cc81f709  -

with content, including the first 10 lines, being exactly the same as the old, pre-17 April 2021 version of the Pocock et al. 2012 data.

$ curl --silent -L "https://datadryad.org/stash/downloads/file_stream/40321" | head
lower guild,upper guild,lower taxon,upper taxon,estimated interaction strength,direct interaction
plant,flower visitor,Ajuga reptans,Rhingia campestris,1.492E+02,1
plant,flower visitor,Alliaria petiolata,Simulium sp,4.971E+01,1
plant,flower visitor,Anthriscus sylvestris,Agriotes pallidulus,1.969E+01,1
plant,flower visitor,Anthriscus sylvestris,Anaspis maculata,1.969E+01,1
plant,flower visitor,Anthriscus sylvestris,Anthocomus fasciatus,3.939E+01,1
plant,flower visitor,Anthriscus sylvestris,Bicellaria vana,1.969E+01,1
plant,flower visitor,Anthriscus sylvestris,Botanophila striolata,1.969E+01,1
plant,flower visitor,Anthriscus sylvestris,Bracon sp,3.939E+01,1
plant,flower visitor,Anthriscus sylvestris,Cantharis lateralis,3.939E+01,1

This is an example of content drift: the original URL no longer produces the original content, but instead serves some other data. In this case, the other data appears to be an R script from a completely different Dryad data publication.
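
A check for this kind of drift can be sketched in a few lines of shell, building on the `sha256sum` calls used above. The function names `content_hash` and `check_drift` are hypothetical, and the sketch assumes GNU `sha256sum` is available.

```shell
# Hypothetical helpers, not part of Elton or Dryad.

# Print the sha256 hex digest of whatever arrives on stdin.
content_hash() {
  sha256sum | cut -d ' ' -f 1
}

# Compare the digest of stdin with an expected digest (first argument).
# Prints "unchanged" and returns 0 on a match;
# prints "drifted <actual digest>" and returns 1 otherwise.
check_drift() {
  local expected="$1"
  local actual
  actual="$(content_hash)"
  if [ "$actual" = "$expected" ]; then
    echo "unchanged"
  else
    echo "drifted $actual"
    return 1
  fi
}
```

For example, `curl --silent -L "https://datadryad.org/stash/downloads/file_stream/40321" | check_drift 49be776718febf8e73bd5f456244e215978d1e0860dbc0c0f3d484d3cc81f709` should print `unchanged` for as long as that URL keeps serving the original norwood.csv.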

jhpoelen commented 3 years ago

After updating the data url to https://datadryad.org/stash/downloads/file_stream/40321 via https://github.com/globalbioticinteractions/pocock2012/commit/be57f1d26a208fb4b2adfad72e17a3b88abe5bf9, the elton review looked more favorable, indicating that interaction data was successfully indexed:

  _____ _       ____ _____   _____            _                
  / ____| |     |  _ \_   _| |  __ \          (_)               
 | |  __| | ___ | |_) || |   | |__) |_____   ___  _____      __ 
 | | |_ | |/ _ \|  _ < | |   |  _  // _ \ \ / / |/ _ \ \ /\ / / 
 | |__| | | (_) | |_) || |_  | | \ \  __/\ V /| |  __/\ V  V /  
  \_____|_|\___/|____/_____| |_|  \_\___| \_/ |_|\___| \_/\_/   
 | |           |  ____| | |                                     
 | |__  _   _  | |__  | | |_ ___  _ __                          
 | '_ \| | | | |  __| | | __/ _ \| '_ \                         
 | |_) | |_| | | |____| | || (_) | | | |                        
 |_.__/ \__, | |______|_|\__\___/|_| |_|                        
         __/ |                                                  
        |___/                                                   

Miller 3.4.0
s3cmd version 2.1.0
openjdk version "11.0.2" 2019-01-15

OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
elton not found... installing from [https://github.com/globalbioticinteractions/elton/releases/download/0.11.1/elton.jar]

elton version 0.11.1

Review of [local] started at [2021-09-02T19:00:07+00:00].

updating [local]... done.
creating review [local]... done.
listing interactions [local]... done.
listing taxa [local]... done.
listing nanopubs [local]... done.

Review of [globalbioticinteractions/pocock2012] included:
  - 1148 interaction(s)
  - 0 note(s)
  - 1734 info(s)

jhpoelen commented 3 years ago

Also see https://twitter.com/GlobalBiotic/status/1433517595475382278 and attached screenshot.

Screenshot from 2021-09-02 12-52-31

ryscher commented 3 years ago

After digging through Dryad's back end, I see that this was a side effect of some restructuring we did to accommodate different classes of files. Apologies for the confusion. Although we take great pains to maintain the persistence of Dryad's DOIs, the intermediate URLs returned by our system occasionally change.

jhpoelen commented 3 years ago

@ryscher thanks for providing the context in which the (intermediate) data URLs associated with Dryad DOIs were re-assigned to different, unrelated Dryad DOIs. I appreciate that you took the time to respond. I also appreciate the great effort that you and your colleagues put into keeping Dryad up and running.

Previously, I (incorrectly) assumed that the data URL (i.e., https://datadryad.org/stash/downloads/file_stream/40842) used to access the contents of the file "norwood.csv" would remain unchanged as part of the "persistent" DOI https://doi.org/10.5061/dryad.3s36r118 .

Now that I have learned that URLs associated with a specific DOI may change, I am looking for more reliable methods to reference and retrieve specific (unaltered) digital data from Dryad. Is there, by any chance, a way to refer to, or retrieve, the data by their md5 / sha256 hash instead of some URL or name that may change?
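
Whether Dryad offers hash-based retrieval is not established in this thread. As a local workaround, similar in spirit to the hash-named files inspected with `cat` earlier, a copy can be kept under its own sha256 digest. The sketch below is only an illustration: the function names `store_by_hash` and `cat_by_hash` and the directory layout are hypothetical.

```shell
# Hypothetical helpers, not part of Elton or Dryad.

# store_by_hash DIR: save stdin into DIR under its sha256 digest
# and print that digest.
store_by_hash() {
  local dir="$1" tmp hash
  mkdir -p "$dir"
  tmp="$(mktemp "$dir/tmp.XXXXXX")"
  cat > "$tmp"
  hash="$(sha256sum < "$tmp" | cut -d ' ' -f 1)"
  mv "$tmp" "$dir/$hash"
  echo "$hash"
}

# cat_by_hash DIR HASH: print the stored content, but only if it
# still matches the digest it is filed under; return 1 otherwise.
cat_by_hash() {
  local dir="$1" hash="$2"
  [ "$(sha256sum < "$dir/$hash" | cut -d ' ' -f 1)" = "$hash" ] || return 1
  cat "$dir/$hash"
}
```

With this layout, `curl --silent -L "$url" | store_by_hash ./data` records the content once, and later reads via `cat_by_hash ./data <hash>` no longer depend on the URL staying stable.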

Also, I am trying to understand the meaning of a "persistent" DOI. With a better understanding, I'd hopefully be able to implement a test to verify that a DOI is, in fact, persistent.

I'd be curious to hear suggestions on how to better resolve and retrieve specific data associated with a Dryad DOI.
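
One possible reading of such a test is sketched below, under the assumption that "persistent" means "each known locator still yields content with the recorded sha256 hash". This is an assumption, not Dryad's definition. The manifest format and the names `verify_manifest` and `fetch_url` are hypothetical.

```shell
# Hypothetical helpers, not part of Elton or Dryad.

# verify_manifest FETCH_CMD < manifest
# Each manifest line has the form "<sha256> <locator>".
# FETCH_CMD is called with the locator and must write the content to stdout.
# Prints one "ok" or "MISMATCH" line per entry; returns 1 if any entry mismatched.
verify_manifest() {
  local fetch="$1" expected locator actual status=0
  while read -r expected locator; do
    if [ -z "$expected" ]; then
      continue
    fi
    # Detach stdin so the fetcher cannot consume the rest of the manifest.
    actual="$("$fetch" "$locator" < /dev/null | sha256sum | cut -d ' ' -f 1)"
    if [ "$actual" = "$expected" ]; then
      echo "ok $locator"
    else
      echo "MISMATCH $locator expected=$expected actual=$actual"
      status=1
    fi
  done
  return "$status"
}

# One possible fetcher for http(s) locators.
fetch_url() {
  curl --silent --fail -L "$1"
}
```

A manifest line pairing 49be776718febf8e73bd5f456244e215978d1e0860dbc0c0f3d484d3cc81f709 with https://datadryad.org/stash/downloads/file_stream/40842 , checked with `verify_manifest fetch_url < manifest.txt`, would have reported a mismatch from 17 April 2021 onward, according to the hashes recorded in this thread.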