bio-guoda / preston

a biodiversity dataset tracker
MIT License

retrieving content by md5 hash from Zenodo fails unexpectedly for a non-current, but still available, version #210

Closed jhpoelen closed 1 year ago

jhpoelen commented 1 year ago

When attempting to reproduce #187 after resolving #207, I found that

https://zenodo.org/api/records/?q=_files.checksum:%22md5:a64ba5bacf9b07197648a9eed660c176%22

produces:

{
  "aggregations": {
    "access_right": {
      "buckets": [
        {
          "doc_count": 1,
          "key": "open"
        }
      ],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    },
    "file_type": {
      "buckets": [
        {
          "doc_count": 1,
          "key": ""
        },
        {
          "doc_count": 1,
          "key": "gz"
        },
        {
          "doc_count": 1,
          "key": "zip"
        }
      ],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    },
    "keywords": {
      "buckets": [
        {
          "doc_count": 1,
          "key": "biology"
        },
        {
          "doc_count": 1,
          "key": "biotic assocations"
        },
        {
          "doc_count": 1,
          "key": "biotic interactions"
        },
        {
          "doc_count": 1,
          "key": "ecological informatics"
        },
        {
          "doc_count": 1,
          "key": "ecology"
        },
        {
          "doc_count": 1,
          "key": "species interactions"
        }
      ],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    },
    "type": {
      "buckets": [
        {
          "doc_count": 1,
          "key": "dataset",
          "subtype": {
            "buckets": [],
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0
          }
        }
      ],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    }
  },
  "hits": {
    "hits": [
      {
        "conceptdoi": "10.5281/zenodo.3950589",
        "conceptrecid": "3950589",
        "created": "2022-11-22T18:04:06.902272+00:00",
        "doi": "10.5281/zenodo.7348355",
        "files": [
          {
            "bucket": "633ae68e-57ee-4fbb-9c7d-7f50f9fa7597",
            "checksum": "md5:dfa6e0317ec8c3724b4c24a8472d8a0b",
            "key": "citations.csv.gz",
            "links": {
              "self": "https://zenodo.org/api/files/633ae68e-57ee-4fbb-9c7d-7f50f9fa7597/citations.csv.gz"
            },
            "size": 54731349,
            "type": "gz"
          },
          {
            "bucket": "633ae68e-57ee-4fbb-9c7d-7f50f9fa7597",
            "checksum": "md5:9e12daa5dd4bd730b6cb4223be808ba1",
            "key": "citations.tsv.gz",
            "links": {
              "self": "https://zenodo.org/api/files/633ae68e-57ee-4fbb-9c7d-7f50f9fa7597/citations.tsv.gz"
            },
            "size": 54676372,
            "type": "gz"
          },
          {
            "bucket": "633ae68e-57ee-4fbb-9c7d-7f50f9fa7597",
            "checksum": "md5:44008bc8d45a643875ecfbec88563d5c",
            "key": "dwca-by-study.zip",
            "links": {
              "self": "https://zenodo.org/api/files/633ae68e-57ee-4fbb-9c7d-7f50f9fa7597/dwca-by-study.zip"
            },
            "size": 365099325,
            "type": "zip"
          },
          {
            "bucket": "633ae68e-57ee-4fbb-9c7d-7f50f9fa7597",
            "checksum": "md5:81fb1af31f87a38c72c53d718680992f",
            "key": "dwca.zip",
            "links": {
              "self": "https://zenodo.org/api/files/633ae68e-57ee-4fbb-9c7d-7f50f9fa7597/dwca.zip"
            },
            "size": 509011240,
            "type": "zip"
          },
          {
            "bucket": "633ae68e-57ee-4fbb-9c7d-7f50f9fa7597",
            "checksum": "md5:d6aedbbce50932dd5de39a4aea92681d",
            "key": "interactions.csv.gz",
            "links": {
              "self": "https://zenodo.org/api/files/633ae68e-57ee-4fbb-9c7d-7f50f9fa7597/interactions.csv.gz"
            },
            "size": 1574446683,
            "type": "gz"
          },
          {
            "bucket": "633ae68e-57ee-4fbb-9c7d-7f50f9fa7597",
            "checksum": "md5:879e03c6905452281fc828a909bae85d",
            "key": "interactions.nq.gz",
            "links": {
              "self": "https://zenodo.org/api/files/633ae68e-57ee-4fbb-9c7d-7f50f9fa7597/interactions.nq.gz"
            },
            "size": 7357397872,
            "type": "gz"
          },
          {
            "bucket": "633ae68e-57ee-4fbb-9c7d-7f50f9fa7597",
            "checksum": "md5:e2dbe94e3c61570f3f123ebc07ea839d",
            "key": "interactions.tsv.gz",
            "links": {
              "self": "https://zenodo.org/api/files/633ae68e-57ee-4fbb-9c7d-7f50f9fa7597/interactions.tsv.gz"
            },
            "size": 1572302846,
            "type": "gz"
          },
          {
            "bucket": "633ae68e-57ee-4fbb-9c7d-7f50f9fa7597",
            "checksum": "md5:1b1e9c2a69fa4cc218943a526920ba5a",
            "key": "neo4j-graphdb.zip",
            "links": {
              "self": "https://zenodo.org/api/files/633ae68e-57ee-4fbb-9c7d-7f50f9fa7597/neo4j-graphdb.zip"
            },
            "size": 6214870166,
            "type": "zip"
          },
          {
            "bucket": "633ae68e-57ee-4fbb-9c7d-7f50f9fa7597",
            "checksum": "md5:a64ba5bacf9b07197648a9eed660c176",
            "key": "README",
            "links": {
              "self": "https://zenodo.org/api/files/633ae68e-57ee-4fbb-9c7d-7f50f9fa7597/README"
            },
            "size": 8468,
            "type": ""
          },
          {
            "bucket": "633ae68e-57ee-4fbb-9c7d-7f50f9fa7597",
            "checksum": "md5:092bdf870613ab2e6c9dff4ea4ec8aff",
            "key": "refuted-interactions.csv.gz",
            "links": {
              "self": "https://zenodo.org/api/files/633ae68e-57ee-4fbb-9c7d-7f50f9fa7597/refuted-interactions.csv.gz"
            },
            "size": 2801169,
            "type": "gz"
          },
          {
            "bucket": "633ae68e-57ee-4fbb-9c7d-7f50f9fa7597",
            "checksum": "md5:9035cf51f6dbc04245b26d39a1ed997e",
            "key": "refuted-interactions.tsv.gz",
            "links": {
              "self": "https://zenodo.org/api/files/633ae68e-57ee-4fbb-9c7d-7f50f9fa7597/refuted-interactions.tsv.gz"
            },
            "size": 2801134,
            "type": "gz"
          },
          {
            "bucket": "633ae68e-57ee-4fbb-9c7d-7f50f9fa7597",
            "checksum": "md5:b17d40a7e14a9ee030c77631121e27cc",
            "key": "refuted-verbatim-interactions.csv.gz",
            "links": {
              "self": "https://zenodo.org/api/files/633ae68e-57ee-4fbb-9c7d-7f50f9fa7597/refuted-verbatim-interactions.csv.gz"
            },
            "size": 575190,
            "type": "gz"
          },
          {
            "bucket": "633ae68e-57ee-4fbb-9c7d-7f50f9fa7597",
            "checksum": "md5:e6916ceba01a282c03beefa820019ce9",
            "key": "refuted-verbatim-interactions.tsv.gz",
            "links": {
              "self": "https://zenodo.org/api/files/633ae68e-57ee-4fbb-9c7d-7f50f9fa7597/refuted-verbatim-interactions.tsv.gz"
            },
            "size": 575079,
            "type": "gz"
          },
          {
            "bucket": "633ae68e-57ee-4fbb-9c7d-7f50f9fa7597",
            "checksum": "md5:fd53631e37fae04ff6d9a4b7379cf9a5",
            "key": "taxonCache.tsv.gz",
            "links": {
              "self": "https://zenodo.org/api/files/633ae68e-57ee-4fbb-9c7d-7f50f9fa7597/taxonCache.tsv.gz"
            },
            "size": 115606669,
            "type": "gz"
          },
          {
            "bucket": "633ae68e-57ee-4fbb-9c7d-7f50f9fa7597",
            "checksum": "md5:3f9545af3a468d52f5fd972112881af1",
            "key": "taxonMap.tsv.gz",
            "links": {
              "self": "https://zenodo.org/api/files/633ae68e-57ee-4fbb-9c7d-7f50f9fa7597/taxonMap.tsv.gz"
            },
            "size": 58713420,
            "type": "gz"
          },
          {
            "bucket": "633ae68e-57ee-4fbb-9c7d-7f50f9fa7597",
            "checksum": "md5:882dc7804e2e7d0bfde2fa4dfa612bb6",
            "key": "verbatim-interactions.csv.gz",
            "links": {
              "self": "https://zenodo.org/api/files/633ae68e-57ee-4fbb-9c7d-7f50f9fa7597/verbatim-interactions.csv.gz"
            },
            "size": 481573194,
            "type": "gz"
          },
          {
            "bucket": "633ae68e-57ee-4fbb-9c7d-7f50f9fa7597",
            "checksum": "md5:c6c3d93515cfec7f6c2ada06996dcdb2",
            "key": "verbatim-interactions.tsv.gz",
            "links": {
              "self": "https://zenodo.org/api/files/633ae68e-57ee-4fbb-9c7d-7f50f9fa7597/verbatim-interactions.tsv.gz"
            },
            "size": 479727505,
            "type": "gz"
          }
        ],
        "id": 7348355,
        "links": {
          "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.7348355.svg",
          "bucket": "https://zenodo.org/api/files/633ae68e-57ee-4fbb-9c7d-7f50f9fa7597",
          "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.3950589.svg",
          "conceptdoi": "https://doi.org/10.5281/zenodo.3950589",
          "doi": "https://doi.org/10.5281/zenodo.7348355",
          "html": "https://zenodo.org/record/7348355",
          "latest": "https://zenodo.org/api/records/7348355",
          "latest_html": "https://zenodo.org/record/7348355",
          "self": "https://zenodo.org/api/records/7348355"
        },
        "metadata": {
          "access_right": "open",
          "access_right_category": "success",
          "creators": [
            {
              "name": "GloBI Community"
            }
          ],
          "description": "<p>Global Biotic Interactions: Interpreted Data Products</p>\n\n<p>Global Biotic Interactions (GloBI, https://globalbioticinteractions.org, [1]) aims to facilitate access to existing species interaction records (e.g., predator-prey, plant-pollinator, virus-host). This data publication provides interpreted species interaction data products. These products are the result of a process in which versioned, existing species interaction datasets ([2]) are linked to the so-called GloBI Taxon Graph ([3]) and transformed into various aggregate formats (e.g., tsv, csv, neo4j, rdf/nquad, darwin core-ish archives). In addition, the applied name maps are included to make the applied taxonomic linking explicit.&nbsp;</p>\n\n<p>Citation<br>\n--------</p>\n\n<p>GloBI is made possible by researchers, collections, projects and institutions openly sharing their datasets. When using this data, please make sure to attribute these *original data contributors*, including citing the specific datasets in derivative work. Each species interaction record indexed by GloBI contains a reference and dataset citation. Also, a full lists of all references can be found in citations.csv/citations.tsv files in this publication. If you have ideas on how to make it easier to cite original datasets, please open/join a discussion via https://globalbioticinteractions.org or related projects.</p>\n\n<p>To credit GloBI for more easily finding interaction data, please use the following citation to reference GloBI:</p>\n\n<p>Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005.</p>\n\n<p>Bias and Errors<br>\n--------</p>\n\n<p>As with any analysis and processing workflow, care should be taken to understand the bias and error propagation of data sources and related data transformation processes. 
The datasets indexed by GloBI are biased geospatially, temporally and taxonomically ([5], [6]). Also, mapping of verbatim names from datasets to known name concept may contains errors due to synonym mismatches, outdated names lists, typos or conflicting name authorities. Finally, bugs may introduce bias and errors in the resulting integrated data product.</p>\n\n<p>To help better understand where bias and errors are introduced, only versioned data and code are used as an input: the datasets ([2]), name maps ([3]) and integration software ([6]) are versioned so that the integration processes can be reproduced if needed. This way, steps take to compile an integrated data record can be traced and the sources of bias and errors can be more easily found.</p>\n\n<p>Contents<br>\n--------</p>\n\n<p>README:<br>\nthis file</p>\n\n<p>citations.csv.gz:<br>\ncontains data citations in a in a gzipped comma-separated values format.</p>\n\n<p>citations.tsv.gz:<br>\ncontains data citations in a gzipped tab-separated values format.</p>\n\n<p>verbatim-interactions.csv.gz<br>\ncontains species interactions tabulated as pair-wise interaction in a gzipped comma-separated values format. Included taxonomic name are *not* interpreted, but included as documented in their sources.</p>\n\n<p>verbatim-interactions.tsv.gz<br>\ncontains species interactions tabulated as pair-wise interaction in a gzipped tab-separated values format. Included taxonomic name are *not* interpreted, but included as documented in their sources.&nbsp;</p>\n\n<p>interactions.csv.gz:<br>\ncontains species interactions tabulated as pair-wise interactions in a gzipped comma-separated values format. Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.</p>\n\n<p>interactions.tsv.gz:<br>\ncontains species interactions tabulated as pair-wise interactions in a gzipped tab-separated values format. 
Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.</p>\n\n<p>refuted-interactions.csv.gz:<br>\ncontains refuted species interactions tabulated as pair-wise interactions in a gzipped comma-separated values format. Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.</p>\n\n<p>refuted-interactions.tsv.gz:<br>\ncontains refuted species interactions tabulated as pair-wise interactions in a gzipped tab-separated values format. Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.</p>\n\n<p>refuted-verbatim-interactions.csv.gz:<br>\ncontains refuted species interactions tabulated as pair-wise interactions in a gzipped comma-separated values format. Included taxonomic name are *not* interpreted, but included as documented in their sources.&nbsp;</p>\n\n<p>refuted-verbatim-interactions.tsv.gz:<br>\ncontains refuted species interactions tabulated as pair-wise interactions in a gzipped tab-separated values format. 
Included taxonomic name are *not* interpreted, but included as documented in their sources.&nbsp;</p>\n\n<p>interactions.nq.gz:<br>\ncontains species interactions expressed in the resource description framework in a gzipped rdf/quads format.</p>\n\n<p>dwca-by-study.zip:<br>\ncontains species interactions data as a Darwin Core Archive aggregated by study using a custom, occurrence level, association extension.</p>\n\n<p>dwca.zip:<br>\ncontains species interactions data as a Darwin Core Archive using a custom, occurrence level, association extension.</p>\n\n<p>neo4j-graphdb.zip:<br>\ncontains a neo4j v3.5.32 graph database snapshot containing a graph representation of the species interaction data.</p>\n\n<p>taxonCache.tsv.gz:<br>\ncontains hierarchies and identifiers associated with names from naming schemes in a gzipped tab-separated values format.</p>\n\n<p>taxonMap.tsv.gz:<br>\ndescribes how names in existing datasets were mapped into existing naming schemes in a gzipped tab-separated values format.</p>\n\n<p>References<br>\n-----</p>\n\n<p>[1] Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. doi: 10.1016/j.ecoinf.2014.08.005.</p>\n\n<p>[2] Poelen, J. H. (2020) Global Biotic Interactions: Elton Dataset Cache. Zenodo. doi: 10.5281/ZENODO.3950557.</p>\n\n<p>[3] Poelen, J. H. (2021). Global Biotic Interactions: Taxon Graph (Version 0.3.28) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4451472</p>\n\n<p>[4] Hortal, J. et al. (2015) Seven Shortfalls that Beset Large-Scale Knowledge of Biodiversity. Annual Review of Ecology, Evolution, and Systematics, 46(1), pp.523&ndash;549. doi: 10.1146/annurev-ecolsys-112414-054400.</p>\n\n<p>[5] Cains, M. et al. (2017) Ivmooc 2017 - Gap Analysis Of Globi: Identifying Research And Data Sharing Opportunities For Species Interactions. Zenodo. Zenodo. 
doi: 10.5281/ZENODO.814978.</p>\n\n<p>[6] Poelen, J. et al. (2022) globalbioticinteractions/globalbioticinteractions v0.24.6. Zenodo. doi: 10.5281/ZENODO.7327955.</p>\n\n<p>Content References<br>\n-----</p>\n\n<p>hash://sha256/2e0158ca0b4341f4fa8ff454cf12bac2879b4e9d2d68e5e29b439af8ab467a30 &nbsp;citations.csv.gz<br>\nhash://sha256/42a8595ad8de2a32c52f56d632c2ef42a04e3645cb88b0ad328b1cedd2ac8f1a &nbsp;citations.tsv.gz<br>\nhash://sha256/fed873a314d91d09500c896c9831108358c0a47bb17b7ff8aebca5c2e170d508 &nbsp;interactions.csv.gz<br>\nhash://sha256/91db1a9fd55ddb584d888f6c6314adcd5a668462d0016be266aa3593f2f60884 &nbsp;interactions.tsv.gz<br>\nhash://sha256/b40229414a565ab68971d05754e5040eea4af27e3ac6ef6df383410ea2a64752 &nbsp;verbatim-interactions.csv.gz<br>\nhash://sha256/e1485d6b23db9f8989315334c1696d64e5b39c33147e3617b41944d0a5d8581d &nbsp;verbatim-interactions.tsv.gz<br>\nhash://sha256/4ed995d3a7d17b291f0a3af0a3fc50b41cb742d228a160f32e09f630a57563b0 &nbsp;refuted-interactions.csv.gz<br>\nhash://sha256/10564bdbc054b0e17ce78fe13d8e925c032c8484e599d4c245e70279d0f0e0bb &nbsp;refuted-interactions.tsv.gz<br>\nhash://sha256/fc0b9e23d1b026e223a7716c7dde0677d00f816fb356dd0e7238d827f5e051d8 &nbsp;refuted-verbatim-interactions.csv.gz<br>\nhash://sha256/b54a93b46bf4583a7a0e090a9ab5e53a8d5a6d9f10a4165934a7ab98ea6d88d9 &nbsp;refuted-verbatim-interactions.tsv.gz<br>\nhash://sha256/15cdbd8d6b6aac59500df664d5675e1d614fdcd1c2af165b950a7fe430dcd6a6 &nbsp;interactions.nq.gz<br>\nhash://sha256/ee0a810a54bb6c564de4beb9186a3b9d55201cb77697aa605783882c85adf9c8 &nbsp;dwca-by-study.zip<br>\nhash://sha256/7b1a034da65d6ecd0941ea93bfc104166a3ddb00e3af5c2d4f806e52ca92e5cc &nbsp;dwca.zip<br>\nhash://sha256/b3abdcfc5867ff6d8a5b7327c07bd6c2748d1f09efa06d63bd16a91447a4d97f &nbsp;neo4j-graphdb.zip<br>\nhash://sha256/e5b0a7990379d6e69404020ed48db9b0336443ff516a3dd99e3c9708eec74cf6 &nbsp;taxonMap.tsv.gz<br>\nhash://sha256/a5f7c0b4b718ebc7725cdac0502e2edee92ed164297880512e551bdf3d43f4ee 
&nbsp;taxonCache.tsv.gz<br>\n&nbsp;</p>",
          "doi": "10.5281/zenodo.7348355",
          "keywords": [
            "biotic assocations",
            "species interactions",
            "biotic interactions",
            "ecology",
            "biology",
            "ecological informatics"
          ],
          "license": {
            "id": "CC0-1.0"
          },
          "publication_date": "2022-11-22",
          "related_identifiers": [
            {
              "identifier": "10.1146/annurev-ecolsys-112414-054400",
              "relation": "cites",
              "scheme": "doi"
            },
            {
              "identifier": "10.1016/j.ecoinf.2014.08.005",
              "relation": "cites",
              "scheme": "doi"
            },
            {
              "identifier": "10.5281/ZENODO.814978",
              "relation": "cites",
              "scheme": "doi"
            },
            {
              "identifier": "10.5281/zenodo.5711446",
              "relation": "isDerivedFrom",
              "scheme": "doi"
            },
            {
              "identifier": "10.5281/zenodo.5711304",
              "relation": "isDerivedFrom",
              "scheme": "doi"
            },
            {
              "identifier": "10.5281/zenodo.5711396",
              "relation": "isDerivedFrom",
              "scheme": "doi"
            },
            {
              "identifier": "10.5281/zenodo.5711415",
              "relation": "isDerivedFrom",
              "scheme": "doi"
            },
            {
              "identifier": "10.5281/zenodo.5711816",
              "relation": "isDerivedFrom",
              "scheme": "doi"
            },
            {
              "identifier": "10.5281/zenodo.5711834",
              "relation": "isDerivedFrom",
              "scheme": "doi"
            },
            {
              "identifier": "10.5281/zenodo.5711875",
              "relation": "isDerivedFrom",
              "scheme": "doi"
            },
            {
              "identifier": "10.5281/zenodo.5526782",
              "relation": "isDerivedFrom",
              "scheme": "doi"
            },
            {
              "identifier": "10.5281/zenodo.3950589",
              "relation": "isVersionOf",
              "scheme": "doi"
            }
          ],
          "relations": {
            "version": [
              {
                "count": 5,
                "index": 4,
                "is_last": true,
                "last_child": {
                  "pid_type": "recid",
                  "pid_value": "7348355"
                },
                "parent": {
                  "pid_type": "recid",
                  "pid_value": "3950589"
                }
              }
            ]
          },
          "resource_type": {
            "title": "Dataset",
            "type": "dataset"
          },
          "title": "Global Biotic Interactions: Interpreted Data Products",
          "version": "0.5"
        },
        "owners": [
          7292
        ],
        "revision": 2,
        "stats": {
          "downloads": 89,
          "unique_downloads": 47,
          "unique_views": 12,
          "version_downloads": 17576,
          "version_unique_downloads": 2123,
          "version_unique_views": 434,
          "version_views": 544,
          "version_volume": 69264838699721,
          "views": 13,
          "volume": 114457739243
        },
        "updated": "2022-11-23T02:26:35.237812+00:00"
      }
    ],
    "total": 1
  },
  "links": {
    "self": "https://zenodo.org/api/records/?sort=bestmatch&q=_files.checksum%3A%22md5%3Aa64ba5bacf9b07197648a9eed660c176%22&page=1&size=10"
  }
}

However, an md5 hash query for the README of an older version of the publication GloBI Community. (2022). Global Biotic Interactions: Interpreted Data Products (0.5) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7348355 does not return a result:

curl "https://zenodo.org/api/records/?q=_files.checksum:%22md5:d11ddcecf3d5cbc627439260bdbfda72%22"

yields:

{
  "aggregations": {
    "access_right": {
      "buckets": [],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    },
    "file_type": {
      "buckets": [],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    },
    "keywords": {
      "buckets": [],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    },
    "type": {
      "buckets": [],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    }
  },
  "hits": {
    "hits": [],
    "total": 0
  },
  "links": {
    "self": "https://zenodo.org/api/records/?sort=bestmatch&q=_files.checksum%3A%22md5%3Ad11ddcecf3d5cbc627439260bdbfda72%22&page=1&size=10"
  }
}

However, the record page at https://zenodo.org/record/6604060 lists the README with md5 hash md5:d11ddcecf3d5cbc627439260bdbfda72 (also see attached screenshot).

I was also able to download the file directly and confirm that hash:

$ curl --silent "https://zenodo.org/record/6604060/files/README" | md5sum
d11ddcecf3d5cbc627439260bdbfda72  -
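
The manual check above can be wrapped in a small helper. This is a minimal sketch, not part of preston: the function name `md5_matches` is hypothetical, and it assumes GNU `md5sum` is available, as in the command above.

```shell
# Hypothetical helper (not part of preston): succeeds only when the content
# read from stdin hashes to the expected md5 hex digest.
# Assumes GNU coreutils md5sum, which prints "<hex digest>  -" for stdin.
md5_matches() {
  expected="$1"
  # Keep only the hex digest (first space-separated field of md5sum's output).
  actual="$(md5sum | cut -d ' ' -f 1)"
  [ "$actual" = "$expected" ]
}
```

For example, `curl --silent "https://zenodo.org/record/6604060/files/README" | md5_matches d11ddcecf3d5cbc627439260bdbfda72 && echo "hash matches"` would print a message only when the downloaded content has the expected hash.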

fyi @cboettig

@slint is it expected that Zenodo only indexes md5 content hashes for the most recent versions of its content?

[image: Screenshot from 2022-12-23 11-26-59]

jhpoelen commented 1 year ago

See also https://github.com/zenodo/zenodo/issues/1985.

jhpoelen commented 1 year ago

In https://github.com/zenodo/zenodo/issues/1985#issuecomment-1517492170, @slint mentioned:

@cboettig, @jhpoelen is correct regarding the API only searching the latest version of records by default. To search throughout all versions of records, you have to include the all_versions=true querystring parameter like so:

https://zenodo.org/api/records/?q=_files.checksum:%22md5:61b36a86930f6ffb073f4e189bbd5723%22&all_versions=1

The above returns the record containing the file with that hash.
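
The two query forms seen in this issue (latest version only, and all versions) can be produced by one small function. This is a sketch only, based on the URLs quoted in this thread: the function name `zenodo_checksum_query_url` and its `all_versions` argument are hypothetical, and the Zenodo API may behave differently today.

```shell
# Hypothetical helper: print a Zenodo records-search URL for an md5 checksum.
# Usage: zenodo_checksum_query_url <md5-hex> [all_versions]
zenodo_checksum_query_url() {
  md5_hex="$1"
  # %22 is a URL-encoded double quote around the md5:<hex> search term.
  url="https://zenodo.org/api/records/?q=_files.checksum:%22md5:${md5_hex}%22"
  # Per the quoted comment, only latest versions are searched unless
  # the all_versions query parameter is included.
  if [ "${2:-}" = "all_versions" ]; then
    url="${url}&all_versions=1"
  fi
  printf '%s\n' "$url"
}
```

Calling `curl "$(zenodo_checksum_query_url d11ddcecf3d5cbc627439260bdbfda72 all_versions)"` would then issue the all-versions query for the older README hash discussed above. Quoting the URL keeps the shell from interpreting `?` and `&`.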

After applying these suggestions, I was able to find the "older" README file with hash://md5/d11ddcecf3d5cbc627439260bdbfda72 referenced earlier in this issue:

preston cat --remote https://zenodo.org hash://md5/d11ddcecf3d5cbc627439260bdbfda72\
 | head

produced the expected output:

Global Biotic Interactions: Interpreted Data Products

Global Biotic Interactions (GloBI, https://globalbioticinteractions.org, [1]) aims to facilitate access to existing species interaction records (e.g., predator-prey, plant-pollinator, virus-host). This data publication provides interpreted species interaction data products. These products are the result of a process in which versioned, existing species interaction datasets ([2]) are linked to the so-called GloBI Taxon Graph ([3]) and transformed into various aggregate formats (e.g., tsv, csv, neo4j, rdf/nquad, darwin core-ish archives). In addition, the applied name maps are included to make the applied taxonomic linking explicit.

Citation

GloBI is made possible by researchers, collections, projects and institutions openly sharing their datasets. When using this data, please make sure to attribute these original data contributors, including citing the specific datasets in derivative work. Each species interaction record indexed by GloBI contains a reference and dataset citation. Also, a full lists of all references can be found in citations.csv/citations.tsv files in this publication. If you have ideas on how to make it easier to cite original datasets, please open/join a discussion via https://globalbioticinteractions.org or related projects.

To credit GloBI for more easily finding interaction data, please use the following citation to reference GloBI:

jhpoelen commented 1 year ago

Thank you @slint !