datadryad / dryad-product-roadmap

Repository of issues for Dryad project boards
https://github.com/orgs/datadryad/projects

InvenioRDM mismatch in uploaded files #2750

Closed sfisher closed 1 year ago

sfisher commented 1 year ago

We tested simple cases earlier, but now even a simple case seems to be failing. To reproduce, upload one software file with an initial submission.

After the Zenodo queue processes the submission, it fails with this error:

2023-08-25 10:44:54 -0700 Stash::ZenodoSoftware::FileError
P046066-102947.jpg (id: 14654) exists in the Dryad database but not in Zenodo after Zenodo indicated a successful upload
The number of Dryad files (1) does not match the number of Zenodo files (0)

This check is part of the validation we run after a successful upload. I don't know whether the problem is temporary or permanent. Perhaps InvenioRDM doesn't add files immediately, or something else is going on.
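
For illustration, a minimal sketch of what this kind of post-upload check might look like. The method name `file_mismatch_errors` and the hash shapes are hypothetical, not the actual `Stash::ZenodoSoftware` code; the messages mirror the error output above.

```ruby
# Hypothetical helper: compare the files Dryad expects with the files the
# deposition response reports, returning a list of error messages.
# dryad_files:  array of hashes like { id: 14654, name: 'P046066-102947.jpg' }
# zenodo_files: the "files" array from the deposition JSON (string keys)
def file_mismatch_errors(dryad_files, zenodo_files)
  zenodo_names = zenodo_files.map { |f| f['filename'] }

  # One message per Dryad file that the remote side does not list.
  errors = dryad_files.reject { |f| zenodo_names.include?(f[:name]) }.map do |f|
    "#{f[:name]} (id: #{f[:id]}) exists in the Dryad database but not in Zenodo " \
      'after Zenodo indicated a successful upload'
  end

  # A separate message when the overall counts differ.
  if dryad_files.length != zenodo_files.length
    errors << "The number of Dryad files (#{dryad_files.length}) does not match " \
              "the number of Zenodo files (#{zenodo_files.length})"
  end

  errors
end

# One file on the Dryad side, an empty files collection on the remote side.
puts file_mismatch_errors([{ id: 14654, name: 'P046066-102947.jpg' }], [])
```

With an empty `files` array in the response, a check like this reports both a missing file and a count mismatch, even if the upload itself succeeded.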

sfisher commented 1 year ago

I've tracked this down to two likely causes: one issue on our side, and a change in the responses they return for some of the same API calls.

It seems that deposition_ids used to be integers but are now strings, so we may need to change the database field that stores them to a string type so that they are stored properly.
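
A small sketch of why an integer field is a problem, using the two ids that appear in the responses in this thread. The `normalize_deposition_id` helper is hypothetical, and how the database itself coerces such a value may differ from Ruby's `String#to_i`.

```ruby
old_id = 1235478        # legacy Zenodo deposition id (integer)
new_id = 'd9f5q-gpp85'  # InvenioRDM deposition id (string)

# Coercing the new-style id to an integer silently discards it:
# String#to_i returns 0 when the string has no leading digits.
puts new_id.to_i        # prints 0

# Hypothetical normalizer: treat every deposition id as a string so that
# both the old and the new formats survive storage unchanged.
def normalize_deposition_id(id)
  id.to_s
end

puts normalize_deposition_id(old_id)  # prints 1235478
puts normalize_deposition_id(new_id)  # prints d9f5q-gpp85
```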

However, it also seems that the items returned in the JSON response have changed between Zenodo and InvenioRDM.

Here is an example of the old response from https://sandbox.zenodo.org/api/deposit/depositions/1235478. Note that it includes the files collection, with the checksum and filename for each file.

{
  conceptrecid: "1235477",
  created: "2023-08-28T21:58:49.650034+00:00",
  doi: "10.5072/zenodo.1235478",
  doi_url: "https://doi.org/10.5072/zenodo.1235478",
  files: [
    {
      checksum: "5bb27a94bfc3798a94c6dde14d7ecdb0",
      filename: "P045593-623064.jpg",
      filesize: 2017598,
      id: "7e2624ae-4806-4489-a523-68e0c4c122b2",
      links: {
        download: "https://sandbox.zenodo.org/api/files/c55bbb99-c01b-4fc5-875f-aa98f82bb908/P045593-623064.jpg",
        self: "https://sandbox.zenodo.org/api/deposit/depositions/1235478/files/7e2624ae-4806-4489-a523-68e0c4c122b2"
      }
    },
    {
      checksum: "80d879afb7e51ecf8ca07de3e78b639f",
      filename: "P045593-879726.jpg",
      filesize: 1011309,
      id: "5a28b7da-60ec-45c0-a11d-c2e4fe72b14e",
      links: {
        download: "https://sandbox.zenodo.org/api/files/c55bbb99-c01b-4fc5-875f-aa98f82bb908/P045593-879726.jpg",
        self: "https://sandbox.zenodo.org/api/deposit/depositions/1235478/files/5a28b7da-60ec-45c0-a11d-c2e4fe72b14e"
      }
    }
  ],
  id: 1235478,
  links: {
    badge: "https://sandbox.zenodo.org/badge/doi/10.5072/zenodo.1235478.svg",
    bucket: "https://sandbox.zenodo.org/api/files/c55bbb99-c01b-4fc5-875f-aa98f82bb908",
    discard: "https://sandbox.zenodo.org/api/deposit/depositions/1235478/actions/discard",
    doi: "https://doi.org/10.5072/zenodo.1235478",
    edit: "https://sandbox.zenodo.org/api/deposit/depositions/1235478/actions/edit",
    files: "https://sandbox.zenodo.org/api/deposit/depositions/1235478/files",
    html: "https://sandbox.zenodo.org/deposit/1235478",
    latest_draft: "https://sandbox.zenodo.org/api/deposit/depositions/1235478",
    latest_draft_html: "https://sandbox.zenodo.org/deposit/1235478",
    newversion: "https://sandbox.zenodo.org/api/deposit/depositions/1235478/actions/newversion",
    publish: "https://sandbox.zenodo.org/api/deposit/depositions/1235478/actions/publish",
    registerconceptdoi: "https://sandbox.zenodo.org/api/deposit/depositions/1235478/actions/registerconceptdoi",
    self: "https://sandbox.zenodo.org/api/deposit/depositions/1235478"
  },
  metadata: {
    access_right: "open",
    communities: [
      {
        identifier: "dryad"
      }
    ],
    creators: [
      {
        affiliation: "Nicolae Testemițanu State University of Medicine and Pharmacy",
        name: "Account, Testing",
        orcid: "0000-0002-4734-4551"
      }
    ],
    description: "<p>beves</p>",
    doi: "10.5072/zenodo.1235478",
    keywords: [
      "sebastian",
      "lulie",
      "Mga"
    ],
    license: "MIT",
    notes: "<p>Funding provided by: Fundación Prevent<br>Crossref Funder Registry ID: http://dx.doi.org/10.13039/100017082<br>Award Number: 338973</p>",
    prereserve_doi: {
      doi: "10.5072/zenodo.1235478",
      recid: 1235478
    },
    publication_date: "2023-08-29",
    related_identifiers: [
      {
        identifier: "10.7959/dryad.brv15dv5m",
        relation: "isSourceOf",
        scheme: "doi"
      }
    ],
    title: "Fun deletion",
    upload_type: "software"
  },
  modified: "2023-08-29T22:16:30.390401+00:00",
  owner: 33450,
  record_id: 1235478,
  state: "unsubmitted",
  submitted: false,
  title: "Fun deletion"
}

Here is the new response from InvenioRDM at https://zenodo-rdm.web.cern.ch/api/deposit/depositions/d9f5q-gpp85. The deposit appears to have a file at https://zenodo-rdm.web.cern.ch/uploads/d9f5q-gpp85, but note that the files collection in this API response is an empty array.

{
  files: [ ],
  conceptrecid: "4x0aa-rja27",
  id: "d9f5q-gpp85",
  links: {
    self: "https://zenodo-rdm.web.cern.ch/api/deposit/depositions/d9f5q-gpp85",
    html: "https://zenodo-rdm.web.cern.ch/deposit/d9f5q-gpp85",
    files: "https://zenodo-rdm.web.cern.ch/api/deposit/depositions/d9f5q-gpp85/files",
    bucket: "https://zenodo-rdm.web.cern.ch/api/files/9b4d4bc9-50a5-4cd2-9f9a-2c4c97a3788b",
    latest_draft: "https://zenodo-rdm.web.cern.ch/api/deposit/depositions/d9f5q-gpp85",
    publish: "https://zenodo-rdm.web.cern.ch/api/deposit/depositions/d9f5q-gpp85/actions/publish",
    edit: "https://zenodo-rdm.web.cern.ch/api/deposit/depositions/d9f5q-gpp85/actions/edit",
    discard: "https://zenodo-rdm.web.cern.ch/api/deposit/depositions/d9f5q-gpp85/actions/discard"
  },
  record_id: "d9f5q-gpp85",
  modified: "2023-09-12T22:32:17.690634+00:00",
  created: "2023-09-12T22:32:13.994504+00:00",
  title: "Test InvenioRDM simple 001",
  owner: 90070,
  metadata: {
    related_identifiers: [
      {
        identifier: "10.7959/dryad.rr4xgxj1",
        relation: "isSourceOf",
        scheme: "doi"
      }
    ],
    imprint_publisher: "Zenodo",
    access_right: "open",
    title: "Test InvenioRDM simple 001",
    license: "mit-license",
    publication_date: "2023-09-13",
    description: "<p>This tests the basic functionality of Zenodo software deposit.</p>",
    communities: [
      {
        identifier: "dryad"
      }
    ],
    creators: [
      {
        affiliation: "National Maternity Hospital",
        name: "Account, Testing",
        orcid: "0000-0002-4734-4551"
      }
    ],
    notes: "<p>Funding provided by: Snowdome Foundation<br>Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100021839<br>Award Number: 3234</p>",
    upload_type: "software",
    keywords: [
      "Democracy",
      "RNA probes",
      "Proline"
    ],
    prereserve_doi: {
      doi: "10.5281/zenodo.d9f5q-gpp85",
      recid: "d9f5q-gpp85"
    }
  },
  state: "unsubmitted",
  submitted: false
}

sfisher commented 1 year ago

I've emailed Alex about this and about their cutover date. I just saw this blog post linked from their web site: https://blog.zenodo.org/2023/09/06/2023-09-06-zenodo-rdm/. Their advertised cutover date is Sept 29th, so one of us will need to resolve the issue before then. Either they make the new API behave the same as the old one, or we may need to use other file information requests to get the file list.
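
A rough sketch of the second option, assuming the `links.files` URL in the deposition response returns an array of file entries. That is an assumption about the new API, not something confirmed here. The `deposition_files` method and the injected `fetcher` are hypothetical; in real use the fetcher would be an authenticated HTTP GET that parses the JSON body.

```ruby
# Hypothetical fallback: use the embedded "files" collection when it is
# populated, otherwise request the list from the deposition's links.files URL.
# fetcher: any callable that takes a URL and returns the parsed JSON body.
def deposition_files(deposition, fetcher)
  files = deposition['files']
  return files unless files.nil? || files.empty?

  files_url = deposition.dig('links', 'files')
  return [] if files_url.nil?

  fetcher.call(files_url)
end

# Usage with a stubbed fetcher in place of a real HTTP request.
deposition = {
  'files' => [],
  'links' => {
    'files' => 'https://zenodo-rdm.web.cern.ch/api/deposit/depositions/d9f5q-gpp85/files'
  }
}
stub_fetcher = ->(_url) { [{ 'filename' => 'example.jpg' }] }

puts deposition_files(deposition, stub_fetcher).length
```

Injecting the fetcher keeps the fallback logic testable without network access, and leaves the existing behavior untouched whenever the response already includes its files.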

sfisher commented 1 year ago

Alex is having us reconfigure the system to use a different server.

Now I'm getting a different error:

{"message"=>"Referer checking failed - no Referer.", "status"=>400} for 
http.post https://zenodo-rdm-qa.web.cern.ch/api/deposit/depositions
{"message"=>"Referer checking failed - no Referer.", "status"=>400}

I'm checking with Alex to see whether this is expected behavior and, if so, what we would need to add to our API requests.

sfisher commented 1 year ago

This problem is caused by something specific to that server. Alex is directing us to a different one.