eblondel / zen4R

zen4R - R Interface to Zenodo REST API
https://github.com/eblondel/zen4R/wiki

Issues with ZenodoManager$depositRecordVersion(): API access / record information lost #142

Closed: ablaette closed this issue 4 months ago

ablaette commented 10 months ago

I wish to use the functionality to deposit a new version of a record using the method $depositRecordVersion() of the ZenodoManager class. The short explanation in the vignette is straightforward and great. I greatly appreciate your systematic work to expose the capabilities of the API. It has great potential for our workflows, but here is a set of issues I encountered.

This is the initial sample code I used:

library(zen4R) # I use 0.9.9000 from branch '126-zenodo-invenio-rdm'

zenodo <- ZenodoManager$new(
  token = Sys.getenv("ZENODO_ACCESS_TOKEN"),  # available via .Renviron
  logger = "DEBUG" # or "INFO"
)
myrec <- zenodo$getDepositionByDOI("10.5281/zenodo.7949074") # latest deposition of GermaParl corpus

# some modifications
myrec$setVersion("v2.0.1")
myrec$setPublicationDate(Sys.Date()) 
myrec$prereserveDOI(FALSE) # necessary?

myrec2 <- zenodo$depositRecordVersion(
  myrec,
  delete_latest_files = TRUE,
  publish = FALSE
)

Resulting in:

[zen4R][INFO] ZenodoManager - Creating new version for record '7949074/versions/latest' (concept DOI: '10.5281/zenodo.3735140')
-> POST /api/deposit/depositions/7949074/versions/latest/actions/newversion HTTP/1.1
-> Host: zenodo.org
-> Accept-Encoding: deflate, gzip
-> Cookie: 5569e5a730cade8ff2b54f1e815f3670=55bec0eaecfcbf5692fa89fa4aad17e3
-> Accept: application/json, text/xml, application/xml, */*
-> User-Agent: zen4R_0.9.9000
-> Content-Type: application/json
-> Authorization:
-> Content-Length: 2
->

{}

<- HTTP/1.1 404 NOT FOUND
<- server: nginx
<- date: Thu, 21 Dec 2023 07:27:14 GMT
<- content-type: application/json
<- transfer-encoding: chunked
<- vary: Accept-Encoding
<- access-control-allow-origin: *
<- access-control-expose-headers: Content-Type, ETag, Link, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
<- permissions-policy: interest-cohort=()
<- x-frame-options: sameorigin
<- x-xss-protection: 1; mode=block
<- x-content-type-options: nosniff
<- content-security-policy: default-src 'self' fonts.googleapis.com *.gstatic.com data: 'unsafe-inline' 'unsafe-eval' blob: zenodo-broker.web.cern.ch zenodo-broker-qa.web.cern.ch maxcdn.bootstrapcdn.com cdnjs.cloudflare.com ajax.googleapis.com webanalytics.web.cern.ch
<- strict-transport-security: max-age=31556926; includeSubDomains
<- referrer-policy: strict-origin-when-cross-origin
<- set-cookie: csrftoken=eyJhbGciOiJIUzUxMiIsImlhdCI6MTcwMzE0MzYzNCwiZXhwIjoxNzAzMjMwMDM0fQ.ImtPSTJuZTQ4RzV0a1k2SzgxdUpkdFdHUDVrdXlFcmFyIg.Dk2xBvMMQ_shaBiG1ObSjYpr1LyBZ1fkVgcfR-kQVDiJUBP72HPWOwswXpfOVmvgrQvtCCUGKg0fR4X7Q4pjXw; Expires=Thu, 28 Dec 2023 07:27:14 GMT; Max-Age=604800; Secure; Path=/; SameSite=Lax
<- content-encoding: gzip
<-
[zen4R][ERROR] ZenodoManager - Error while creating new version: The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.

Based on the documentation, I explored the API with curl from the terminal and realized that the API does not accept "/api/deposit/depositions/7949074/versions/latest/actions". I can work around this by overriding the link to the latest version before making the API call:

myrec$links$latest <- "https://zenodo.org/api/records/7949074"
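The hack hard-codes the record URL. Below is a minimal sketch of a helper that derives it instead, assuming that myrec$links$latest has the form https://zenodo.org/api/records/<id>/versions/latest. That shape is inferred from the record ID '7949074/versions/latest' in the log above and has not been checked against the zen4R source. strip_versions_latest() is a hypothetical name, not part of zen4R. Like the hack, it simply yields the fetched record's own URL, so it only targets the latest version if the fetched record is itself the latest one.

```r
# Hypothetical helper (not part of zen4R): drop a trailing "/versions/latest"
# (with or without a final slash) from a Zenodo API record link, so that the
# link points at a concrete record ID, mirroring the manual hack above.
strip_versions_latest <- function(link) {
  sub("/versions/latest/?$", "", link)
}

# Assumed link shape, inferred from the 404 log above
strip_versions_latest("https://zenodo.org/api/records/7949074/versions/latest")
#> [1] "https://zenodo.org/api/records/7949074"

# Links that are already concrete are returned unchanged
strip_versions_latest("https://zenodo.org/api/records/6546810")
#> [1] "https://zenodo.org/api/records/6546810"

# Applied to a record fetched with zen4R (assumption: links$latest is a string):
# myrec$links$latest <- strip_versions_latest(myrec$links$latest)
```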

Running the code with the hack applied to the link to the latest version ...

library(zen4R)
library(fs)
library(magrittr)

zenodo <- ZenodoManager$new(
  token = Sys.getenv("ZENODO_ACCESS_TOKEN"), 
  logger = "DEBUG" # or "INFO"
)
myrec <- zenodo$getDepositionByDOI("10.5281/zenodo.6546810")

myrec$setVersion("v2.0.0")
myrec$setPublicationDate(Sys.Date()) 
myrec$prereserveDOI(FALSE)

myrec$links$latest <- "https://zenodo.org/api/records/6546810" # !!!!!! HACK !!!!

myrec2 <- zenodo$depositRecordVersion(
  myrec,
  delete_latest_files = TRUE,
  publish = FALSE
)

... now yields:

[zen4R][INFO] ZenodoManager - Creating new version for record '6546810' (concept DOI: '10.5281/zenodo.3822638')
-> POST /api/deposit/depositions/6546810/actions/newversion HTTP/1.1
-> Host: zenodo.org
-> Accept-Encoding: deflate, gzip
-> Cookie: 5569e5a730cade8ff2b54f1e815f3670=55bec0eaecfcbf5692fa89fa4aad17e3; csrftoken=eyJhbGciOiJIUzUxMiIsImlhdCI6MTcwMzE0MzYzNCwiZXhwIjoxNzAzMjMwMDM0fQ.ImtPSTJuZTQ4RzV0a1k2SzgxdUpkdFdHUDVrdXlFcmFyIg.Dk2xBvMMQ_shaBiG1ObSjYpr1LyBZ1fkVgcfR-kQVDiJUBP72HPWOwswXpfOVmvgrQvtCCUGKg0fR4X7Q4pjXw
-> Accept: application/json, text/xml, application/xml, */*
-> User-Agent: zen4R_0.9.9000
-> Content-Type: application/json
-> Authorization:
-> Content-Length: 2
->

{}

<- HTTP/1.1 201 CREATED
<- server: nginx
<- date: Thu, 21 Dec 2023 07:32:03 GMT
<- content-type: application/json
<- content-length: 3970
<- etag: "7"
<- x-ratelimit-limit: 1000
<- x-ratelimit-remaining: 995
<- x-ratelimit-reset: 1703143983
<- retry-after: 59
<- permissions-policy: interest-cohort=()
<- x-frame-options: sameorigin
<- x-xss-protection: 1; mode=block
<- x-content-type-options: nosniff
<- content-security-policy: default-src 'self' fonts.googleapis.com *.gstatic.com data: 'unsafe-inline' 'unsafe-eval' blob: zenodo-broker.web.cern.ch zenodo-broker-qa.web.cern.ch maxcdn.bootstrapcdn.com cdnjs.cloudflare.com ajax.googleapis.com webanalytics.web.cern.ch
<- strict-transport-security: max-age=31556926; includeSubDomains
<- referrer-policy: strict-origin-when-cross-origin
<- access-control-allow-origin: *
<- access-control-expose-headers: Content-Type, ETag, Link, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
<- strict-transport-security: max-age=15768000
<- x-request-id: 06230b16baa2fc5b6b4429f5bdc4af0a
<-
[zen4R][INFO] ZenodoRequest - Fetching https://zenodo.org/api/user/records?q=recid:10417081&size=10&page=1&allversions=1
-> GET /api/user/records?q=recid:10417081&size=10&page=1&allversions=1 HTTP/1.1
-> Host: zenodo.org
-> Accept-Encoding: deflate, gzip
-> Cookie: 5569e5a730cade8ff2b54f1e815f3670=55bec0eaecfcbf5692fa89fa4aad17e3; csrftoken=eyJhbGciOiJIUzUxMiIsImlhdCI6MTcwMzE0MzYzNCwiZXhwIjoxNzAzMjMwMDM0fQ.ImtPSTJuZTQ4RzV0a1k2SzgxdUpkdFdHUDVrdXlFcmFyIg.Dk2xBvMMQ_shaBiG1ObSjYpr1LyBZ1fkVgcfR-kQVDiJUBP72HPWOwswXpfOVmvgrQvtCCUGKg0fR4X7Q4pjXw
-> Accept: application/json, text/xml, application/xml, */*
-> User-Agent: zen4R_0.9.9000
-> Authorization:
->
<- HTTP/1.1 200 OK
<- server: nginx
<- date: Thu, 21 Dec 2023 07:32:03 GMT
<- content-type: application/json
<- transfer-encoding: chunked
<- vary: Accept-Encoding
<- x-ratelimit-limit: 1000
<- x-ratelimit-remaining: 994
<- x-ratelimit-reset: 1703143984
<- retry-after: 60
<- permissions-policy: interest-cohort=()
<- x-frame-options: sameorigin
<- x-xss-protection: 1; mode=block
<- x-content-type-options: nosniff
<- content-security-policy: default-src 'self' fonts.googleapis.com *.gstatic.com data: 'unsafe-inline' 'unsafe-eval' blob: zenodo-broker.web.cern.ch zenodo-broker-qa.web.cern.ch maxcdn.bootstrapcdn.com cdnjs.cloudflare.com ajax.googleapis.com webanalytics.web.cern.ch
<- strict-transport-security: max-age=31556926; includeSubDomains
<- referrer-policy: strict-origin-when-cross-origin
<- access-control-allow-origin: *
<- access-control-expose-headers: Content-Type, ETag, Link, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
<- strict-transport-security: max-age=15768000
<- x-request-id: 19359d63d4648134c8f1bb8cd53ca62c
<- content-encoding: gzip
<-
[zen4R][INFO] ZenodoManager - Successfully fetched list of depositions (user records)!
[zen4R][WARN] ZenodoManager - No record for id '10417081'!
[zen4R][INFO] ZenodoManager - Successful new version record created for concept DOI '10.5281/zenodo.3822638'
-> POST /api/records HTTP/1.1
-> Host: zenodo.org
-> Accept-Encoding: deflate, gzip
-> Cookie: 5569e5a730cade8ff2b54f1e815f3670=55bec0eaecfcbf5692fa89fa4aad17e3; csrftoken=eyJhbGciOiJIUzUxMiIsImlhdCI6MTcwMzE0MzYzNCwiZXhwIjoxNzAzMjMwMDM0fQ.ImtPSTJuZTQ4RzV0a1k2SzgxdUpkdFdHUDVrdXlFcmFyIg.Dk2xBvMMQ_shaBiG1ObSjYpr1LyBZ1fkVgcfR-kQVDiJUBP72HPWOwswXpfOVmvgrQvtCCUGKg0fR4X7Q4pjXw
-> Accept: application/json, text/xml, application/xml, */*
-> User-Agent: zen4R_0.9.9000
-> Content-Type: application/json
-> Authorization:
-> Content-Length: 2313
->

{
  "stats": [
    {
      "downloads": 25895,
      "unique_downloads": 16230,
      "views": 9258,
      "unique_views": 7139,
      "version_downloads": 1647,
      "version_unique_downloads": 1388,
      "version_unique_views": 3504,
      "version_views": 4547
    }
  ],
  "revision": 2,
  "submitted": true,
  "state": "done",
  "status": "published",
  "recid": "6546810",
  "owners": [ { "id": 80803 } ],
  "modified": "2022-05-14T01:50:11.263852+00:00",
  "metadata": {
    "title": "GermaParl Sample Corpus",
    "publication_date": "2023-12-21",
    "description": "<p>The GermaParlSample Corpus is a small subset of the GermaParl corpus that has been prepared in the PolMine Project (http://polmine.github.io). The intended usage of the sample corpus is to explore the data format that has been linguistically annotated (using the TreeTagger) and imported into the Corpus Workbench (CWB), and to test functionality for automatic data retrieval from Zenodo. See the GermaParl documentation website (http://polmine.github.io/GermaParl) for further information.<\/p>\n\n<p>The purpose of GermaParlSample is to have a lightweight resource at Zenodo for testing purposes. The only reason why access to GermaParlSample v0.1.1 is limited is to have a version with restricted access, so that required cookies can be tested in the test suite of the R package 'cwbtools'. If you do not have access, you are not missing anything, v0.1.1 is identical with v0.1.0.<\/p>",
    "access_right": "restricted",
    "creators": [
      {
        "name": "Blätte, Andreas",
        "affiliation": "University of Duisburg-Essen",
        "orcid": "0000-0001-8970-8010"
      }
    ],
    "keywords": [ "corpus, Bundestag, parliamentary protocols" ],
    "version": "v2.0.0",
    "resource_type": { "title": "Dataset", "type": "dataset" },
    "relations": {
      "version": [
        {
          "index": 2,
          "is_last": true,
          "parent": { "pid_type": "recid", "pid_value": "3822638" }
        }
      ]
    },
    "prereserve_doi": true
  },
  "doi_url": "https://doi.org/10.5281/zenodo.6546810",
  "created": "2022-05-13T15:41:00.568217+00:00",
  "conceptrecid": "3822638",
  "conceptdoi": "10.5281/zenodo.3822638"
}

<- HTTP/1.1 201 CREATED
<- server: nginx
<- date: Thu, 21 Dec 2023 07:32:03 GMT
<- content-type: application/json
<- content-length: 3000
<- etag: "4"
<- x-ratelimit-limit: 1000
<- x-ratelimit-remaining: 993
<- x-ratelimit-reset: 1703143984
<- retry-after: 60
<- permissions-policy: interest-cohort=()
<- x-frame-options: sameorigin
<- x-xss-protection: 1; mode=block
<- x-content-type-options: nosniff
<- content-security-policy: default-src 'self' fonts.googleapis.com *.gstatic.com data: 'unsafe-inline' 'unsafe-eval' blob: zenodo-broker.web.cern.ch zenodo-broker-qa.web.cern.ch maxcdn.bootstrapcdn.com cdnjs.cloudflare.com ajax.googleapis.com webanalytics.web.cern.ch
<- strict-transport-security: max-age=31556926; includeSubDomains
<- referrer-policy: strict-origin-when-cross-origin
<- access-control-allow-origin: *
<- access-control-expose-headers: Content-Type, ETag, Link, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
<- strict-transport-security: max-age=15768000
<- x-request-id: b60d3bf03fe478f61fb79c4c27272137
<-
[zen4R][INFO] ZenodoManager - Successful record deposition
[zen4R][INFO] ZenodoManager - Deleting files copied from latest record

So is zen4R not yet conforming to the latest changes to the API?

So I was able to work around this issue, but then I realized that significant parts of the record metadata are lost in the new record, such as resource type, creators, language, keywords, and communities.

Finally, I can use ZenodoManager$uploadFile() for small files, but for my larger files (1-3 GB) I run into a rate limit that I cannot overcome.

So I find the functionality of zen4R very, very useful, but I cannot use it as of now, unfortunately.

eblondel commented 10 months ago

See #127, and specifically #140 for record version deposition.

eblondel commented 10 months ago

The Zenodo team has informed us that the new API release is still not stable. A large part of the migration has been done in #127, with a dedicated dev branch in progress, but some parts are still missing.

ablaette commented 10 months ago

Thanks a lot for your quick reply, and my apologies for having missed #140. I understand the challenge you describe, so I will leave it at saying that I find your work incredibly useful. zen4R is a crucial building block for our research data management (RDM)!

eblondel commented 4 months ago

The method has been migrated in #133.