IQSS / dataverse-client-r

R Client for Dataverse Repositories
https://iqss.github.io/dataverse-client-r

add_dataset_file error #116

Closed Danny-dK closed 2 years ago

Danny-dK commented 2 years ago

In a previous issue (https://github.com/IQSS/dataverse-client-r/issues/82#issue-806197206) I reported problems adding files to a dataset, and I later posted the code that worked: https://github.com/IQSS/dataverse-client-r/issues/82#issuecomment-788738907

Many months later I'm running the exact same code, but I now get an error at the last step, when adding a file with add_dataset_file(). The error thrown is "Bad Request (HTTP 400). Failed to Error in parsing provided json." Everything up to that point seems to work, including creating the dataset and retrieving its DOI; only the file upload fails. Does anyone know what may have changed between March 2021 and now?

Again, the equivalent request sent directly with httr in R works:


library(httr)

headers = c(
  `X-Dataverse-key` = 'xxxxxxxxxxxxxxxxxxxxxxxxxxx'
)

# Target the DRAFT version of the dataset by its persistent ID
params = list(
  `persistentId` = 'doi:10.80227/test-YDCZ1J',
  `version` = 'DRAFT'
)

# Only the file is sent; no jsonData part is included
files = list(
  `file` = httr::upload_file('D:/parttwo.txt')
)

res <- httr::POST(url = 'https://demo.dataverse.nl/api/datasets/:persistentId/add',
                  httr::add_headers(.headers = headers),
                  query = params, body = files)

Windows 10, R 4.1.2, RStudio 2021.9.1.372, dataverse R package 0.3.10

Danny-dK commented 2 years ago

The error seems to be related to the description. If I pass a description to add_dataset_file(), the upload works.

So this does not work:

f <- add_dataset_file(dataset = 'doi:10.80227/test-YDCZ1J&version=DRAFT', file = 'D:/partone.txt')

But this does work:

f <- add_dataset_file(dataset = 'doi:10.80227/test-YDCZ1J&version=DRAFT', file = 'D:/partone.txt', description = 'text')

The body of add_dataset_file() contains:

function (file, dataset, description = NULL, key = Sys.getenv("DATAVERSE_KEY"), 
  server = Sys.getenv("DATAVERSE_SERVER"), ...) 
{
  dataset <- dataset_id(dataset, key = key, server = server, 
    ...)
  bod2 <- list()
  if (!is.null(description)) {
    bod2$description <- description
  }
  jsondata <- as.character(jsonlite::toJSON(bod2, auto_unbox = TRUE))
  u <- paste0(api_url(server), "datasets/", dataset, "/add")
  r <- httr::POST(u, httr::add_headers(`X-Dataverse-key` = key), 
    ..., body = list(file = httr::upload_file(file), jsonData = jsondata), 
    encode = "multipart")
  httr::stop_for_status(r, task = httr::content(r)$message)
  out <- jsonlite::fromJSON(httr::content(r, "text", encoding = "UTF-8"))
  out$data$files$dataFile$id[1L]
}

Could it be that when no description is given (so bod2 is an empty list), jsondata becomes the empty JSON array [], which the server might not accept in the httr::POST() call?
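A quick check of this hypothesis, assuming only the jsonlite package: an empty list (what bod2 is when description is NULL) has no names, so jsonlite serializes it as a JSON array rather than an object.

```r
library(jsonlite)

# bod2 when no description is supplied: an empty, unnamed list
empty_body <- list()
# bod2 when a description is supplied: a named list
named_body <- list(description = "text")

# An unnamed list maps to a JSON array, so the empty case becomes "[]"
as.character(toJSON(empty_body, auto_unbox = TRUE))  # "[]"
# A named list maps to a JSON object
as.character(toJSON(named_body, auto_unbox = TRUE))  # "{\"description\":\"text\"}"
```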

kuriwaki commented 2 years ago

Does it work on CRAN version 0.3.9? devtools::install_version("dataverse", version = "0.3.9")

Danny-dK commented 2 years ago

No. I also tried version 0.3.0, which worked last year. So I assume something changed in Dataverse itself; perhaps it no longer accepts empty JSON? The curl-style script above also doesn't seem to need any JSON to upload the file.

pdurbin commented 2 years ago

I assume that so far we've been discussing whether anything changed in the dataverse package on CRAN.

Is there a chance that something changed on the server side? Have you been running the same version of Dataverse on the server this whole time? If not, do you know what version you were running and which version you upgraded to?

Danny-dK commented 2 years ago

That was my question as well, since the 0.3.0 package also shows this issue (and as far as I can tell, the code hasn't changed much in that part). It happens on both demo.dataverse.nl (v5.6) and demo.dataverse.org (v5.10). To reproduce the problem, you can run the example in your README / about section at https://github.com/IQSS/dataverse-client-r#data-archiving. If you don't see the same thing happening, then it must be something else.

Danny-dK commented 2 years ago

I'm not sure whether this is related (I'm not that experienced in R, so apologies if it doesn't help). The code for add_dataset_file() (also shown above) uses:

bod2 <- list()
  if (!is.null(description)) {
    bod2$description <- description
  }
  jsondata <- as.character(jsonlite::toJSON(bod2, auto_unbox = TRUE))

If description is left as NULL, jsondata <- as.character(jsonlite::toJSON(bod2, auto_unbox = TRUE)) evaluates to '[]'. If I send that value in the httr script, the upload fails:

library(curl)
library(httr)

headers = c(
  `X-Dataverse-key` = 'xxxxxxxxxxxxxxxxxxxxx'
)

params = list(
  `persistentId` = 'doi:10.80227/test-PXCGQL',
  `version` = 'DRAFT'
)

files = list(
  `file` = upload_file('D:/part4.txt'),
  `jsonData` = '[]'
)

res <- httr::POST(url = 'https://demo.dataverse.nl/api/datasets/:persistentId/add', httr::add_headers(.headers=headers), 
                  query = params, body = files)

But if I use '{}', it uploads without a problem. So it seems the Dataverse server accepts empty jsonData when it is sent as '{}', but not as '[]'. Maybe that is the issue?
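A minimal sketch of that workaround, assuming the same endpoint and query parameters as the script above. upload_to_draft() is a hypothetical helper written for illustration (it is not part of the dataverse package); it always sends jsonData as the empty object '{}'. The server name, API key, and persistent ID are placeholders you supply.

```r
library(httr)

# Hypothetical helper, not part of the dataverse package: uploads one file to
# the DRAFT version of a dataset and sends jsonData as a JSON object.
upload_to_draft <- function(path, persistent_id, server, key, json_data = "{}") {
  body <- list(
    file = httr::upload_file(path),  # errors early if the file does not exist
    jsonData = json_data             # '{}' uploaded fine in this thread; '[]' gave HTTP 400
  )
  res <- httr::POST(
    url = paste0("https://", server, "/api/datasets/:persistentId/add"),
    httr::add_headers(`X-Dataverse-key` = key),
    query = list(persistentId = persistent_id, version = "DRAFT"),
    body = body,
    encode = "multipart"
  )
  httr::stop_for_status(res)
  res
}
```

For example, upload_to_draft('D:/part4.txt', 'doi:10.80227/test-PXCGQL', 'demo.dataverse.nl', key) would repeat the request above with '{}' in place of '[]'.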

pdurbin commented 2 years ago

@Danny-dK hmm, if you can reproduce a bug with using [] in jsonData using command line curl, please open an issue at https://github.com/IQSS/dataverse/issues . You are welcome to test against https://demo.dataverse.org

Danny-dK commented 2 years ago

@pdurbin I did, but too hastily. I had posted the R version of the request, but then I started experimenting with command-line curl in the Windows command prompt, and there the request succeeds with empty JSON given as either '[]' or '{}'. So now I think it's something on the R side, perhaps the curl or httr package. I'll try installing the March 2021 versions of httr and curl to see whether that is the cause.

In the Windows command prompt this works:

curl -H X-Dataverse-key:xxxxxxxxxxxxxxxxxxxxxxxxxxxxx -X POST -F file=@"D:\part4.txt" -F 'jsonData="[]"' "https://demo.dataverse.org/api/datasets/:persistentId/add?persistentId=doi:10.70122/FK2/S0LXD9&version=DRAFT"
Danny-dK commented 2 years ago

I tried various versions and none of them worked. I don't know why there is a difference. Just to confirm: do you see the same error when running your own example at https://github.com/IQSS/dataverse-client-r#data-archiving ?

But this change to add_dataset_file() might fix it:

Instead of:

bod2 <- list()
    if (!is.null(description)) {
        bod2$description <- description
    }

jsondata <- as.character(jsonlite::toJSON(bod2, auto_unbox = TRUE))

this might work:

bod2 <- NULL
    if (!is.null(description)) {
        bod2$description <- description
    }

jsondata <- as.character(jsonlite::toJSON(bod2, auto_unbox = TRUE))

With description <- 'test', both versions produce "{\"description\":\"test\"}". But when description is NULL, the original code produces '[]', while the suggested change produces '{}', which uploads without a problem. I tested it and it works. I'll create a pull request (if I can figure out how).
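The proposed change can be isolated in a small helper so that both cases are easy to check. build_json_data() is a hypothetical name used only for illustration, not a function in the package; it assumes the jsonlite package. Starting from NULL instead of list() makes the no-description case serialize to '{}', and assigning with $ turns NULL into a named list when a description is given.

```r
library(jsonlite)

# Hypothetical helper mirroring the proposed fix in add_dataset_file()
build_json_data <- function(description = NULL) {
  bod2 <- NULL  # NULL serializes to "{}", unlike list(), which gives "[]"
  if (!is.null(description)) {
    bod2$description <- description  # `$<-` on NULL creates a named list
  }
  as.character(jsonlite::toJSON(bod2, auto_unbox = TRUE))
}

build_json_data()        # "{}"
build_json_data("test")  # "{\"description\":\"test\"}"
```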

Danny-dK commented 2 years ago

Issue resolved in a pull request (for now). Feel free to reopen if required.