IQSS / dataverse-client-r

R Client for Dataverse Repositories
https://iqss.github.io/dataverse-client-r

Downloading multiple files #46

Closed adam3smith closed 2 years ago

adam3smith commented 4 years ago

Please specify whether your issue is about:

I think this is just a question, but it might also be an enhancement request or bug report: the Dataverse API allows downloading multiple files as a .zip. This is particularly relevant now, as it preserves the folder structure where available. There is code in the get_file() function that accesses this functionality, but I don't think it's ever possible to get there: I find no way of specifying multiple file IDs.

So, two questions:

  1. Am I right about this? Or could someone give me syntax to do this in get_file()?
  2. If I'm right that this isn't possible, what would be a good way to do this? Allow a vector of ids as input for the file parameter?

pdurbin commented 4 years ago

I don't mean to muddy the waters, but there is a conversation going on about the :ZipDownloadLimit setting that comes into play here: https://groups.google.com/d/msg/dataverse-community/V1gExuDnm0A/nR4FIU1QBgAJ. Just something to be conscious of.

The Dataverse API absolutely does allow you to ask the Dataverse server to zip up a bunch of files by passing a comma-separated list of database IDs for files: http://guides.dataverse.org/en/4.18.1/api/dataaccess.html#multiple-file-bundle-download
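
As a rough illustration of that endpoint, the sketch below builds the bundle URL in R from a numeric vector of file IDs. The helper name `bundle_url()`, the server name, and the IDs are made up for this example; only the `access/datafiles/` path with comma-separated IDs comes from the API guide linked above.

```r
# Hypothetical helper (not part of the dataverse package): build the URL for
# the multiple-file bundle endpoint from a vector of database file IDs.
bundle_url <- function(server, fileids) {
  stopifnot(length(fileids) > 0)
  # Convert to integer first so that large IDs are not pasted in
  # scientific notation (paste0(100000) would give "1e+05").
  ids <- paste0(as.integer(fileids), collapse = ",")
  paste0("https://", server, "/api/access/datafiles/", ids)
}

# Placeholder server and IDs, for illustration only.
u <- bundle_url("demo.dataverse.org", c(101, 102, 103))
print(u)

# To actually fetch the zip (needs network access and valid IDs):
# download.file(u, destfile = "bundle.zip", mode = "wb")
```

The download itself is left commented out because it depends on a live server, valid IDs, and the server's zip size limit.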

You could also create the zip file client side, but this is more work (though easier on the server). You'd need to get the file hierarchy from the directoryLabel field in the metadata: https://dev2.dataverse.org/api/datasets/export?exporter=dataverse_json&persistentId=doi%3A10.5072/FK2/V8C0XO
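
A minimal sketch of the client-side path reconstruction in R, assuming the file metadata has already been parsed into a data frame with a `label` column (file name) and a `directoryLabel` column (folder, missing for files at the top level). The helper name `local_paths()` and the sample rows are hypothetical.

```r
# Hypothetical helper: turn per-file metadata into relative paths that
# mirror the dataset's folder hierarchy.
local_paths <- function(meta) {
  dir <- meta$directoryLabel
  # Files at the top level have no directoryLabel; treat NA/"" as no folder.
  top <- is.na(dir) | dir == ""
  ifelse(top, meta$label, paste(dir, meta$label, sep = "/"))
}

# Made-up metadata for three files.
meta <- data.frame(
  label = c("readme.txt", "survey.csv", "clean.R"),
  directoryLabel = c(NA, "data/raw", "code"),
  stringsAsFactors = FALSE
)
paths <- local_paths(meta)
print(paths)

# Create the folders under a temporary root before writing files into them.
root <- tempfile("dataset_")
for (d in unique(dirname(file.path(root, paths)))) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}
```

Each file would then be downloaded individually and written to `file.path(root, paths[i])`, which is the extra client-side work mentioned above.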

adam3smith commented 4 years ago

Thanks @pdurbin -- yes, aware of the file zip limit discussion, but at least I'm using this with QDR where we have a more generous limit.

> The Dataverse API absolutely does allow you to ask the Dataverse server to zip up a bunch of files by passing a comma-separated list of database IDs for files: http://guides.dataverse.org/en/4.18.1/api/dataaccess.html#multiple-file-bundle-download

Yes, that's what I was referring to, and the linked code in get_file() actually implements that; it just never gets called (I think).

pdurbin commented 4 years ago

@adam3smith ah, I just clicked and I see what you mean:

    fileid <- paste0(fileid, collapse = ",")
    u <- paste0(api_url(server), "access/datafiles/", file)

Yes, that should do the trick, if it gets called. 😄

adam3smith commented 4 years ago

Ah, got it -- this is possible in principle using a numeric vector (as one would expect), but there's a regression from https://github.com/IQSS/dataverse-client-r/commit/5ec375b21d5f9c5bb884f611d5aeb89c702da37b, which missed one of the file --> fileid renames.

I'll submit a PR with added documentation, a test, and the fix.

kuriwaki commented 3 years ago

@adam3smith

Given that #47 "[does not] use the zip functionality of the API at all" and instead stores each file's content in an R list, does this mean we cannot implement a get_zip_* function that returns a zipped file (preferably one that keeps the nested directory structure)?

kuriwaki commented 2 years ago

The current functionality and examples (e.g. here in doc) should be enough for the immediate task for this issue.

Further considerations are writing a test for multi-file structures and adding a get_zip_* function that will return a zip file (per https://github.com/IQSS/dataverse-client-r/issues/46#issuecomment-751581790). If that is a useful feature, please open a new issue.