DataverseNO / dataverse

Open source research data repository software
http://dataverse.org

Support for retaining folder structure #41

Open philippconzett opened 2 years ago

philippconzett commented 2 years ago

One of the reasons why we want to migrate DataverseNO to the cloud is the possibility to configure S3 Direct Upload and Download. However, as stated in the Dataverse Developer Guide:

At present, one potential drawback for direct-upload is that files are only partially ‘ingested’, tabular and FITS files are processed, but zip files are not unzipped, and the file contents are not inspected to evaluate their mimetype.

The unzipping functionality is necessary to retain the folder structure. So the question is: how can we support direct upload and download for large files while also retaining the folder structure? There seem to be several options:

  1. For datasets that are going to contain only a few folders/files:
     a) Upload files and add folder names through GUI.

  2. For datasets that are going to contain many folders/files:
     a) Upload files and specify folder names through API; or
     b) Upload through DVUploader; or
     c) Have two collections for each partner institution:
        i. One for large files >> S3 direct upload
        ii. One for small and medium size files >> S3 upload of zip files through Dataverse
     d) Are there any other GUI-based solutions?
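To illustrate option 2a, here is a minimal sketch of how a folder name could be attached to each file when adding it through the native API. It assumes the standard "add a file to a dataset" endpoint (`/api/datasets/:persistentId/add`) and its `directoryLabel` field in `jsonData`. The helper name `build_add_file_request`, the server URL, and the DOI are hypothetical placeholders. The sketch only prepares the request and does not send it. I have not verified how this interacts with S3 direct upload.

```python
import json
from pathlib import PurePosixPath
from urllib.parse import urlencode


def build_add_file_request(server_url, persistent_id, relative_path, description=""):
    """Prepare URL and metadata for adding one file while keeping its folder.

    relative_path is the file's path inside the dataset, e.g. "data/raw/a.csv".
    The parent folder becomes the file's directoryLabel.
    (Hypothetical helper, not part of Dataverse.)
    """
    path = PurePosixPath(relative_path)
    parent = str(path.parent)

    metadata = {"description": description}
    # Files at the top level have parent "." and need no directoryLabel.
    if parent != ".":
        metadata["directoryLabel"] = parent

    query = urlencode({"persistentId": persistent_id})
    url = f"{server_url.rstrip('/')}/api/datasets/:persistentId/add?{query}"

    return {
        "url": url,
        "filename": path.name,
        # Sent as the "jsonData" form field next to the "file" field.
        "jsonData": json.dumps(metadata),
    }


if __name__ == "__main__":
    # Placeholder server and DOI, for illustration only.
    request = build_add_file_request(
        "https://demo.example.org", "doi:10.5072/FK2/EXAMPLE", "data/raw/a.csv"
    )
    print(request["url"])
    print(request["jsonData"])
```

The actual upload would be a multipart POST to that URL with the form fields `file` and `jsonData`, plus the API token in the `X-Dataverse-key` header. Looping over a local directory tree and calling the helper once per file would reproduce the folder structure without relying on zip unpacking.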

I have asked this question in the Dataverse Google Group: https://groups.google.com/u/1/g/dataverse-community/c/U3G6H4-fNYk

philippconzett commented 2 years ago

Update: The board of DataverseNO has approved funding to hire GDCC to develop this functionality. We estimate this to be in place by the end of September 2022.