datadryad / dryad-product-roadmap

Repository of issues for Dryad project boards
https://github.com/orgs/datadryad/projects

Handle files uploaded via URL in new storage #3126

Closed: ahamelers closed this issue 6 months ago

ahamelers commented 7 months ago

From https://github.com/CDL-Dryad/dryad-app/pull/1498

Files that are "uploaded" via URL need to be loaded into the new storage.

ryscher commented 7 months ago

Currently, for files uploaded to temporary storage, SubmissionJob.do_submit! moves them to permanent storage. I assume we can handle URL-based files in the same place, since each SubmissionJob already runs in its own thread.
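Because URL-based files can be very large (one of the test files below is a 9GB archive), the download step probably needs to stream rather than read the whole body into memory. The following is a minimal sketch in plain Ruby using the standard library's Net::HTTP, assuming the destination is any object that responds to `#write`; the helper names `copy_body` and `download_url_to` are hypothetical and are not part of dryad-app.

```ruby
require 'net/http'
require 'uri'
require 'digest'

# Hypothetical helper: streams a response body into `dest` chunk by chunk,
# so a multi-GB file never has to fit in memory. Returns [bytes, md5_hex]
# so the caller could record the file size and checksum.
def copy_body(response, dest)
  md5 = Digest::MD5.new
  bytes = 0
  response.read_body do |chunk|
    dest.write(chunk)
    md5.update(chunk)
    bytes += chunk.bytesize
  end
  [bytes, md5.hexdigest]
end

# Hypothetical helper: downloads an http(s) `url` into `dest`.
# Note: Net::HTTP does not follow redirects, and Net::HTTPResponse#value
# raises for any non-2xx status (3xx included).
def download_url_to(url, dest)
  uri = URI(url)
  raise ArgumentError, "unsupported URL scheme: #{url}" unless uri.is_a?(URI::HTTP)

  result = nil
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request_get(uri.request_uri) do |response|
      response.value # raise unless the status is 2xx
      result = copy_body(response, dest)
    end
  end
  result
end
```

Splitting out `copy_body` keeps the streaming and checksum logic testable without a network connection; in a real job, `dest` would presumably be a writer for the permanent storage rather than a local IO.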

We can tell that a file is URL-based rather than a direct upload in two ways:

  1. There is no corresponding file in temporary storage.
  2. The file object (f) contains a URL.
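The two signals listed above could be combined into a single check inside the submission loop. This is a minimal sketch, assuming each file record exposes a file name and a `url` attribute; the `FileRecord` struct, the `upload_file_name` field, and the `source_for` method are hypothetical stand-ins, not dryad-app's actual models. It assumes a file present in temporary storage takes precedence, and treats a file with neither a temporary copy nor a URL as missing.

```ruby
# Hypothetical stand-in for a file record; the real model may differ.
FileRecord = Struct.new(:upload_file_name, :url, keyword_init: true)

# Decides where a file's bytes should come from during submission.
# `temp_storage_keys` is any collection of names present in temporary storage.
def source_for(file, temp_storage_keys)
  in_temp = temp_storage_keys.include?(file.upload_file_name)
  has_url = !file.url.to_s.strip.empty? # nil or blank means "no URL"

  if in_temp
    :temp_storage # direct upload: move from temporary to permanent storage
  elsif has_url
    :url          # URL-based: download into permanent storage
  else
    :missing      # neither signal present; likely an error to report
  end
end
```

Returning a symbol rather than acting directly keeps the classification separate from the copy step, so each branch can be handled (or retried) on its own.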
ryscher commented 6 months ago

Here are some sample files I tested with, which may be useful for future testing:

Simple images:

Dataset:

Dryad file with no file extension (currently fails due to #2932):

Larger (100MB) dataset:

Large Friendster archive (9GB):

Image with no file extension in URL: