Open mzur opened 7 years ago
We noticed that some datasets in Pangaea could easily be used as a remote volume. Take this one for example. You can download a CSV with all image filenames and the volume URL. There is even location data for each image. Another example is this, where the URL is also usable for a remote volume. These images are loaded from tape, so the first request is redirected to a "please wait" page and the download is initiated afterwards. This works automatically, too. When I request an image with cURL, I get the HTML response first. If I wait a few seconds and then request the same URL again, I get the image file.
We can probably implement a dialog where users can create new volumes from Pangaea. They only have to insert the dataset URL and BIIGLE does the rest.
Make sure to import the DOI of the dataset as well (BiodataMiningGroup/biigle-volumes#38).
This would be a very good feature with high application potential!
I just sent my second message to the PANGAEA guys via their contact form. Hopefully they'll answer at some point.
The PANGAEA people finally answered. They said that they can't change the existing behavior of a returned code 503 and a periodic retry until an image is fetched from tape. If we want to make BIIGLE compatible with this, we would need to handle URLs from PANGAEA as a special case, both in the (video) annotation tool and in the file cache package.
Apparently PANGAEA is not interested in becoming a central image and video repository. Continued in #207.
We had another discussion with the people of PANGAEA. The possibility to receive a 503 response remains but it should be possible to make BIIGLE compatible at the following locations:
The annotation tool can show the loading animation and retry to fetch the image (based on response header timeout) for as long as there are 503 responses. The loading animation is only shown for the current image. Previous/next images are attempted only once and will show the loading animation again when the user switches to the image. This change has to be propagated to:
The FileCache can also retry to fetch images based on the response header timeout. There needs to be an upper limit for the retry count/duration (ask PANGAEA staff?).
The create volume action also needs to handle possible 503 responses. As it uses the FileCache for the checks, it would hang for as long as there are 503 responses. Maybe it's sufficient to display a message like "Validation of your data may take a while" once the request takes more than 10 s to complete. However, this could run into the 30 s execution timeout. Maybe we should regard 503 responses as "the image exists" and just accept the volume?
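To make the retry idea above more concrete, here is a minimal sketch in Python (not BIIGLE code). It assumes the "response header timeout" is the standard Retry-After header given in seconds, which PANGAEA has not confirmed here. The names fetch_with_retry, retry_delay and accept_for_volume, the 60 s budget and the 5 s default delay are all hypothetical. The HTTP request is injected as a callable so the logic can be tried without network access.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

# (status code, response headers, body) as returned by the injected fetcher.
Response = Tuple[int, Mapping[str, str], bytes]


def retry_delay(headers: Mapping[str, str], default: float = 5.0) -> float:
    """Read the delay in seconds from Retry-After (assumed header name)."""
    # Plain dict lookup: the key must match exactly, including case.
    value = headers.get("Retry-After", "")
    try:
        # Enforce at least 1 s so the retry loop always makes progress.
        return max(1.0, float(value))
    except ValueError:
        # Header missing or given as an HTTP date: use the default delay.
        return default


def fetch_with_retry(
    fetch: Callable[[str], Response],
    url: str,
    max_wait: float = 60.0,  # hypothetical upper limit, to be agreed on
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Retry while the server answers 503; give up when the budget is used."""
    waited = 0.0
    while True:
        status, headers, body = fetch(url)
        if status == 200:
            return body
        if status != 503:
            raise RuntimeError(f"Unexpected status {status} for {url}")
        delay = retry_delay(headers)
        if waited + delay > max_wait:
            return None  # file is still being loaded from tape
        sleep(delay)
        waited += delay


def accept_for_volume(status: int) -> bool:
    """Validation policy discussed above: treat 503 as 'the image exists'."""
    return status in (200, 503)
```

With something like accept_for_volume, the create volume check would not have to wait for the tape at all, while the annotation tool and the FileCache could use the bounded retry. Whether a time budget or a retry count is the better limit is still open.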
We could provide a function to easily import volumes from Pangaea as remote volumes. These volumes can keep a reference to their source and all the metadata stored in Pangaea (as we don't want to store all that in Biigle ourselves).