IDR / idr.openmicroscopy.org

Source for the IDR static website.
https://idr.openmicroscopy.org/about
Creative Commons Attribution 4.0 International

Unable to Download Data with Aspera #190

Open roshankern opened 9 months ago

roshankern commented 9 months ago

Hello,

Thanks for making this repo and data public. I have been working in the Way Lab for the past year to develop IDR_stream, a tool for streaming the download and processing of IDR image data. The tool was originally developed in 2022, and thus uses the Aspera high-speed transfer client to download data from IDR.

I attempted to run the command below within IDR_stream to download a video:

sudo /home/roshankern/.aspera/ascli/sdk/ascp -TQ -l500m -P 33001 -i example_files/asperaweb_id_dsa.openssh idr0013@fasp.ebi.ac.uk:20150916-mitocheck-analysis/mitocheck/LT0001_02--ex2005_11_16--sp2005_02_17--tt17--c3/hdf5/00049_01.ch5 ../tmp/downloads/LT0001_02

This command had worked for me in the past, but this time I got the following output:

Session Stop (Error: Server aborted session: Permission denied)

The IDR download page and #189 both indicate that downloading IDR data with Aspera is no longer possible.

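Thus, I have the following questions:

  • Is it still possible to download IDR data with Aspera?
  • If yes, what is the best way to download this data? Do I need to modify my command or redownload the Aspera public key (I couldn't find a new version online)?
  • If no, will FTP or IDR API be significantly slower for downloading image data? Is there any way to approach Aspera-like speeds with these other image downloading methods?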

Thanks in advance! Roshan

sbesson commented 9 months ago

Hi @roshankern, thanks for opening this issue. You are correct: the instructions have been updated recently, and this conversation is a good opportunity to provide a bit more context.

EBI has been consolidating its data services and engaged with us in July 2023 on how this would affect the way data is uploaded and downloaded for IDR. Over the last few months, we have been working with them in the background to migrate the 500 TB of IDR raw data and make it available through the Public Data Services - see here for more technical details. The old Aspera workflow using accession-based usernames was decommissioned by EBI in December 2023 and is no longer functional.

  • Is it still possible to download IDR data with Aspera?

In short, yes. The IDR website has been updated to document the download workflow using anonymous FTP, as we felt this was the easiest option for most end users. However, EBI Public Data Services supports multiple transfer protocols, including FTP, Aspera, and Globus.

  • If yes, what is the best way to download this data? Do I need to modify my command or redownload the Aspera public key (I couldn't find a new version online)?

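The public key should be left unchanged, but you will need two modifications to your command:

  • the username changes from the accession-based idr0013 to fasp-public
  • the remote path is now prefixed with /pub/databases/IDR/idr0013-neumann-mitocheck/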

I modified the command you pasted above as follows:

/home/data/.aspera/connect/bin/ascp -TQ -l500m -P 33001 -i /home/data/.aspera/connect/etc/asperaweb_id_dsa.openssh fasp-public@fasp.ebi.ac.uk:/pub/databases/IDR/idr0013-neumann-mitocheck/20150916-mitocheck-analysis/mitocheck/LT0001_02--ex2005_11_16--sp2005_02_17--tt17--c3/hdf5/00049_01.ch5 LT0001_02/

and was able to download the HDF file on my end.
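As a sketch of how a script could produce the new-style source argument, the helper below joins the fasp-public user, the /pub/databases/IDR/ prefix, and the full study name seen in the working command above. The function name new_idr_source and the ASCP and KEY variables are hypothetical, and the layout is inferred from this single idr0013 example rather than from EBI documentation, so other studies may differ.

```shell
# Build a new-style Aspera source from a study name and a path inside the study.
# Assumption: every study follows the layout seen for idr0013 in this thread.
new_idr_source() {
    study=$1      # full study name, e.g. idr0013-neumann-mitocheck
    rel_path=$2   # path that used to follow "idr0013@fasp.ebi.ac.uk:"
    printf '%s\n' "fasp-public@fasp.ebi.ac.uk:/pub/databases/IDR/${study}/${rel_path}"
}

# Example call (not run here; ASCP and KEY are placeholders for local paths):
# "$ASCP" -TQ -l500m -P 33001 -i "$KEY" \
#     "$(new_idr_source idr0013-neumann-mitocheck 20150916-mitocheck-analysis/mitocheck/LT0001_02--ex2005_11_16--sp2005_02_17--tt17--c3/hdf5/00049_01.ch5)" \
#     LT0001_02/
```

Keeping the rewrite in one small function means only the source string changes; the ascp flags and the key stay exactly as in the original command.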

  • If no, will FTP or IDR API be significantly slower for downloading image data? Is there any way to approach Aspera-like speeds with these other image downloading methods?

The IDR API cannot be used for downloading the raw data associated with a submission. On FTP vs Aspera, my understanding is that the latter should technically offer higher download speeds, but in practice this could depend on many parameters, including firewalls, latency, and network topology. Since both FTP and Aspera protocols are now supported, I think you should be in a good position to benchmark both protocols and decide which one is most advantageous for your use case.
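One rough way to run such a comparison is to time the same file over each protocol. The helper below is a minimal shell sketch, not a rigorous benchmark: the name elapsed_seconds is hypothetical, timing is in whole seconds via date +%s, and the commented usage assumes that the anonymous FTP host ftp.ebi.ac.uk serves the same /pub/databases/IDR/ tree and that curl and ascp are installed locally.

```shell
# Run a command, discard its output, and print the wall-clock duration in
# whole seconds. The command's exit status is passed through to the caller.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo $((end - start))
    return "$status"
}

# Example usage (not run here; KEY is a placeholder for the local key path):
# FILE=pub/databases/IDR/idr0013-neumann-mitocheck/20150916-mitocheck-analysis/mitocheck/LT0001_02--ex2005_11_16--sp2005_02_17--tt17--c3/hdf5/00049_01.ch5
# ftp_secs=$(elapsed_seconds curl -o ftp_copy.ch5 "ftp://ftp.ebi.ac.uk/$FILE")
# aspera_secs=$(elapsed_seconds ascp -TQ -l500m -P 33001 -i "$KEY" "fasp-public@fasp.ebi.ac.uk:/$FILE" aspera_copy.ch5)
# echo "FTP: ${ftp_secs}s, Aspera: ${aspera_secs}s"
```

A single small file mostly measures connection setup, so repeating the measurement on several larger files would likely give a more representative picture.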

dominikl commented 7 months ago

We should probably add Aspera and Globus to the download instructions too. Aspera is tricky: it doesn't work for me with the latest version, but the aspera-client-docker image works. Globus access would be via the Globus Connect Personal app, using this URL: https://app.globus.org/file-manager?origin_id=47772002-3e5b-4fd3-b97c-18cee38d6df2&origin_path=%2Fpub%2Fdatabases%2FIDR%2F

mkcor commented 5 months ago

We should probably add Aspera and Globus to the download instructions too.

Yes! Sorry, I wrote https://github.com/IDR/idr.openmicroscopy.org/pull/193#issue-2258303892 before noticing this thread.

Aspera is tricky, it doesn't work for me with latest version, but the aspera-client-docker image works.

Good to know; I'll try on my end. Thank you.