ebi-ait / dcp-ingest-central

Central point of access for the Ingestion Service of the HCA DCP
Apache License 2.0

Investigate how many projects have an exact match ENA/HCA filenames #1031

Closed: ESapenaVentura closed this issue 1 month ago

ESapenaVentura commented 2 months ago

Background: HCA download costs from the HCA Data Portal are rising. We need to find a way to lower costs. One option would be to offer download from the EBI archives for datasets whose files are available there.

Current findings:

Questions:

ESapenaVentura commented 1 month ago

Investigating the third question. I have written a script with certain optimisations:

Even so, the script estimates it would need around 88 hours (~4 days) of uninterrupted running to finish.
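The script itself and its optimisations are not included in this issue. The following is only a minimal, hypothetical sketch of the core check, under stated assumptions. It assumes the file names for each project have already been collected from both archives, and that HCA projects have already been paired with their ENA counterparts under a shared key. In reality, HCA uses project UUIDs and ENA uses its own accessions. It also assumes "exact match" means every HCA file name appears verbatim among the ENA file names for that project. The function name and sample data are illustrative.

```python
# Hypothetical sketch: not the script referenced in this issue.
# Assumes project IDs are already paired across HCA and ENA, and that each
# project maps to a set of file names fetched beforehand from each archive.

def exact_match_projects(hca_files, ena_files):
    """Return sorted project IDs whose HCA file names all appear verbatim in ENA."""
    matches = []
    for project, hca_names in hca_files.items():
        ena_names = ena_files.get(project, set())
        # Assumed definition of "exact match": every HCA file name is present
        # in ENA unchanged (ENA may hold extra files, e.g. other formats).
        if hca_names and hca_names <= ena_names:
            matches.append(project)
    return sorted(matches)


# Illustrative data only; not real projects.
hca = {
    "project-a": {"sample1_R1.fastq.gz", "sample1_R2.fastq.gz"},
    "project-b": {"lib7_R1.fastq.gz"},
}
ena = {
    "project-a": {"sample1_R1.fastq.gz", "sample1_R2.fastq.gz", "extra.bam"},
    "project-b": {"lib7_1.fastq.gz"},  # renamed file, so not an exact match
}
print(exact_match_projects(hca, ena))
```

With this data, only `project-a` qualifies, because `project-b`'s file was renamed on the ENA side. Comparing sets keeps each per-project check cheap. In this sketch, most of the cost would lie in gathering the file lists, not in the comparison.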

If you can think of any other optimisations or assumptions, let me know!

ESapenaVentura commented 1 month ago

There is a raw report with supporting files here.

We're going to meet with Tony to discuss it. Until then, I'm not performing any additional analysis.

gabsie commented 1 month ago

Awesome work, @ESapenaVentura! We have enough for the conversation with JR this week, and we will take it from there.