Closed by jjacobson95 1 month ago
Just a note: this error is more common than previously thought and occurs in almost every run. Several variations of the error appear, including the following:
bb81-8abe098d889f: 410 Client Error: Gone for url: https://api.gdc.cancer.gov/legacy/data?compress
This error indicates that the file is permanently unavailable for download or has been removed. However, the error is inconsistent: it appears for different files (or for none) each time the gdc tool is run.
This will be resolved in the build_all_updates branch. Testing is still in progress, but the fix appears to work.
Output of the fixed code (the log lines print out of order, but the sequence of events is still clear):
...
100% [############################################] Time: 0:00:02 1.1 MiB/s
100% [############################################] Time: 0:00:02 1.3 MiB/s
Successfully downloaded: 1137
Failed downloads: 2
100% [############################################] Time: 0:00:02 1.2 MiB/s
100% [############################################] Time: 0:00:04 794.0 KiB/s
Successfully downloaded: 2
gdc-client already installed
Using provided manifest and downloading data...
Using gdc tool and retrieving get metadata...
Total files to download: 1139
Starting initial download...
Initial download complete.
Retrying download for 2 files (Attempt 1/5):
Missing files: 2
File IDs: 70efab6a-c0d3-403a-8708-880136723d1f, 4b362fa9-4031-4404-8522-cf19308dea49
Starting retry 1 download...
Retry 1 complete.
All files downloaded and verified successfully.
All files downloaded and verified successfully after retries.
Extracting UUIDs from manifest...
When the files listed in the manifest are downloaded, some of them can fail to download and end up excluded from the build process. We need to make this process more robust so that it either fails and exits when this happens or, better, retries the failed files until they download correctly.
This possibility is present in all HCMI omics builds.
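
The retry-until-complete behavior described above could be sketched roughly as follows. This is a minimal illustration, not the code in the build_all_updates branch. The names `find_missing`, `download_with_retries`, and `download_fn` are hypothetical. It assumes each file lands in a folder named after its UUID under the download directory, and that the actual download step (for example, a wrapper around the gdc tool) is passed in as a callable.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def find_missing(file_ids: Iterable[str], download_dir: Path) -> List[str]:
    """Return the file IDs that have no non-empty folder in download_dir."""
    missing = []
    for file_id in file_ids:
        target = Path(download_dir) / file_id
        # Assumption: each file is stored in a folder named after its UUID.
        if not target.is_dir() or not any(target.iterdir()):
            missing.append(file_id)
    return missing


def download_with_retries(
    file_ids: Iterable[str],
    download_dir: Path,
    download_fn: Callable[[List[str], Path], None],
    max_attempts: int = 5,
) -> int:
    """Download all files, retrying only the missing ones.

    download_fn is a hypothetical callable that tries to fetch the given
    IDs into download_dir. Returns the number of retries that were needed
    and raises RuntimeError if files are still missing after max_attempts.
    """
    file_ids = list(file_ids)
    print(f"Total files to download: {len(file_ids)}")
    download_fn(file_ids, download_dir)

    missing = find_missing(file_ids, download_dir)
    attempt = 0
    while missing and attempt < max_attempts:
        attempt += 1
        print(f"Retrying download for {len(missing)} files "
              f"(Attempt {attempt}/{max_attempts}):")
        print(f"File IDs: {', '.join(missing)}")
        download_fn(missing, download_dir)
        # Re-check the full list so the final state is verified as a whole.
        missing = find_missing(file_ids, download_dir)

    if missing:
        # Fail loudly instead of silently excluding files from the build.
        raise RuntimeError(
            f"{len(missing)} files still missing after {max_attempts} "
            f"retries: {', '.join(missing)}"
        )
    print("All files downloaded and verified successfully.")
    return attempt
```

Passing the download step in as a callable keeps the retry logic independent of how the gdc tool is invoked. Raising an exception when retries are exhausted covers the "fail and exit" option, so an incomplete download cannot flow silently into the build.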