Sage-Bionetworks / cleanAD

Tools for cleaning and organizing study data for the AD Knowledge Portal.

Archive WGS Data Transfer bam files #13

Open avanlinden opened 3 years ago

avanlinden commented 3 years ago

The WGS Harmonization fastq files are in a STRIDES S3 bucket and have been surfaced as part of the WGS Harmonization study in the Portal, with no download restrictions other than the ToU and a signed DUC. However, the original GRCh37 bam files from those reads are still in the separate AMP WGS Data Transfer project and have not been moved to a STRIDES bucket. The extra data transfer clickwrap that prevented download of these files was removed from the project in order to move the fastqs, so I have temporarily made the entire project view-only for Synapse users. That way we don't end up with surprise egress charges for the bams while we decide what to do.

There are 3,780 bam files left in this project, with a total file size of 179 TB. Our options are:

When we make a decision, we can open a Jira ticket with Platforms to help us move forward. @annagree @amapeters

avanlinden commented 2 years ago

Bams are available for download again (some contributors were still using them). The cloud-only download restriction is in place on the S3 bucket itself, so the Synapse permissions no longer matter for controlling egress.

avanlinden commented 2 years ago

To check:

avanlinden commented 2 years ago

Verified with Gautam that they are still in the process of downloading the BAM files. They will notify me when they are done.

avanlinden commented 2 years ago

Info from Bruce on Glacier Deep Archive: https://sagebionetworks.jira.com/browse/IT-1505

Projected cost is $177/month to maintain this bucket in Glacier Deep Archive.
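
For anyone wanting to sanity-check that figure, here is a rough sketch of the arithmetic. It assumes a Glacier Deep Archive storage rate of $0.00099 per GB-month and decimal terabytes (1 TB = 1000 GB). Neither value is stated in this thread, so treat both as assumptions; the function name is just illustrative.

```python
# Rough storage-only estimate for keeping the bams in Glacier Deep Archive.
# ASSUMPTION: $0.00099 per GB-month storage rate (not quoted in this thread).
DEEP_ARCHIVE_USD_PER_GB_MONTH = 0.00099
# ASSUMPTION: decimal terabytes (1 TB = 1000 GB).
GB_PER_TB = 1000


def monthly_storage_cost(size_tb, rate=DEEP_ARCHIVE_USD_PER_GB_MONTH):
    """Return the estimated monthly storage cost in USD for size_tb terabytes."""
    return size_tb * GB_PER_TB * rate


if __name__ == "__main__":
    total_tb = 179  # total size of the remaining bam files, from this issue
    cost = monthly_storage_cost(total_tb)
    print(f"Estimated storage cost: ${cost:.2f}/month")
```

Under those assumptions the estimate lands at roughly $177/month, in line with the projection above. It covers storage only and leaves out request, retrieval, and early-deletion charges.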