dcppc / data-stewards

Questions and answers about TOPMed, GTEx, and AGR resources.

Differences between dbGaP, S3 Cross-account method and fusera #22

Closed gversmee closed 6 years ago

gversmee commented 6 years ago

Hi all,

I have tested all three methods for getting access to the VCF files for the COPDGene study (phs000951). Right now my favorite is fusera: it's easy and very fast. However, the set of files we can access differs between the three methods.

I would like to get access to all 9135 samples from freeze 5 using fusera. How can I get the 7249 SRR accessions that are missing?

Best,

Greg

bheavner commented 6 years ago

Hi Greg -

I'm not sure what the "cross-account" method is, nor why there would be 9135 samples that way.

There are 8742 samples for COPDGene in freeze 5.

NIH Data Commons developers should only be able to access data for samples that have been released on dbGaP. The current release (v1) for COPDGene is 1886 samples from freeze 4. For various reasons we won’t go into here, NIH DC developers are getting access to single-sample VCF files from freeze 5, but they are still limited to the samples that have been released on dbGaP (hence N=1886). All of the freeze 5 samples may be in the cloud buckets, but fusera is (appropriately) only allowing access to those that are also in freeze 4. We are expecting the release of freeze 5 for COPDGene soon and, when that happens (and when fusera ‘knows’ about it), you should be able to access the full set.

... At least, I think that's right, but it may be that a member of the TOPMed IRC will correct me on some point - if I hear otherwise, I'll update this issue.

gversmee commented 6 years ago

Thank you for the clarification. What I called the "cross-account" method is described here: https://docs.google.com/document/d/1z_WZFmc2mqPqF5hAadB1a3Nhw4dn9-c47XxIjR5-SDw/edit. With that method, we received a list of the samples that should be available to us, and for that specific study there were ~9000 samples. So for now, we will stick with what is viewable on dbGaP and fusera.
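
For anyone who wants to reproduce the comparison: the counts in this thread are consistent with the explanation above, since 9135 − 7249 = 1886, the number of samples released on dbGaP. Below is a minimal Python sketch of that comparison. It assumes the identifiers visible through each method have been exported to plain-text files, one per line. The function names and the toy identifiers are hypothetical and only illustrate the set arithmetic. They are not part of fusera or dbGaP.

```python
from pathlib import Path


def read_ids(path):
    """Read a one-identifier-per-line text file into a set, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def compare(expected, accessible):
    """Split the expected identifiers into those reachable now and those missing."""
    return {
        "available": expected & accessible,  # present in both lists
        "missing": expected - accessible,    # listed, but not reachable yet
    }


# Toy, made-up identifiers (not real accessions), just to show the shape of the result.
listed = {"SAMPLE_A", "SAMPLE_B", "SAMPLE_C"}  # e.g. the list received for the cross-account method
mounted = {"SAMPLE_A"}                         # e.g. what is visible through fusera
result = compare(listed, mounted)
print(len(result["available"]), "available;", len(result["missing"]), "missing")
```

With real exports, `read_ids` would be called once per file and the two resulting sets passed to `compare`. The `missing` set would then be the accessions to check again after the next dbGaP release.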