broadinstitute / gdctools

Python and UNIX CLI utilities to simplify interaction with the NIH/NCI Genomics Data Commons

New types of SNV data at GDC: needs scrutiny #45

Closed noblem closed 7 years ago

noblem commented 7 years ago

The GDC now distinguishes between "raw" and "annotated" mutation data in the corpus of data it serves, and we need to make sure that we're (A) processing all of them properly and (B) choosing the best ones to funnel downstream for analyses.

[Screenshot, 2017-06-21 at 12:34:51 PM]

dheiman commented 7 years ago

This is not new; they've simply changed the names. "Annotated" just means the files have been processed with VEP, a necessary step before generating MAFs. Look closely at the file counts and you'll see that those are the sample-level VCFs, not MAFs.

noblem commented 7 years ago

Hmm, why are you answering emails on vacation? Anyway ... OK, so (A) and (B) are effectively already being met, nice.