Open gsaksena opened 7 years ago
Note that all of the current regression tests pass in spite of this error being thrown. I'm not sure whether this is or is not desirable.
This was by design: we are currently only using mutect MAFs.
These are warnings, not errors. Data types that are not specified are not diced, and a warning is given for them in the log.
In the current branch code, a file dicing can be labeled pass, error, cached, or dry_run, and missing datatypes are flagged as error. I've also added the new datatypes in this commit:
https://github.com/broadinstitute/gdctools/pull/67/commits/613c77c311f9484eb0bc9d09b9d8b7d95e4c9f0c
It sounds like you would suggest backing out this commit, and adding a new file dicing status of 'unrecognized'.
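To make the suggestion concrete, here is a minimal sketch of what a separate 'unrecognized' status could look like next to the four existing labels. The names `DiceStatus` and `status_for` are hypothetical and are not taken from the gdctools code; only the label strings come from this thread.

```python
from enum import Enum


class DiceStatus(Enum):
    """Hypothetical container for the dicing labels discussed in this thread."""
    PASS = "pass"
    ERROR = "error"
    CACHED = "cached"
    DRY_RUN = "dry_run"
    UNRECOGNIZED = "unrecognized"  # proposed addition, not in the current code


def status_for(recognized, dry_run=False, cached=False, failed=False):
    """Pick a label for one file (assumed decision order, for illustration only)."""
    if not recognized:
        # Unknown datatype: report it separately instead of as an error.
        return DiceStatus.UNRECOGNIZED
    if dry_run:
        return DiceStatus.DRY_RUN
    if cached:
        return DiceStatus.CACHED
    return DiceStatus.ERROR if failed else DiceStatus.PASS
```

With a split like this, a file of an unknown datatype would no longer count toward the error total, while a genuine dicing failure still would.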
There should be new datatypes for somaticsniper, muse, and varscan maf files.
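As a rough illustration of how the callers might be told apart, the sketch below reads the caller name from the MAF file name. It assumes the caller is always the third dot-separated field, as in the three file names in the log output in this issue; `KNOWN_CALLERS` and `maf_caller` are hypothetical names, not part of gdctools, and the `mutect` entry is assumed to follow the same naming pattern.

```python
# Hypothetical set of callers; "mutect" is assumed to follow the same
# file naming pattern as the three callers seen in the log output.
KNOWN_CALLERS = {"mutect", "somaticsniper", "muse", "varscan"}


def maf_caller(file_name):
    """Return the variant caller embedded in a MAF file name, or None.

    Assumes names shaped like TCGA.<cohort>.<caller>.<uuid>...maf.gz.
    """
    parts = file_name.split(".")
    if len(parts) > 2 and parts[2].lower() in KNOWN_CALLERS:
        return parts[2].lower()
    return None
```

A datatype lookup could then key on the caller in addition to data_category and data_type, since all three unrecognized files share the same category and type.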
The following commands:
gdc_mirror --config tcgaSmoketest.cfg
gdc_dice --config tcgaSmoketest.cfg
yield the following output:
...
2017-10-28 19:31:23,568[INFO]: Writing sample MAF: TCGA-D3-A3C7-06A-11D-A196-08.7d62b913-4ae8-4c59-9819-71711f12b3b2.maf.txt
2017-10-28 19:31:23,591[INFO]: Writing sample MAF: TCGA-EE-A3J8-06A-11D-A20D-08.7d62b913-4ae8-4c59-9819-71711f12b3b2.maf.txt
2017-10-28 19:31:23,616[WARNING]: Unrecognized data: { "file_name": "TCGA.SKCM.somaticsniper.8cce7734-539b-4fba-bf9a-69735906d962.DR-7.0.somatic.maf.gz", "data_category": "Simple Nucleotide Variation", "data_type": "Masked Somatic Mutation", "file_id": "8cce7734-539b-4fba-bf9a-69735906d962" }
2017-10-28 19:31:23,616[WARNING]: Unrecognized data: { "file_name": "TCGA.SKCM.muse.a1fe3943-5377-4763-8494-5e4e61545820.DR-7.0.somatic.maf.gz", "data_category": "Simple Nucleotide Variation", "data_type": "Masked Somatic Mutation", "file_id": "a1fe3943-5377-4763-8494-5e4e61545820" }
2017-10-28 19:31:23,617[WARNING]: Unrecognized data: { "file_name": "TCGA.SKCM.varscan.e751c317-d661-4290-b755-2b5c4d9cd0a4.DR-7.0.somatic.maf.gz", "data_category": "Simple Nucleotide Variation", "data_type": "Masked Somatic Mutation", "file_id": "e751c317-d661-4290-b755-2b5c4d9cd0a4" }
...