CCICB / CRUX


GDC Maf files fail to import #98

Closed selkamand closed 11 months ago

selkamand commented 1 year ago

MAF files from GDC fail import (format error)

selkamand commented 12 months ago

Current version of CRUX works fine with GDC, even using gzipped MAFs. Tested with:

9b5db564-ed9b-484f-9570-9fc9e5081aec.wxs.aliquot_ensemble_masked.maf.gz

Note that when you download a GDC MAF you also get an 'annotation' file, which doesn't include any clinical annotations but just describes some technical metadata about the mutation file. I suspect the confusion arises because CRUX throws an error if you try to load this document as clinical annotations. The error correctly tells you there is no Tumor_Sample_Barcode column and that the annotation file is incorrect.

Example of 'annotations' file auto-downloaded with MAF below

id  submitter_id    entity_type entity_id   category    classification  created_datetime    status  notes
71fcf395-a102-4585-818c-3ac68ba89f02    \N  masked_somatic_mutation 6bcdf6af-8a83-4673-895c-6c6aaa0fc1e7    General Notification    2022-02-28T15:36:56.248355-06:00    Approved    Variants from SomaticSniper are not included.
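
For anyone who wants to check a file before uploading it, here is a minimal Python sketch of the same idea: read the header row and see whether it contains Tumor_Sample_Barcode. This is not CRUX's own validation code. The function name `has_sample_barcode` and the example clinical header are made up for illustration, and the sketch assumes a tab-delimited file whose first row is the header.

```python
import csv
import io

REQUIRED_COLUMN = "Tumor_Sample_Barcode"


def has_sample_barcode(handle, delimiter="\t"):
    """Return True if the first row of a delimited file names the required column.

    Hypothetical helper for illustration only; assumes the first row is the header.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])  # an empty file yields an empty header
    return REQUIRED_COLUMN in (name.strip() for name in header)


# Header of the GDC 'annotations' file shown above.
gdc_annotations = io.StringIO(
    "id\tsubmitter_id\tentity_type\tentity_id\tcategory\t"
    "classification\tcreated_datetime\tstatus\tnotes\n"
)

# Made-up clinical annotations header, used only to show the passing case.
clinical = io.StringIO("Tumor_Sample_Barcode\tgender\nSAMPLE_1\tfemale\n")

print(has_sample_barcode(gdc_annotations))  # False
print(has_sample_barcode(clinical))         # True
```

To check a file on disk, open it in text mode (or with `gzip.open(path, "rt")` for a gzipped file) and pass the handle in.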

I think the solution to this is a small comment in the manual describing where to get the real clinical data when downloading MAFs from the GDC.

selkamand commented 11 months ago

Added a comment in the FAQ of the CRUX manual


Why can I not import GDC MAF / annotation files?

Importing GDC data into CRUX most commonly fails because of confusion over the 'annotations.txt' file that is automatically downloaded alongside MAF files.

annotations.txt is NOT a valid clinical annotations file; it only contains file-level metadata.

If your upload fails even when you upload the GDC MAF files alone, please screenshot the error message and contact us so we can resolve the issue.