selkamand closed this issue 11 months ago.
The current version of CRUX works fine with GDC MAFs, including gzipped ones. Tested with:
9b5db564-ed9b-484f-9570-9fc9e5081aec.wxs.aliquot_ensemble_masked.maf.gz
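If you want to sanity-check a downloaded MAF before uploading it, a minimal sketch like the one below reads the column header from a plain or gzipped MAF and lets you confirm that `Tumor_Sample_Barcode` is present. This is independent of CRUX and is not how CRUX itself parses files; the function names are hypothetical, and it assumes that files ending in `.gz` are gzip-compressed and that lines starting with `#` (such as a `#version` line) precede the tab-separated header.

```python
import gzip


def header_from_lines(lines):
    """Return the first non-comment, non-blank line split on tabs.

    Lines starting with '#' are treated as MAF comment/version lines and skipped.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        return line.rstrip("\r\n").split("\t")
    return []


def read_maf_columns(path):
    """Read the column names of a MAF file, gzipped or not.

    Assumption: a '.gz' suffix means the file is gzip-compressed.
    """
    opener = gzip.open if path.endswith(".gz") else open
    # "rt" gives decoded text lines for both gzip.open and the built-in open.
    with opener(path, "rt", encoding="utf-8") as handle:
        return header_from_lines(handle)
```

For example, `"Tumor_Sample_Barcode" in read_maf_columns("9b5db564-ed9b-484f-9570-9fc9e5081aec.wxs.aliquot_ensemble_masked.maf.gz")` should be `True` for a well-formed GDC MAF.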
Note that when you download a GDC MAF, you also get an 'annotations' file. It doesn't include any clinical annotations; it only describes some technical metadata about the mutation file. I suspect the confusion arises because CRUX throws an error if you try to load this file as clinical annotations. The error correctly tells you that there is no Tumor_Sample_Barcode column and that the annotation file is invalid.
Below is an example of the 'annotations' file that is automatically downloaded with a MAF:
| id | submitter_id | entity_type | entity_id | category | classification | created_datetime | status | notes |
|---|---|---|---|---|---|---|---|---|
| 71fcf395-a102-4585-818c-3ac68ba89f02 | \N | masked_somatic_mutation | 6bcdf6af-8a83-4673-895c-6c6aaa0fc1e7 | General | Notification | 2022-02-28T15:36:56.248355-06:00 | Approved | Variants from SomaticSniper are not included. |
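To tell the two kinds of file apart before uploading, a rough heuristic is to look at the header row: per the error described above, a clinical annotation file needs a `Tumor_Sample_Barcode` column, while the GDC 'annotations' file has the metadata columns shown in the example. The sketch below is a hypothetical helper, not CRUX's own validation; the column set it checks is taken only from the single example above, so other GDC annotation files may differ.

```python
# Columns observed in the example GDC 'annotations' file above (an assumption
# that other GDC annotation files share this layout).
GDC_ANNOTATION_COLUMNS = {"id", "entity_type", "entity_id", "category", "classification", "notes"}


def classify_header(columns):
    """Guess what kind of tab-separated file a header row belongs to.

    Returns 'candidate_clinical' if a Tumor_Sample_Barcode column is present,
    'gdc_annotations' if the header matches the GDC file-metadata layout,
    and 'unknown' otherwise.
    """
    names = {c.strip() for c in columns}
    if "Tumor_Sample_Barcode" in names:
        return "candidate_clinical"
    if GDC_ANNOTATION_COLUMNS <= names:
        return "gdc_annotations"
    return "unknown"


def classify_file(path):
    """Classify a plain-text TSV file by its first (header) line."""
    with open(path, encoding="utf-8") as handle:
        header = handle.readline().rstrip("\r\n").split("\t")
    return classify_header(header)
```

Running `classify_file("annotations.txt")` on the file shipped with a GDC MAF would be expected to return `'gdc_annotations'`, a hint that it should not be uploaded as clinical data.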
I think the solution is to add a short note to the manual describing where to get the actual clinical data when downloading MAFs from the GDC.
Added a note to the FAQ section of the CRUX manual:
Imports of GDC data into CRUX most commonly fail because of confusion caused by the 'annotations.txt' file that is automatically downloaded alongside MAF files.
annotations.txt is NOT a valid clinical annotations file; it only contains file-level metadata.
If your upload fails even when you upload the GDC MAF files alone (without annotations.txt), please screenshot the error message and contact us so we can resolve the issue.
MAF files from GDC fail import (format error)