cBioPortal / datahub

A centralized location for storing curated data from cBioPortal

brca_tcga.tar.gz what should be the datatype for protein expression? #33

Closed: marianc000 closed this issue 7 years ago

marianc000 commented 7 years ago

Hello,

I am trying to upload brca_tcga.tar.gz to the latest version of cBioPortal. During validation, the files meta_protein_quantification.txt and meta_protein_quantification_Zscores.txt produce errors saying that the datatype is incorrect. According to http://cbioportal.readthedocs.io/en/latest/File-Formats.html, protein expression can only be rppa. However, meta_rppa.txt and meta_rppa_Zscores.txt already exist among the brca_tcga.tar.gz files. How can I import meta_protein_quantification.txt and meta_protein_quantification_Zscores.txt? Is there another data type for protein expression?

Best regards,
Marian

sandertan commented 7 years ago

Hi @marianc000, yes, this is a new datatype: mass spectrometry data from CPTAC. We just added the data files, but the code to validate them is not in cBioPortal yet. There's a PR for it (https://github.com/cBioPortal/cbioportal/pull/2310), so hopefully it will be in soon.

In the meantime, you can validate and load this study if you remove these protein_quantification files, or you can import without validating by running cbioportalImporter.py.
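
For the first option, a minimal Python sketch of setting the files aside might look like the following. It assumes the archive has already been extracted into a study directory and that every affected file has `protein_quantification` in its name, as the two meta files mentioned in this thread do; the matching data files are assumed to follow the same naming. The function name and the backup directory are hypothetical and not part of cBioPortal.

```python
import shutil
from pathlib import Path


def set_aside_protein_quantification(study_dir, backup_dir):
    """Move every *protein_quantification* file out of study_dir.

    Hypothetical helper, not part of cBioPortal. Returns the names of
    the files that were moved so they can be restored later.
    """
    study = Path(study_dir)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)

    moved = []
    # Assumption: the affected meta and data files all contain
    # "protein_quantification" in their file names.
    for path in sorted(study.glob("*protein_quantification*")):
        if path.is_file():
            shutil.move(str(path), str(backup / path.name))
            moved.append(path.name)
    return moved
```

Calling `set_aside_protein_quantification("brca_tcga", "brca_tcga_backup")` (both paths are placeholders) leaves the rest of the study untouched, so it can be validated and loaded as usual. The files are moved rather than deleted, which makes it easy to put them back once the validator supports the new datatype.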

sandertan commented 7 years ago

Hi @marianc000, with the latest cBioPortal (1.6.0) it should be possible to load these data.

The File Formats page has also been updated: http://cbioportal.readthedocs.io/en/latest/File-Formats.html#protein-level-data