cBioPortal / cbioportal

cBioPortal for Cancer Genomics
https://cbioportal.org
GNU Affero General Public License v3.0

RPPA data import PROTEIN_ARRAY_PROTEIN_LEVEL vs PROTEIN_LEVEL #730

Closed aderidder closed 4 years ago

aderidder commented 8 years ago

Hi,

When looking at e.g. Kidney Renal Clear Cell Carcinoma (TCGA, Nature 2013), the 'Select Genomic Profiles' shows a checkbox for Protein expression Z-scores (RPPA). For Breast Invasive Carcinoma (TCGA, Provisional), the 'Select Genomic Profiles' does not show this checkbox even though, according to the Data Sets page, both have RPPA data.

I was trying to import some of the data myself, and I think the issue arises because RPPA data can be imported via either ImportProteinData or ImportProfileData. If you import data via ImportProteinData, the Genetic Alteration Type becomes "PROTEIN_ARRAY_PROTEIN_LEVEL", whereas via ImportProfileData it becomes "PROTEIN_LEVEL". Profiles of the first type are not added to 'Select Genomic Profiles'; profiles of the second type are (see dynamicQuery.js). The same may hold for e.g. the plots tab, which has a protein sub-tab (I still have to test that).
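To make the observed behaviour concrete, here is a minimal Python sketch of the filtering I think is happening. This is not the actual dynamicQuery.js code: the `SELECTABLE_TYPES` set, the `selectable_profiles` helper, and the profile records are hypothetical and only mirror what is described above.

```python
# Hypothetical set of alteration types that get a checkbox on the query page.
# Only PROTEIN_LEVEL comes from the observation above; the real list is longer.
SELECTABLE_TYPES = {"PROTEIN_LEVEL"}


def selectable_profiles(profiles):
    """Return the profiles whose alteration type would be offered as a checkbox."""
    return [p for p in profiles if p["genetic_alteration_type"] in SELECTABLE_TYPES]


# Made-up profile records, one per import route.
profiles = [
    # as produced by ImportProfileData
    {"stable_id": "study_a_rppa", "genetic_alteration_type": "PROTEIN_LEVEL"},
    # as produced by ImportProteinData
    {"stable_id": "study_b_rppa", "genetic_alteration_type": "PROTEIN_ARRAY_PROTEIN_LEVEL"},
]

shown = [p["stable_id"] for p in selectable_profiles(profiles)]
print(shown)
```

Under these assumptions only the profile imported via ImportProfileData is listed, which matches the difference between the two studies.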

A couple of questions arise:

  1. Is there a difference between PROTEIN_ARRAY_PROTEIN_LEVEL and PROTEIN_LEVEL?
  2. If there is no difference: is one of them deprecated?
  3. Should the cbioportal database genetic_profiles data be updated? Maybe every 'PROTEIN_ARRAY_PROTEIN_LEVEL' entry should be changed to 'PROTEIN_LEVEL'?

Thanks, Sander

aderidder commented 8 years ago

It seems the plots tab also looks for a non-z-score version? So I'm guessing the data has to be imported twice: once z-score normalized and once with the original values?

aderidder commented 8 years ago

I managed to get the RPPA data for the brca provisional (hopefully properly) into my cbioportal instance. Here's what I did:

  1. created a z-score normalized version of the RPPA data in R
  2. updated the meta files to use genetic_alteration_type: PROTEIN_LEVEL
  3. added the entry "PROTEIN_LEVEL" : "org.mskcc.cbio.portal.scripts.ImportProfileData" to cbioportal_common.py

I imported both the z-score normalized and the non-normalized version and can now select RPPA in 'Select Genomic Profiles', and protein works properly in both the plots and the enrichments tabs.
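For anyone who wants to reproduce step 1 without R, a rough Python equivalent of the normalization could look like the sketch below. It assumes each protein (row) is z-scored across all samples using the sample standard deviation, and that missing values are left empty; the `zscore_row` helper and the example values are illustrative, not the exact R code I ran.

```python
import math
import statistics


def zscore_row(values):
    """Z-score one protein's values across samples; missing values stay None."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    # A z-score is undefined with fewer than two values or with zero spread.
    if len(present) < 2:
        return [None] * len(values)
    mean = statistics.mean(present)
    sd = statistics.stdev(present)  # sample standard deviation (n - 1)
    if sd == 0:
        return [None] * len(values)
    return [
        None if v is None or math.isnan(v) else (v - mean) / sd
        for v in values
    ]


# Made-up values for one protein across four samples (last one missing).
row = [1.0, 2.0, 3.0, None]
z = zscore_row(row)
print(z)
```

The original values and the z-scored values then go into two separate data files, each with its own meta file.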

@jjgao would love to hear your opinion on this. Are we correctly assuming the ImportProteinData is deprecated and should perhaps be removed from the source code?

jjgao commented 8 years ago

@aderidder you are right. PROTEIN_ARRAY_PROTEIN_LEVEL is deprecated. We are in the process of removing it, along with the ImportProteinData script.

jjgao commented 8 years ago

It is related to issue https://github.com/cBioPortal/cbioportal/issues/377

aderidder commented 8 years ago

Ok thanks!

pieterlukasse commented 8 years ago

@jjgao , @aderidder : I'm assuming this means the protein_array* tables can be deleted from our DB model (I'm making a new PDF of the DB model for internal use). Correct?

jjgao commented 8 years ago

@pieterlukasse there is still some code that depends on the protein_array* tables, e.g. in the web API. We will ultimately remove them. It should be safe to leave them out of your PDF of the DB model.

pieterlukasse commented 8 years ago

great!

Sjoerd-van-Hagen commented 4 years ago

@jjgao not sure about the status of this one. Can you decide whether this can be closed?

jjgao commented 4 years ago

@Sjoerd-van-Hagen let's keep it open but I have removed the hyve label.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.