cBioPortal / datahub

A centralized location for storing curated data from cBioPortal

Protein change of "MUTATED" #114

Closed jjgao closed 4 years ago

jjgao commented 6 years ago

There are a lot of mutations annotated as "MUTATED" in the public portal. Could we try to address them? Do we have a validation rule for that?

-- count mutations per genetic profile whose protein change is the placeholder 'MUTATED'
select gp.stable_id, count(*) as mut_count
from mutation_event me
join mutation m on me.mutation_event_id = m.mutation_event_id
join genetic_profile gp on m.genetic_profile_id = gp.genetic_profile_id
where protein_change = 'MUTATED'
group by gp.stable_id
order by mut_count desc;
stable_id   mut_count
luad_broad_mutations    15523
cellline_nci60_mutations    5152
lgg_ucsf_2014_mutations 5081
meso_tcga_mutations 1882
prad_fhcrc_mutations    1135
brca_igr_2015_mutations 942
coadread_tcga_pub_mutations 505
kirc_tcga_pub_mutations 450
sclc_ucologne_2015_mutations    420
nsclc_tcga_broad_2016_mutations 354
es_dfarber_broad_2014_mutations 258
blca_dfarber_mskcc_2014_mutations   197
cellline_ccle_broad_mutations   193
brca_mbcproject_wagle_2017_mutations    192
esca_broad_mutations    179
gbc_shanghai_2014_mutations 136
skcm_tcga_mutations 135
ucec_tcga_mutations 114
brca_bccrc_xenograft_2014_mutations 113
stad_tcga_mutations 108
brca_tcga_pub_mutations 103
chol_nus_2012_mutations 99
luad_tcga_pub_mutations 84
lusc_tcga_pub_mutations 83
coadread_genentech_mutations    83
hnsc_tcga_pub_mutations 69
ccrcc_utokyo_2013_mutations 64
msk_impact_2017_mutations   62
hnsc_tcga_mutations 62
esca_tcga_mutations 53
cscc_dfarber_2015_mutations 47
desm_broad_2015_mutations   46
brca_tcga_mutations 45
sarc_mskcc_mutations    45
egc_tmucih_2015_mutations   44
lusc_tcga_mutations 40
paac_jhu_2014_mutations 34
cesc_tcga_mutations 30
brca_sanger_mutations   29
prad_tcga_mutations 27
kirc_tcga_mutations 25
blca_bgi_mutations  24
gbm_tcga_pub2013_mutations  22
coadread_tcga_mutations 22
mel_tsam_liang_2017_mutations   22
hcc_inserm_fr_2015_mutations    21
acc_tcga_mutations  20
kich_tcga_pub_mutations 20
hnsc_broad_mutations    17
sarc_tcga_mutations 16
lgggbm_tcga_pub_mutations   14
mm_broad_mutations  11
ov_tcga_mutations   11
nccrcc_genentech_2014_mutations 10
chol_tcga_mutations 8
lihc_tcga_mutations 8
escc_icgc_mutations 8
pcpg_tcga_mutations 8
kirp_tcga_mutations 7
cll_iuopa_2015_mutations    7
stad_tcga_pub_mutations 7
escc_ucla_2014_mutations    6
es_iocurie_2014_mutations   6
ampca_bcm_2016_mutations    6
brca_broad_mutations    6
egc_msk_2017_mutations  5
skcm_broad_mutations    5
prad_su2c_2015_mutations    5
mpnst_mskcc_mutations   5
hnsc_jhu_mutations  5
chol_nccs_2013_mutations    4
nbl_amc_2012_mutations  4
kich_tcga_mutations 4
thym_tcga_mutations 4
dlbc_tcga_mutations 3
lung_msk_2017_mutations 3
skcm_ucla_2016_mutations    3
brca_bccrc_mutations    3
ov_tcga_pub_mutations   3
prad_tcga_pub_mutations 3
brca_tcga_pub2015_mutations 3
stad_pfizer_uhongkong_mutations 2
prad_broad_mutations    2
hnsc_mdanderson_2013_mutations  2
gbm_tcga_mutations  2
acyc_sanger_2013_mutations  2
blca_tcga_mutations 2
lgg_tcga_mutations  2
ccrcc_irc_2014_mutations    2
brca_metabric_mutations 2
luad_tsp_mutations  1
tgct_tcga_mutations 1
thca_tcga_mutations 1
uvm_tcga_mutations  1
mbl_sickkids_2016_mutations 1
blca_tcga_pub_mutations 1
paad_icgc_mutations 1
ucs_tcga_mutations  1
liad_inserm_fr_2014_mutations   1
mbl_broad_2012_mutations    1
pact_jhu_2011_mutations 1
nsclc_unito_2016_mutations  1
n1zea144 commented 6 years ago

I thought we resolved these not too long ago; maybe it was just the gdac database? @yichaoS, check with @angelicaochoa. I believe she has a tool that will annotate a variant and update the database table directly.

pieterlukasse commented 6 years ago

Still present in cbioportal.org. Example:

image

dionnezaal commented 6 years ago

With the current reannotated files in datahub/rc, protein changes are available. Example: meso_tcga, TP53

[screenshot, 2018-02-21]
jjgao commented 6 years ago

Here are the results from running the SQL above against the latest public portal database. Somehow some studies, such as stad_tcga_pub, now have more affected mutations than before.

stad_tcga_pub_mutations 4190
meso_tcga_mutations 1882
brca_tcga_mutations 1008
luad_tcga_mutations 985
prad_fhcrc_mutations    981
stad_tcga_mutations 781
blca_tcga_mutations 575
sclc_ucologne_2015_mutations    420
nsclc_tcga_broad_2016_mutations 354
lusc_tcga_mutations 317
cesc_tcga_mutations 291
blca_dfarber_mskcc_2014_mutations   197
cellline_ccle_broad_mutations   193
stes_tcga_pub_mutations 153
skcm_tcga_mutations 135
ucec_tcga_mutations 114
ucs_tcga_mutations  109
chol_nus_2012_mutations 99
paad_qcmg_uq_2016_mutations 85
hnsc_tcga_mutations 62
esca_tcga_mutations 53
lusc_tcga_pub_mutations 50
sarc_mskcc_mutations    45
msk_impact_2017_mutations   40
thca_tcga_mutations 30
sarc_tcga_mutations 29
sclc_jhu_mutations  28
tet_nci_2014_mutations  27
prad_tcga_mutations 27
kirc_tcga_mutations 25
coadread_tcga_mutations 22
gbm_tcga_pub2013_mutations  22
mel_tsam_liang_2017_mutations   22
paad_tcga_mutations 20
acc_tcga_mutations  20
tgct_tcga_mutations 18
prad_su2c_2015_mutations    17
kirc_tcga_pub_mutations 14
hcc_inserm_fr_2015_mutations    14
ov_tcga_mutations   11
skcm_yale_mutations 11
prad_tcga_pub_mutations 10
cellline_nci60_mutations    9
pcpg_tcga_mutations 8
chol_tcga_mutations 8
lihc_tcga_mutations 8
kirp_tcga_mutations 7
cll_iuopa_2015_mutations    7
ampca_bcm_2016_mutations    6
stad_uhongkong_mutations    6
blca_bgi_mutations  6
lgg_ucsf_2014_mutations 5
egc_msk_2017_mutations  5
hnsc_broad_mutations    5
skcm_broad_mutations    5
brca_bccrc_xenograft_2014_mutations 5
mpnst_mskcc_mutations   5
kich_tcga_mutations 4
thym_tcga_mutations 4
crc_msk_2018_mutations  4
skcm_ucla_2016_mutations    3
hnsc_tcga_pub_mutations 3
luad_tcga_pub_mutations 3
paad_utsw_2015_mutations    3
brca_sanger_mutations   3
coadread_tcga_pub_mutations 3
brca_tcga_pub2015_mutations 3
lung_msk_2017_mutations 3
dlbc_tcga_mutations 3
hnsc_jhu_mutations  3
stad_utokyo_mutations   2
ov_tcga_pub_mutations   2
hnsc_mdanderson_2013_mutations  2
brca_metabric_mutations 2
brca_tcga_pub_mutations 2
esca_broad_mutations    2
acyc_sanger_2013_mutations  2
luad_broad_mutations    2
es_dfarber_broad_2014_mutations 2
prad_broad_mutations    2
lgg_tcga_mutations  2
summit_2018_mutations   2
panet_jhu_2011_mutations    1
stad_pfizer_uhongkong_mutations 1
uvm_tcga_mutations  1
thca_tcga_pub_mutations 1
pact_jhu_2011_mutations 1
ccrcc_irc_2014_mutations    1
blca_tcga_pub_mutations 1
utuc_mskcc_2013_mutations   1
lgggbm_tcga_pub_mutations   1
cscc_dfarber_2015_mutations 1
mbl_sickkids_2016_mutations 1
luad_tsp_mutations  1
paad_icgc_mutations 1
escc_icgc_mutations 1
liad_inserm_fr_2014_mutations   1
ccrcc_utokyo_2013_mutations 1
ucs_jhu_2014_mutations  1
mm_broad_mutations  1
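
To see which profiles got worse between two runs of the query, the two result listings can be compared directly. The following is only a minimal sketch: the helper names `parse_counts` and `regressions` are made up for illustration, and the sample rows are a handful of lines copied from the two listings in this thread.

```python
def parse_counts(text):
    """Parse 'stable_id count' lines (whitespace separated) into a dict."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


def regressions(before, after):
    """Return (stable_id, old, new) for profiles whose count increased,
    largest increase first. Profiles absent from `before` count as 0."""
    rows = [(p, before.get(p, 0), n) for p, n in after.items()
            if n > before.get(p, 0)]
    return sorted(rows, key=lambda r: r[2] - r[1], reverse=True)


# A few rows taken from the first and second listings in this thread.
before = parse_counts("""
luad_broad_mutations    15523
meso_tcga_mutations 1882
brca_tcga_mutations 45
stad_tcga_pub_mutations 7
""")
after = parse_counts("""
stad_tcga_pub_mutations 4190
meso_tcga_mutations 1882
brca_tcga_mutations 1008
luad_broad_mutations    2
""")

worse = regressions(before, after)
for stable_id, old, new in worse:
    print(f"{stable_id}: {old} -> {new}")
# stad_tcga_pub_mutations: 7 -> 4190
# brca_tcga_mutations: 45 -> 1008
```

Profiles that stayed flat (meso_tcga) or improved (luad_broad) are left out, so the output lists only what needs a second look.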
jjgao commented 6 years ago

Here is an update. @ritikakundra @yichaoS can you look into the first few and see if they can be fixed? cc'ing @n1zea144

stad_tcga_pub_mutations 4190
sclc_cancercell_gardner_2017_mutations  3284
prad_fhcrc_mutations    981
prad_p1000_mutations    708
cll_iuopa_2015_mutations    516
blca_dfarber_mskcc_2014_mutations   471
sclc_ucologne_2015_mutations    420
nsclc_tcga_broad_2016_mutations 354
stes_tcga_pub_mutations 153
cellline_ccle_broad_mutations   145
chol_nus_2012_mutations 99
paad_qcmg_uq_2016_mutations 85
brca_bccrc_xenograft_2014_mutations 80
lusc_tcga_pub_mutations 50
sarc_mskcc_mutations    45
msk_impact_2017_mutations   40
sclc_jhu_mutations  28
tet_nci_2014_mutations  27
coad_tcga_pan_can_atlas_2018_mutations  26
ucec_tcga_pan_can_atlas_2018_mutations  24
ov_tcga_pan_can_atlas_2018_mutations    23
mel_tsam_liang_2017_mutations   22
gbm_tcga_pub2013_mutations  22
angs_project_painter_2018_mutations 19
prad_su2c_2015_mutations    17
stad_tcga_pan_can_atlas_2018_mutations  14
kirc_tcga_pub_mutations 14
hcc_inserm_fr_2015_mutations    14
brca_tcga_pan_can_atlas_2018_mutations  13
skcm_yale_mutations 11
prad_tcga_pub_mutations 10
cellline_nci60_mutations    9
gbm_tcga_pan_can_atlas_2018_mutations   9
ov_tcga_mutations   8
cesc_tcga_pan_can_atlas_2018_mutations  7
stad_uhongkong_mutations    6
ampca_bcm_2016_mutations    6
cesc_tcga_mutations 6
luad_tcga_pan_can_atlas_2018_mutations  6
blca_bgi_mutations  6
blca_tcga_pan_can_atlas_2018_mutations  6
mpnst_mskcc_mutations   5
lgg_ucsf_2014_mutations 5
hnsc_broad_mutations    5
skcm_broad_mutations    5
crc_msk_2018_mutations  4
coadread_tcga_mutations 4
hnsc_tcga_pub_mutations 3
brca_sanger_mutations   3
dlbc_tcga_pan_can_atlas_2018_mutations  3
read_tcga_pan_can_atlas_2018_mutations  3
lihc_tcga_pan_can_atlas_2018_mutations  3
luad_tcga_pub_mutations 3
hnsc_jhu_mutations  3
lung_msk_2017_mutations 3
paad_utsw_2015_mutations    3
lusc_tcga_pan_can_atlas_2018_mutations  3
skcm_ucla_2016_mutations    3
blca_tcga_pub_2017_mutations    3
hnsc_tcga_pan_can_atlas_2018_mutations  3
prad_broad_mutations    2
brca_metabric_mutations 2
stad_utokyo_mutations   2
brca_tcga_pub_mutations 2
esca_broad_mutations    2
esca_tcga_pan_can_atlas_2018_mutations  2
summit_2018_mutations   2
es_dfarber_broad_2014_mutations 2
acyc_sanger_2013_mutations  2
luad_broad_mutations    2
ov_tcga_pub_mutations   2
paad_tcga_pan_can_atlas_2018_mutations  2
hnsc_mdanderson_2013_mutations  2
panet_jhu_2011_mutations    1
blca_tcga_pub_mutations 1
mbl_sickkids_2016_mutations 1
stad_pfizer_uhongkong_mutations 1
kirc_tcga_mutations 1
prad_eururol_2017_mutations 1
cscc_dfarber_2015_mutations 1
kirc_tcga_pan_can_atlas_2018_mutations  1
mm_broad_mutations  1
dlbc_tcga_mutations 1
kirp_tcga_pan_can_atlas_2018_mutations  1
lgggbm_tcga_pub_mutations   1
nsclc_pd1_msk_2018_mutations    1
prad_tcga_pan_can_atlas_2018_mutations  1
ccrcc_irc_2014_mutations    1
lgg_tcga_pan_can_atlas_2018_mutations   1
ccrcc_utokyo_2013_mutations 1
escc_icgc_mutations 1
liad_inserm_fr_2014_mutations   1
thca_tcga_pub_mutations 1
paad_icgc_mutations 1
ucs_jhu_2014_mutations  1
utuc_mskcc_2013_mutations   1
luad_tsp_mutations  1
chol_tcga_mutations 1
skcm_tcga_pan_can_atlas_2018_mutations  1
pact_jhu_2011_mutations 1
jjgao commented 5 years ago

Here is an updated list.

mixed_allen_2018_mutations  2527
mixed_pipseq_2017_mutations 1776
prad_fhcrc_mutations    981
prad_p1000_mutations    708
cll_iuopa_2015_mutations    516
dlbc_broad_2012_mutations   495
blca_dfarber_mskcc_2014_mutations   471
sclc_ucologne_2015_mutations    420
mcl_idibips_2013_mutations  290
brca_mbcproject_wagle_2017_mutations    277
stes_tcga_pub_mutations 153
cellline_ccle_broad_mutations   145
nsclc_tcga_broad_2016_mutations 144
angs_project_painter_2018_mutations 142
paad_qcmg_uq_2016_mutations 102
chol_nus_2012_mutations 93
brca_bccrc_xenograft_2014_mutations 80
sclc_cancercell_gardner_2017_mutations  72
lusc_tcga_pub_mutations 50
sarc_mskcc_mutations    45
pediatric_dkfz_2017_mutations   44
cellline_nci60_mutations    39
paad_utsw_2015_mutations    38
ucec_tcga_pub_mutations 35
coadread_tcga_pan_can_atlas_2018_mutations  29
sclc_jhu_mutations  28
ucec_tcga_pan_can_atlas_2018_mutations  24
ov_tcga_pan_can_atlas_2018_mutations    23
gbm_tcga_pub2013_mutations  22
mel_tsam_liang_2017_mutations   22
hnsc_broad_mutations    17
ctcl_columbia_2015_mutations    17
kirc_tcga_pub_mutations 14
stad_tcga_pan_can_atlas_2018_mutations  14
brca_tcga_pan_can_atlas_2018_mutations  13
sarc_tcga_pub_mutations 12
mm_broad_mutations  11
skcm_yale_mutations 11
prad_tcga_pub_mutations 10
gbm_tcga_pan_can_atlas_2018_mutations   9
prad_su2c_2015_mutations    9
ov_tcga_mutations   8
cesc_tcga_pan_can_atlas_2018_mutations  7
hcc_inserm_fr_2015_mutations    7
prad_broad_2013_mutations   7
luad_tcga_pan_can_atlas_2018_mutations  6
ampca_bcm_2016_mutations    6
blca_bgi_mutations  6
blca_tcga_pan_can_atlas_2018_mutations  6
stad_uhongkong_mutations    6
cesc_tcga_mutations 6
skcm_broad_mutations    5
mpnst_mskcc_mutations   5
lgg_ucsf_2014_mutations 5
nhl_bcgsc_2013_mutations    5
uccc_nih_2017_mutations 5
coadread_tcga_mutations 4
crc_msk_2017_mutations  4
lusc_tcga_pan_can_atlas_2018_mutations  3
hnsc_tcga_pan_can_atlas_2018_mutations  3
blca_tcga_pub_2017_mutations    3
skcm_ucla_2016_mutations    3
brca_sanger_mutations   3
breast_msk_2018_mutations   3
dlbc_tcga_pan_can_atlas_2018_mutations  3
lihc_tcga_pan_can_atlas_2018_mutations  3
lung_msk_2017_mutations 2
ov_tcga_pub_mutations   2
paad_tcga_pan_can_atlas_2018_mutations  2
prad_broad_mutations    2
esca_broad_mutations    2
stad_utokyo_mutations   2
esca_tcga_pan_can_atlas_2018_mutations  2
es_dfarber_broad_2014_mutations 2
luad_broad_mutations    2
ucs_jhu_2014_mutations  1
chol_tcga_mutations 1
hnsc_jhu_mutations  1
coadread_dfci_2016_mutations    1
pact_jhu_2011_mutations 1
kirc_tcga_mutations 1
panet_jhu_2011_mutations    1
skcm_tcga_pan_can_atlas_2018_mutations  1
blca_tcga_pub_mutations 1
kirc_tcga_pan_can_atlas_2018_mutations  1
kirp_tcga_pan_can_atlas_2018_mutations  1
stad_pfizer_uhongkong_mutations 1
lgggbm_tcga_pub_mutations   1
mrt_bcgsc_2016_mutations    1
prad_eururol_2017_mutations 1
dlbc_tcga_mutations 1
lgg_tcga_pan_can_atlas_2018_mutations   1
nbl_broad_2013_mutations    1
nhl_bcgsc_2011_mutations    1
ccrcc_utokyo_2013_mutations 1
liad_inserm_fr_2014_mutations   1
escc_icgc_mutations 1
prad_tcga_pan_can_atlas_2018_mutations  1
all_stjude_2013_mutations   1
ucec_msk_2018_mutations 1
yichaoS commented 4 years ago

Updated result Jan 8:

ccle_broad_2019_mutations 1310
prad_mskcc_cheny1_organoids_2014_mutations 707
prad_fhcrc_mutations 367
mcl_idibips_2013_mutations 290
es_dfarber_broad_2014_mutations 256
ihch_smmu_2014_mutations 172
chol_nus_2012_mutations 93
luad_broad_mutations 79
sclc_cancercell_gardner_2017_mutations 72
prad_mpcproject_2018_mutations 64
acc_2019_mutations 60
pediatric_dkfz_2017_mutations 47
cll_iuopa_2015_mutations 39
coadread_tcga_pan_can_atlas_2018_mutations 28
ucec_tcga_pan_can_atlas_2018_mutations 24
ov_tcga_pan_can_atlas_2018_mutations 23
nsclc_tracerx_2017_mutations 17
ctcl_columbia_2015_mutations 17
lusc_tcga_pub_mutations 17
cellline_ccle_broad_mutations 15
stad_tcga_pan_can_atlas_2018_mutations 14
stes_tcga_pub_mutations 14
brca_tcga_pan_can_atlas_2018_mutations 13
prad_tcga_pub_mutations 11
gbm_tcga_pan_can_atlas_2018_mutations 9
ov_tcga_mutations 8
cesc_tcga_pan_can_atlas_2018_mutations 7
hcc_inserm_fr_2015_mutations 7
blca_tcga_pan_can_atlas_2018_mutations 6
luad_tcga_pan_can_atlas_2018_mutations 6
cesc_tcga_mutations 6
nhl_bcgsc_2013_mutations 5
mnm_washu_2016_mutations 4
coadread_tcga_mutations 4
brca_igr_2015_mutations 4
cll_broad_2015_mutations 3
sclc_ucologne_2015_mutations 3
hnsc_tcga_pan_can_atlas_2018_mutations 3
blca_bgi_mutations 3
brca_sanger_mutations 3
lihc_tcga_pan_can_atlas_2018_mutations 3
lusc_tcga_pan_can_atlas_2018_mutations 3
histiocytosis_cobi_msk_2019_mutations 2
skcm_broad_mutations 2
paad_tcga_pan_can_atlas_2018_mutations 2
dlbc_tcga_pan_can_atlas_2018_mutations 2
esca_tcga_pan_can_atlas_2018_mutations 2
gbm_mayo_pdx_sarkaria_2019_mutations 2
prad_p1000_mutations 1
metastatic_solid_tumors_mich_2017_mutations 1
prad_tcga_pan_can_atlas_2018_mutations 1
mixed_pipseq_2017_mutations 1
chol_tcga_mutations 1
hnsc_jhu_mutations 1
nhl_bcgsc_2011_mutations 1
coadread_dfci_2016_mutations 1
nsclc_tcga_broad_2016_mutations 1
skcm_tcga_pan_can_atlas_2018_mutations 1
kirc_tcga_mutations 1
bcc_unige_2016_mutations 1
kirc_tcga_pan_can_atlas_2018_mutations 1
stad_uhongkong_mutations 1
crc_msk_2017_mutations 1
kirc_tcga_pub_mutations 1
kirp_tcga_pan_can_atlas_2018_mutations 1
dlbc_tcga_mutations 1
lgg_tcga_pan_can_atlas_2018_mutations 1
pact_jhu_2011_mutations 1
esca_broad_mutations 1

yichaoS commented 4 years ago

Query Result: Feb 26

jjgao commented 4 years ago

@yichaoS thanks! I am wondering whether you have set up a process to catch this issue in the future, e.g. querying the db after import, or checking the annotated MAF.
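
A post-import check along these lines could reuse the query from the top of this issue. The sketch below is only an illustration: it runs against a throwaway in-memory SQLite database whose three tables contain just the columns the query touches (the real portal database has many more columns and is not SQLite), and the function names and toy rows are invented.

```python
import sqlite3

# Same logic as the query at the top of this issue.
QUERY = """
SELECT gp.stable_id, COUNT(*) AS mut_count
FROM mutation_event me
JOIN mutation m ON me.mutation_event_id = m.mutation_event_id
JOIN genetic_profile gp ON m.genetic_profile_id = gp.genetic_profile_id
WHERE protein_change = 'MUTATED'
GROUP BY gp.stable_id
ORDER BY mut_count DESC, gp.stable_id
"""


def placeholder_counts(conn):
    """Return [(stable_id, count)] for profiles that still contain 'MUTATED'."""
    return conn.execute(QUERY).fetchall()


def build_toy_db():
    """Hypothetical stand-in for the portal database, for demonstration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE genetic_profile (genetic_profile_id INTEGER PRIMARY KEY, stable_id TEXT);
        CREATE TABLE mutation_event (mutation_event_id INTEGER PRIMARY KEY, protein_change TEXT);
        CREATE TABLE mutation (mutation_event_id INTEGER, genetic_profile_id INTEGER);
        INSERT INTO genetic_profile VALUES (1, 'study_a_mutations'), (2, 'study_b_mutations');
        INSERT INTO mutation_event VALUES (10, 'MUTATED'), (11, 'R175H'), (12, 'MUTATED');
        INSERT INTO mutation VALUES (10, 1), (12, 1), (11, 1), (10, 2), (11, 2);
    """)
    return conn


toy_counts = placeholder_counts(build_toy_db())
print(toy_counts)  # [('study_a_mutations', 2), ('study_b_mutations', 1)]
```

In a real pipeline the same function would be pointed at a connection to the freshly imported database, and a non-empty result would be reported to the curator.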

yichaoS commented 4 years ago

Progress can be tracked here: https://docs.google.com/spreadsheets/d/1RFj8zzMPv4PCAs3sFM9ro3FP4vRg-Za-JI2AFlx6FHg/edit#gid=0

@jjgao We do have this included in our standard curation process: querying the db after import, or checking the annotated MAF, to try to prevent this from happening again. If an allele is missing, or a coordinate doesn't match the nucleotide, we have to reach out to the data source, which may leave the variants marked as "MUTATED" for a long period of time.
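
As a rough illustration of the MAF-side check, the sketch below counts, per gene, the rows whose protein change is missing. It rests on assumptions that are not stated in this thread: that the protein change is read from a column named `HGVSp_Short`, and that a blank value (or a literal `MUTATED`) in that column is what ends up displayed as "MUTATED". The function name and sample rows are invented.

```python
import csv
from collections import Counter


def count_unannotated(maf_text, column="HGVSp_Short"):
    """Count rows per Hugo_Symbol whose protein-change column is blank
    or literally 'MUTATED'. The column name is an assumption."""
    # Skip blank lines and '#' comment lines that may precede the header.
    lines = [ln for ln in maf_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    flagged = Counter()
    for row in reader:
        value = (row.get(column) or "").strip()
        if value == "" or value.upper() == "MUTATED":
            flagged[row.get("Hugo_Symbol") or "?"] += 1
    return flagged


# Invented sample rows, for demonstration only.
SAMPLE_MAF = "\n".join([
    "#version 2.4",
    "Hugo_Symbol\tTumor_Sample_Barcode\tHGVSp_Short",
    "TP53\tS1\tp.R175H",
    "TP53\tS2\t",
    "KRAS\tS3\tMUTATED",
])

flagged = count_unannotated(SAMPLE_MAF)
print(dict(flagged))  # {'TP53': 1, 'KRAS': 1}
```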

What should we do if this happens again? Should we temporarily remove the variants? Or just leave them as is? Or use another way to display them?

jjgao commented 4 years ago

@yichaoS I would vote to remove them as they are only a very small fraction of our data now.
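
If removal is the route taken, one way to do it before import is to split the MAF into rows to keep and rows to set aside for follow-up with the data source. This is only a sketch under the same unconfirmed assumptions as any MAF-side check here: the protein change is taken from a column named `HGVSp_Short`, and a blank or literal `MUTATED` value marks an affected row. `split_maf` and the sample rows are hypothetical.

```python
def split_maf(maf_text, column="HGVSp_Short"):
    """Split MAF text into (kept, set_aside). Comment lines stay with the
    kept file; the header row is copied to both outputs.
    Raises ValueError if `column` is not in the header."""
    kept, set_aside = [], []
    col_idx = None
    for line in maf_text.splitlines():
        if line.startswith("#") or not line.strip():
            kept.append(line)
            continue
        fields = line.split("\t")
        if col_idx is None:
            # First non-comment line is the header.
            col_idx = fields.index(column)
            kept.append(line)
            set_aside.append(line)
            continue
        value = fields[col_idx].strip() if col_idx < len(fields) else ""
        target = set_aside if value in ("", "MUTATED") else kept
        target.append(line)
    return "\n".join(kept), "\n".join(set_aside)


# Invented sample rows, for demonstration only.
SAMPLE = "\n".join([
    "#version 2.4",
    "Hugo_Symbol\tTumor_Sample_Barcode\tHGVSp_Short",
    "TP53\tS1\tp.R175H",
    "TP53\tS2\t",
    "KRAS\tS3\tMUTATED",
])

kept_text, removed_text = split_maf(SAMPLE)
print(len(kept_text.splitlines()) - 2, "variant row(s) kept")          # 1 variant row(s) kept
print(len(removed_text.splitlines()) - 1, "variant row(s) set aside")  # 2 variant row(s) set aside
```

Keeping the set-aside rows in their own file means they can be re-imported once the missing alleles or coordinates are resolved.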