cBioPortal / rfc80-team

repository to hold issues for the rfc80 development / deployment team

Overall survival chart is missing NA counts in CH implementation #39

Closed (alisman closed 1 month ago)

alisman commented 2 months ago

https://genie-public-beta.cbioportal.org/study/summary?id=brca_akt1_genie_2019%2Claml_tcga_pan_can_atlas_2018%2Cacc_tcga_pan_can_atlas_2018%2Cblca_tcga_pan_can_atlas_2018%2Clgg_tcga_pan_can_atlas_2018%2Cbrca_tcga_pan_can_atlas_2018%2Ccesc_tcga_pan_can_atlas_2018%2Cchol_tcga_pan_can_atlas_2018%2Ccoadread_tcga_pan_can_atlas_2018%2Cglioma_dfci_2020%2Cdlbc_tcga_pan_can_atlas_2018%2Cerbb2_genie_public%2Cesca_tcga_pan_can_atlas_2018%2Ccrc_public_genie_bpc%2Cnsclc_public_genie_bpc%2Cgenie_public%2Cgbm_tcga_pan_can_atlas_2018%2Chnsc_tcga_pan_can_atlas_2018%2Ckich_tcga_pan_can_atlas_2018%2Ckirc_tcga_pan_can_atlas_2018%2Ckirp_tcga_pan_can_atlas_2018%2Clihc_tcga_pan_can_atlas_2018%2Cluad_tcga_pan_can_atlas_2018%2Clusc_tcga_pan_can_atlas_2018%2Cmeso_tcga_pan_can_atlas_2018%2Cmbc_genie_2020%2Cov_tcga_pan_can_atlas_2018%2Cpaad_tcga_pan_can_atlas_2018%2Cpcpg_tcga_pan_can_atlas_2018%2Cprad_tcga_pan_can_atlas_2018%2Csarc_tcga_pan_can_atlas_2018%2Cskcm_tcga_pan_can_atlas_2018%2Cstad_tcga_pan_can_atlas_2018%2Ctgct_tcga_pan_can_atlas_2018%2Cthym_tcga_pan_can_atlas_2018%2Cthca_tcga_pan_can_atlas_2018%2Cucs_tcga_pan_can_atlas_2018%2Cucec_tcga_pan_can_atlas_2018%2Cuvm_tcga_pan_can_atlas_2018

onursumer commented 2 months ago

There's also a discrepancy in the NA counts for most of the bar charts.

Legacy:

na_count_legacy

CH:

na_count_ch

onursumer commented 2 months ago

Looks like this happens only when we combine studies.

For example, OS_MONTHS_INIT_DIAGNOSIS is present only in mbc_genie_2020, and when that study is viewed on its own the NA count is zero in both the legacy and the CH implementations. (https://genie-public-beta.cbioportal.org/study/summary?id=mbc_genie_2020 vs https://genie-public-beta.cbioportal.org/study/summary?id=mbc_genie_2020?legacy=1)

Screenshot 2024-08-14 at 9 05 36 AM

I think that for combined studies the CH implementation ignores samples from the other studies when counting NAs, because they don't have OS_MONTHS_INIT_DIAGNOSIS clinical data. It looks like we only take mbc_genie_2020 samples into account and ignore the rest.

The legacy implementation, on the other hand, takes all samples from all studies into account when counting NAs.
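
To make the suspected difference concrete, here is a minimal sketch of the two counting strategies. It is not the actual cBioPortal or ClickHouse code: the function names, the sample record layout, and the attribute values are all hypothetical, and the second function only models my guess about what the CH implementation does.

```python
from typing import Dict, List, Optional

# Hypothetical record layout: each sample carries its study ID and a dict of
# clinical attributes. A missing key means the sample has no value (NA).
Sample = Dict[str, object]


def _value(sample: Sample, attribute: str) -> Optional[str]:
    """Return the attribute value for a sample, or None if it is missing."""
    return sample["attributes"].get(attribute)  # type: ignore[union-attr]


def na_count_all_samples(samples: List[Sample], attribute: str) -> int:
    """Legacy-style count: every selected sample without a value is NA."""
    return sum(1 for s in samples if _value(s, attribute) is None)


def na_count_defining_studies_only(samples: List[Sample], attribute: str) -> int:
    """Suspected CH-style count: only samples from studies that have the
    attribute at all are considered; samples from other studies are skipped."""
    defining_studies = {
        s["study_id"] for s in samples if _value(s, attribute) is not None
    }
    return sum(
        1
        for s in samples
        if s["study_id"] in defining_studies and _value(s, attribute) is None
    )


# Made-up data: only mbc_genie_2020 samples carry OS_MONTHS_INIT_DIAGNOSIS.
ATTR = "OS_MONTHS_INIT_DIAGNOSIS"
mbc_only: List[Sample] = [
    {"study_id": "mbc_genie_2020", "attributes": {ATTR: "12.3"}},
    {"study_id": "mbc_genie_2020", "attributes": {ATTR: "40.1"}},
]
combined: List[Sample] = mbc_only + [
    {"study_id": "brca_tcga_pan_can_atlas_2018", "attributes": {}},
    {"study_id": "brca_tcga_pan_can_atlas_2018", "attributes": {}},
    {"study_id": "brca_tcga_pan_can_atlas_2018", "attributes": {}},
]

legacy_single = na_count_all_samples(mbc_only, ATTR)
ch_single = na_count_defining_studies_only(mbc_only, ATTR)
legacy_combined = na_count_all_samples(combined, ATTR)
ch_combined = na_count_defining_studies_only(combined, ATTR)
```

With this made-up data both strategies agree on the single study, but on the combined selection the legacy-style count includes the three samples from the other study while the suspected CH-style count stays at zero, which is the pattern described above.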