cBioPortal / cbioportal

cBioPortal for Cancer Genomics
https://cbioportal.org
GNU Affero General Public License v3.0

Discrepancy involving genomicData filter #11197

Closed alisman closed 6 days ago

alisman commented 1 week ago
http://localhost:8082/study/summary?id=prad_msk_mdanderson_2023#filterJson={%22studyIds%22:[%22prad_msk_mdanderson_2023%22],%22alterationFilter%22:{%22copyNumberAlterationEventTypes%22:{%22AMP%22:true,%22HOMDEL%22:true},%22mutationEventTypes%22:{%22any%22:true},%22includeDriver%22:true,%22includeSomatic%22:true,%22includeUnknownTier%22:true,%22includeGermline%22:true,%22includeUnknownStatus%22:true,%22includeUnknownOncogenicity%22:true,%22includeVUS%22:true},%22genomicDataFilters%22:[{%22hugoGeneSymbol%22:%22STAC2%22,%22profileType%22:%22rna_seq_v2_mrna%22,%22values%22:[{%22start%22:0.2,%22end%22:0.25},{%22start%22:0.25,%22end%22:0.3},{%22start%22:0.3,%22end%22:0.35},{%22start%22:0.35,%22end%22:0.4},{%22start%22:0.4,%22end%22:0.45},{%22start%22:0.45,%22end%22:0.5},{%22start%22:0.5,%22end%22:0.55},{%22start%22:0.55,%22end%22:0.6},{%22start%22:0.6,%22end%22:0.65},{%22start%22:0.65,%22end%22:0.7},{%22start%22:0.7,%22end%22:0.75},{%22start%22:0.75,%22end%22:0.8},{%22start%22:0.8,%22end%22:0.85},{%22start%22:0.85,%22end%22:0.9},{%22start%22:0.9,%22end%22:0.95},{%22start%22:0.95}]}]}
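
The filter in the URL above is easier to read once the `filterJson` fragment is percent-decoded. The following is a minimal sketch using only the Python standard library. The `url` here is a shortened excerpt that keeps only the first and last STAC2 bins, not the full link, and `decode_filter` is a hypothetical helper name, not part of cBioPortal.

```python
import json
from urllib.parse import unquote, urlsplit


def decode_filter(url: str) -> dict:
    """Return the study-view filter carried in the URL fragment."""
    fragment = urlsplit(url).fragment  # everything after '#'
    prefix = "filterJson="
    if not fragment.startswith(prefix):
        raise ValueError("URL fragment does not carry a filterJson payload")
    # Percent-decode (%22 -> ") and parse the remaining JSON text.
    return json.loads(unquote(fragment[len(prefix):]))


# Shortened excerpt of the link above (assumption: only two bins kept).
url = (
    "http://localhost:8082/study/summary?id=prad_msk_mdanderson_2023"
    "#filterJson={%22studyIds%22:[%22prad_msk_mdanderson_2023%22],"
    "%22genomicDataFilters%22:[{%22hugoGeneSymbol%22:%22STAC2%22,"
    "%22profileType%22:%22rna_seq_v2_mrna%22,"
    "%22values%22:[{%22start%22:0.2,%22end%22:0.25},{%22start%22:0.95}]}]}"
)

study_filter = decode_filter(url)
for bin_ in study_filter["genomicDataFilters"][0]["values"]:
    # The last bin has no "end" key, i.e. it is open-ended.
    print(bin_.get("start"), bin_.get("end"))
```

The first bin starts at 0.2, which is the boundary value at issue in the comments below.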
onursumer commented 1 week ago

For the given filter, the CH implementation returns two additional samples that the legacy implementation filters out:

[
    {
        "uniqueSampleKey": "TURBLVBDYS0xNDQtNDpwcmFkX21za19tZGFuZGVyc29uXzIwMjM",
        "uniquePatientKey": "TURBLVBDYS1QYXQtMTQ0OnByYWRfbXNrX21kYW5kZXJzb25fMjAyMw",
        "sampleId": "MDA-PCa-144-4",
        "patientId": "MDA-PCa-Pat-144",
        "studyId": "prad_msk_mdanderson_2023"
    },
    {
        "uniqueSampleKey": "TURBLVBDYS0xNDQtNC1UMjAwOnByYWRfbXNrX21kYW5kZXJzb25fMjAyMw",
        "uniquePatientKey": "TURBLVBDYS1QYXQtMTQ0OnByYWRfbXNrX21kYW5kZXJzb25fMjAyMw",
        "sampleId": "MDA-PCa-144-4-T200",
        "patientId": "MDA-PCa-Pat-144",
        "studyId": "prad_msk_mdanderson_2023"
    }
]

Looking at the genomic data for these samples (by running the query below), we see that the alteration value for both of them is 0.2:

SELECT * FROM
    cgds_public_v5.genetic_alteration_derived
WHERE
    sample_unique_id LIKE '%MDA-PCa-144-4%'
AND
    hugo_gene_symbol = 'STAC2'
AND
    profile_type = 'rna_seq_v2_mrna'


The legacy implementation filters with an exclusive start value, so a sample whose value is exactly 0.2 does not fall into the first bin (start 0.2, end 0.25).


CH, however, does not filter these samples out, because it somehow treats the start value 0.2 as inclusive, so a value of exactly 0.2 matches the first bin.
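
To make the boundary difference concrete, here is a minimal Python sketch of the two behaviours described in this comment. It is an illustration only, not the actual legacy or CH code. The `matches` helper and its `include_start` flag are hypothetical, and treating the end value as inclusive in both cases is an assumption that this thread does not confirm.

```python
from typing import Optional, Sequence, Tuple

# (start, end); None means the bin is unbounded on that side.
Bin = Tuple[Optional[float], Optional[float]]


def matches(value: float, bins: Sequence[Bin], include_start: bool) -> bool:
    """Return True if value falls into at least one bin."""
    for start, end in bins:
        if start is None:
            above_start = True
        elif include_start:
            above_start = value >= start  # inclusive start, as CH appears to behave
        else:
            above_start = value > start   # exclusive start, as in the legacy filter
        # Assumption: the end value is inclusive in both implementations.
        below_end = end is None or value <= end
        if above_start and below_end:
            return True
    return False


# Excerpt of the STAC2 bins from the filter (first two bins and the open-ended last one).
bins = [(0.2, 0.25), (0.25, 0.3), (0.95, None)]
value = 0.2  # alteration value of MDA-PCa-144-4 and MDA-PCa-144-4-T200

legacy_keeps = matches(value, bins, include_start=False)
ch_keeps = matches(value, bins, include_start=True)
print(legacy_keeps, ch_keeps)  # False True
```

Under these assumptions, the only difference between the two results is the comparison operator applied to the start value, which is consistent with the two extra samples sitting exactly on the 0.2 boundary.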