cBioPortal / cbioportal

cBioPortal for Cancer Genomics
https://cbioportal.org
GNU Affero General Public License v3.0

Missing one sample in filtered samples (CH is 1 less) #11079

Open alisman opened 1 week ago

alisman commented 1 week ago
curl 'http://localhost:8082/api/column-store/filtered-samples/fetch' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -H 'cookie: _ga_ET18FDC3P1=GS1.1.1727902893.87.0.1727902893.0.0.0; _gid=GA1.2.1570078648.1728481898; _ga_CKJ2CEEFD8=GS1.1.1728589307.172.1.1728589613.0.0.0; _ga_5260NDGD6Z=GS1.1.1728612388.318.1.1728612389.0.0.0; _gat_gtag_UA_17134933_2=1; _ga=GA1.1.1260093286.1710808634; _ga_334HHWHCPJ=GS1.1.1728647421.32.1.1728647514.0.0.0' \
  -H 'pragma: no-cache' \
  -H 'priority: u=1, i' \
  -H 'sec-ch-ua: "Google Chrome";v="129", "Not=A?Brand";v="8", "Chromium";v="129"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36' \
  --data-raw '{"studyIds":["brca_tcga_gdc"],"alterationFilter":{"copyNumberAlterationEventTypes":{"AMP":true,"HOMDEL":true},"mutationEventTypes":{"any":true},"includeDriver":true,"includeSomatic":true,"includeUnknownTier":true,"includeGermline":true,"includeUnknownStatus":true,"includeUnknownOncogenicity":true,"includeVUS":true},"genomicProfiles":[["rna_seq_mrna"],["mutations"]]}'

onursumer commented 1 week ago

Missing sample is TCGA-AO-A1KO-01

{
    "uniqueSampleKey": "VENHQS1BTy1BMUtPLTAxOmJyY2FfdGNnYV9nZGM",
    "uniquePatientKey": "VENHQS1BTy1BMUtPOmJyY2FfdGNnYV9nZGM",
    "sampleId": "TCGA-AO-A1KO-01",
    "patientId": "TCGA-AO-A1KO",
    "studyId": "brca_tcga_gdc"
}
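
As a side note, the two unique keys above look like unpadded Base64 encodings of sampleId:studyId and patientId:studyId. This is an observation from decoding the values posted in this thread, not a documented contract, and the helper name decode_unique_key is made up for illustration. A minimal sketch:

```python
import base64


def decode_unique_key(key: str) -> str:
    """Decode an unpadded Base64 key (hypothetical helper, not part of cBioPortal)."""
    # Base64 input length must be a multiple of 4, so restore the stripped padding.
    padded = key + "=" * (-len(key) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


print(decode_unique_key("VENHQS1BTy1BMUtPLTAxOmJyY2FfdGNnYV9nZGM"))
# -> TCGA-AO-A1KO-01:brca_tcga_gdc
print(decode_unique_key("VENHQS1BTy1BMUtPOmJyY2FfdGNnYV9nZGM"))
# -> TCGA-AO-A1KO:brca_tcga_gdc
```

This makes it easy to confirm that a key returned by one endpoint refers to the same sample and study as an ID seen elsewhere.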
onursumer commented 6 days ago

The sample TCGA-AO-A1KO-01 doesn't have a mutation profile for this study, so it is excluded by the study view filter.

To verify, run the SQL query below.

SELECT
    sp.sample_id AS sampleInternalId,
    sd.sample_stable_id AS sampleStableId,
    sd.sample_unique_id AS sampleUniqueId,
    gp.stable_id AS geneticProfile
FROM cgds_public_v5.sample_profile sp
    JOIN cgds_public_v5.sample_derived sd ON sp.sample_id = sd.internal_id
    JOIN cgds_public_v5.genetic_profile gp ON sp.genetic_profile_id = gp.genetic_profile_id
WHERE sd.sample_stable_id = 'TCGA-AO-A1KO-01'
    AND sd.cancer_study_identifier = 'brca_tcga_gdc';

image

The ClickHouse SQL implementation applies AND logic across the given genomic profiles, so a sample must belong to every requested profile to pass the filter.

https://github.com/cBioPortal/cbioportal/blob/8526c72feddbe39f11fa8dea0f2e9559585ce29c/src/main/resources/org/cbioportal/persistence/mybatisclickhouse/StudyViewFilterMapper.xml#L36-L58

The legacy SQL might be applying OR logic instead. This needs further investigation to confirm.
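
To make the AND versus OR distinction concrete, here is a minimal sketch of both behaviours over an in-memory map. It is not the actual mapper or legacy code. The function name filter_samples, the sample HYPOTHETICAL-01, and the assumption that TCGA-AO-A1KO-01 has an rna_seq_mrna row are all illustrative; the thread only confirms that the sample has no mutations row in sample_profile. Each inner list of the request's genomicProfiles is treated here as a single required profile.

```python
def filter_samples(profiles_by_sample, required, mode="and"):
    """Return the sample IDs that pass a genomic-profile filter (illustrative only)."""
    if mode == "and":
        # Sample must be in every required profile (subset test).
        return {s for s, p in profiles_by_sample.items() if required <= p}
    if mode == "or":
        # Sample must be in at least one required profile (non-empty intersection).
        return {s for s, p in profiles_by_sample.items() if required & p}
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical membership, shaped like rows from sample_profile.
membership = {
    "TCGA-AO-A1KO-01": {"rna_seq_mrna"},               # no mutations row (rna_seq_mrna assumed)
    "HYPOTHETICAL-01": {"rna_seq_mrna", "mutations"},  # made-up control sample
}
required = {"rna_seq_mrna", "mutations"}

print(sorted(filter_samples(membership, required, mode="and")))
# -> ['HYPOTHETICAL-01']
print(sorted(filter_samples(membership, required, mode="or")))
# -> ['HYPOTHETICAL-01', 'TCGA-AO-A1KO-01']
```

Under AND logic the sample drops out, which matches the one-sample difference reported in this issue. Under OR logic it would be kept.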

onursumer commented 6 days ago

Actually, the legacy implementation also applies AND logic, but it gets the profile information from the gene panel data rather than from the sample_profile table.

https://github.com/cBioPortal/cbioportal/blob/f7d91c0d0f590cb3aff3afb6e7e62864c3e3a0ce/src/main/java/org/cbioportal/web/util/StudyViewFilterApplier.java#L332-L338

According to the gene panel data, the sample TCGA-AO-A1KO-01 does have the mutations genomic profile.

image

alisman commented 1 day ago

@onursumer I don't really understand how the gene panel can be used because, unless I'm totally mistaken, there is no relation from gene panel to sample. A gene panel only says which genes are profiled by a given genetic_profile, right? So I guess the question is: how is the above genePanelData derived?

onursumer commented 23 hours ago

https://github.com/cBioPortal/cbioportal/blob/f7d91c0d0f590cb3aff3afb6e7e62864c3e3a0ce/src/main/java/org/cbioportal/web/util/StudyViewFilterApplier.java#L332-L335

Here we get the sample ID from each gene panel datum via datum.getSampleId(). I'm not sure how the sample ID gets attached to the gene panel data, but here is the related class member.

https://github.com/cBioPortal/cbioportal/blob/4c3e0358e6d87761a7e6853ddd28ed4c6eb694c9/src/main/java/org/cbioportal/model/GenePanelData.java#L10
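
For illustration, here is a rough sketch of how per-sample profile membership could be derived from gene panel data records and then filtered with AND logic. It is not a port of StudyViewFilterApplier. Only the sample ID field corresponds to something confirmed in this thread (datum.getSampleId()). The molecular_profile_id and profiled fields, the profile ID strings, and the sample HYPOTHETICAL-02 are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GenePanelDatum:
    """Hypothetical stand-in for the Java GenePanelData model."""
    sample_id: str             # mirrors datum.getSampleId()
    molecular_profile_id: str  # assumed field
    profiled: bool             # assumed field


def profiles_by_sample(data):
    """Group the profiles each sample is marked as profiled in."""
    grouped = defaultdict(set)
    for datum in data:
        if datum.profiled:
            grouped[datum.sample_id].add(datum.molecular_profile_id)
    return dict(grouped)


def samples_with_all(data, required):
    """AND logic: keep samples profiled in every required profile."""
    return {s for s, p in profiles_by_sample(data).items() if required <= p}


# Illustrative records; the profile IDs are assumed, not copied from the database.
data = [
    GenePanelDatum("TCGA-AO-A1KO-01", "brca_tcga_gdc_rna_seq_mrna", True),
    GenePanelDatum("TCGA-AO-A1KO-01", "brca_tcga_gdc_mutations", True),
    GenePanelDatum("HYPOTHETICAL-02", "brca_tcga_gdc_rna_seq_mrna", True),
    GenePanelDatum("HYPOTHETICAL-02", "brca_tcga_gdc_mutations", False),
]
required = {"brca_tcga_gdc_rna_seq_mrna", "brca_tcga_gdc_mutations"}

print(sorted(samples_with_all(data, required)))
# -> ['TCGA-AO-A1KO-01']
```

If the gene panel data marks a sample as profiled for mutations while sample_profile has no matching row, the same AND filter returns different sample sets depending on which source it reads. That would account for the one-sample difference.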