cBioPortal / cbioportal

cBioPortal for Cancer Genomics
https://cbioportal.org
GNU Affero General Public License v3.0

ClinicalDataBinCounts discrepancy #11098

Open alisman opened 2 weeks ago

alisman commented 2 weeks ago

curl 'http://localhost:8082/api/column-store/clinical-data-bin-counts/fetch?dataBinMethod=STATIC' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -H 'cookie: _ga_ET18FDC3P1=GS1.1.1727902893.87.0.1727902893.0.0.0; _gid=GA1.2.1570078648.1728481898; _ga_CKJ2CEEFD8=GS1.1.1728589307.172.1.1728589613.0.0.0; _ga_5260NDGD6Z=GS1.1.1728612388.318.1.1728612389.0.0.0; _gat_gtag_UA_17134933_2=1; _ga=GA1.1.1260093286.1710808634; _ga_334HHWHCPJ=GS1.1.1728647421.32.1.1728647514.0.0.0' \
  -H 'pragma: no-cache' \
  -H 'priority: u=1, i' \
  -H 'sec-ch-ua: "Google Chrome";v="129", "Not=A?Brand";v="8", "Chromium";v="129"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36' \
  --data-raw '{"studyViewFilter":{"studyIds":["aml_ohsu_2022","aml_ohsu_2018","laml_tcga"],"alterationFilter":{"copyNumberAlterationEventTypes":{"AMP":true,"HOMDEL":true},"mutationEventTypes":{"any":true},"includeDriver":true,"includeSomatic":true,"includeUnknownTier":true,"includeGermline":true,"includeUnknownStatus":true,"includeUnknownOncogenicity":true,"includeVUS":true},"clinicalDataFilters":[{"attributeId":"CANCER_TYPE_DETAILED","values":[{"value":"Acute Myeloid Leukemia"},{"value":"Therapy-Related Myeloid Neoplasms"}]}]},"attributes":[{"attributeId":"MUTATION_COUNT","showNA":false,"disableLogScale":false},{"attributeId":"FRACTION_GENOME_ALTERED","showNA":false,"disableLogScale":false},{"attributeId":"AGE","showNA":false,"disableLogScale":false},{"attributeId":"ABNORMAL_LYMPHOCYTE_PERCENT","showNA":false,"disableLogScale":false},{"attributeId":"AGE_AT_DIAGNOSIS","showNA":false,"disableLogScale":false},{"attributeId":"AGE_AT_PROCUREMENT","showNA":false,"disableLogScale":false},{"attributeId":"AGE_AT_SPECIMEN_ACQUISITION","showNA":false,"disableLogScale":false},{"attributeId":"PB_ALT_LEVEL","showNA":false,"disableLogScale":false},{"attributeId":"PB_ALBUMIN_LEVEL","showNA":false,"disableLogScale":false},{"attributeId":"PB_AST_LEVEL","showNA":false,"disableLogScale":false}]}'

onursumer commented 2 weeks ago

These are the 9 samples reported by the legacy implementation:

| Patient ID | Sample ID | Age at Diagnosis |
| --- | --- | --- |
| aml_ohsu_2018_1561 | aml_ohsu_2018_14-00289 | 23 |
| aml_ohsu_2018_1561 | aml_ohsu_2018_14-00676 | 23 |
| aml_ohsu_2022_2394 | aml_ohsu_2022_2394_BA2220 | 23 |
| aml_ohsu_2022_2394 | aml_ohsu_2022_2394_BA2720 | 23 |
| aml_ohsu_2018_953 | aml_ohsu_2018_13-00028 | 24 |
| aml_ohsu_2018_2420 | aml_ohsu_2018_15-00900 | 24 |
| aml_ohsu_2018_2420 | aml_ohsu_2018_15-00988 | 24 |
| aml_ohsu_2018_2495 | aml_ohsu_2018_16-00087 | 24 |
| aml_ohsu_2022_2090 | aml_ohsu_2022_2090_BA2911 | 24 |

When we query the clinical_data_derived table (ClickHouse) for these IDs, we end up with only 7 patients/samples:

SELECT patient_unique_id, sample_unique_id, attribute_value
FROM cgds_public_v5.clinical_data_derived
WHERE
    attribute_name = 'AGE_AT_DIAGNOSIS'
    AND (
        patient_unique_id IN (
            'aml_ohsu_2018_aml_ohsu_2018_1561',
            'aml_ohsu_2022_aml_ohsu_2022_2394',
            'aml_ohsu_2018_aml_ohsu_2018_953',
            'aml_ohsu_2018_aml_ohsu_2018_2420',
            'aml_ohsu_2018_aml_ohsu_2018_2495',
            'aml_ohsu_2022_aml_ohsu_2022_2090'
        )
        OR
        sample_unique_id IN (
            'aml_ohsu_2018_aml_ohsu_2018_14-00289',
            'aml_ohsu_2018_aml_ohsu_2018_14-00676',
            'aml_ohsu_2022_aml_ohsu_2022_2394_BA2220',
            'aml_ohsu_2022_aml_ohsu_2022_2394_BA2720',
            'aml_ohsu_2018_aml_ohsu_2018_13-00028',
            'aml_ohsu_2018_aml_ohsu_2018_15-00900',
            'aml_ohsu_2018_aml_ohsu_2018_15-00988',
            'aml_ohsu_2018_aml_ohsu_2018_16-00087',
            'aml_ohsu_2022_aml_ohsu_2022_2090_BA2911'
        )
    )
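
The prefixed IDs in the query above appear to follow the pattern `<study id>_<stable id>`. Here is a minimal Python sketch for deriving those lookup keys from the legacy rows. It assumes that pattern holds (it is inferred from the query itself, not from schema documentation), and `unique_id` is a hypothetical helper name:

```python
# Hypothetical helper: the "<study id>_<stable id>" pattern is inferred from
# the IDs used in the query above, not from cBioPortal schema documentation.
def unique_id(study_id: str, stable_id: str) -> str:
    return f"{study_id}_{stable_id}"


# (study id, patient ID, sample ID) for the 9 samples reported by the legacy API.
legacy_rows = [
    ("aml_ohsu_2018", "aml_ohsu_2018_1561", "aml_ohsu_2018_14-00289"),
    ("aml_ohsu_2018", "aml_ohsu_2018_1561", "aml_ohsu_2018_14-00676"),
    ("aml_ohsu_2022", "aml_ohsu_2022_2394", "aml_ohsu_2022_2394_BA2220"),
    ("aml_ohsu_2022", "aml_ohsu_2022_2394", "aml_ohsu_2022_2394_BA2720"),
    ("aml_ohsu_2018", "aml_ohsu_2018_953", "aml_ohsu_2018_13-00028"),
    ("aml_ohsu_2018", "aml_ohsu_2018_2420", "aml_ohsu_2018_15-00900"),
    ("aml_ohsu_2018", "aml_ohsu_2018_2420", "aml_ohsu_2018_15-00988"),
    ("aml_ohsu_2018", "aml_ohsu_2018_2495", "aml_ohsu_2018_16-00087"),
    ("aml_ohsu_2022", "aml_ohsu_2022_2090", "aml_ohsu_2022_2090_BA2911"),
]

# Deduplicate: several samples can belong to the same patient.
patient_keys = sorted({unique_id(study, patient) for study, patient, _ in legacy_rows})
sample_keys = sorted({unique_id(study, sample) for study, _, sample in legacy_rows})

print(len(patient_keys), "patient keys,", len(sample_keys), "sample keys")
```

The two key lists correspond to the `patient_unique_id IN (...)` and `sample_unique_id IN (...)` clauses of the query.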

onursumer commented 1 week ago

The discrepancy is a result of AGE_AT_DIAGNOSIS being a patient attribute for aml_ohsu_2018 and a sample attribute for aml_ohsu_2022.

In the legacy implementation, when an attribute is defined at conflicting levels, we count samples. The ClickHouse implementation, by contrast, returns a count that mixes sample and patient counts.
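
To illustrate the two counting strategies described above, here is a minimal Python sketch applied to the 9 legacy rows. It is only a model of the described behaviour, not the actual cBioPortal code: `legacy_count`, `mixed_count`, and the `LEVEL` mapping are hypothetical names, and the mixed rule (one entry per patient where the attribute is patient-level, one per sample where it is sample-level) is an assumption.

```python
# (study id, patient ID, sample ID) for the 9 samples reported by the legacy API.
ROWS = [
    ("aml_ohsu_2018", "aml_ohsu_2018_1561", "aml_ohsu_2018_14-00289"),
    ("aml_ohsu_2018", "aml_ohsu_2018_1561", "aml_ohsu_2018_14-00676"),
    ("aml_ohsu_2022", "aml_ohsu_2022_2394", "aml_ohsu_2022_2394_BA2220"),
    ("aml_ohsu_2022", "aml_ohsu_2022_2394", "aml_ohsu_2022_2394_BA2720"),
    ("aml_ohsu_2018", "aml_ohsu_2018_953", "aml_ohsu_2018_13-00028"),
    ("aml_ohsu_2018", "aml_ohsu_2018_2420", "aml_ohsu_2018_15-00900"),
    ("aml_ohsu_2018", "aml_ohsu_2018_2420", "aml_ohsu_2018_15-00988"),
    ("aml_ohsu_2018", "aml_ohsu_2018_2495", "aml_ohsu_2018_16-00087"),
    ("aml_ohsu_2022", "aml_ohsu_2022_2090", "aml_ohsu_2022_2090_BA2911"),
]

# Level of AGE_AT_DIAGNOSIS in each study, as stated in the comment above.
LEVEL = {"aml_ohsu_2018": "patient", "aml_ohsu_2022": "sample"}


def legacy_count(rows):
    """Legacy behaviour as described: always count samples."""
    return len({(study, sample) for study, _, sample in rows})


def mixed_count(rows, level):
    """Assumed model of the ClickHouse result: count patients where the
    attribute is patient-level and samples where it is sample-level."""
    keys = set()
    for study, patient, sample in rows:
        entity = patient if level[study] == "patient" else sample
        keys.add((study, entity))
    return len(keys)


print(legacy_count(ROWS))        # 9
print(mixed_count(ROWS, LEVEL))  # 7
```

Under these assumptions, the 4 aml_ohsu_2018 patients plus the 3 aml_ohsu_2022 samples give 7, which matches the figure reported for the ClickHouse query, while counting samples throughout gives the legacy figure of 9.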

alisman commented 1 week ago

Conflicting sample/patient-level attributes

jjgao commented 3 days ago

some discussion here (private): https://cbioportal.slack.com/archives/C04RQ1PV7R8/p1730738224028119