alisman opened 2 weeks ago
These are the 9 samples reported by the legacy implementation:
| Patient ID | Sample ID | Age at Diagnosis |
|---|---|---|
| aml_ohsu_2018_1561 | aml_ohsu_2018_14-00289 | 23 |
| aml_ohsu_2018_1561 | aml_ohsu_2018_14-00676 | 23 |
| aml_ohsu_2022_2394 | aml_ohsu_2022_2394_BA2220 | 23 |
| aml_ohsu_2022_2394 | aml_ohsu_2022_2394_BA2720 | 23 |
| aml_ohsu_2018_953 | aml_ohsu_2018_13-00028 | 24 |
| aml_ohsu_2018_2420 | aml_ohsu_2018_15-00900 | 24 |
| aml_ohsu_2018_2420 | aml_ohsu_2018_15-00988 | 24 |
| aml_ohsu_2018_2495 | aml_ohsu_2018_16-00087 | 24 |
| aml_ohsu_2022_2090 | aml_ohsu_2022_2090_BA2911 | 24 |
When we query the `clinical_data_derived` table in the ClickHouse database for these IDs, we end up with only 7 patients/samples:
```sql
SELECT patient_unique_id, sample_unique_id, attribute_value
FROM cgds_public_v5.clinical_data_derived
WHERE
    attribute_name = 'AGE_AT_DIAGNOSIS'
    AND (
        patient_unique_id IN (
            'aml_ohsu_2018_aml_ohsu_2018_1561',
            'aml_ohsu_2022_aml_ohsu_2022_2394',
            'aml_ohsu_2018_aml_ohsu_2018_953',
            'aml_ohsu_2018_aml_ohsu_2018_2420',
            'aml_ohsu_2018_aml_ohsu_2018_2495',
            'aml_ohsu_2022_aml_ohsu_2022_2090'
        )
        OR
        sample_unique_id IN (
            'aml_ohsu_2018_aml_ohsu_2018_14-00289',
            'aml_ohsu_2018_aml_ohsu_2018_14-00676',
            'aml_ohsu_2022_aml_ohsu_2022_2394_BA2220',
            'aml_ohsu_2022_aml_ohsu_2022_2394_BA2720',
            'aml_ohsu_2018_aml_ohsu_2018_13-00028',
            'aml_ohsu_2018_aml_ohsu_2018_15-00900',
            'aml_ohsu_2018_aml_ohsu_2018_15-00988',
            'aml_ohsu_2018_aml_ohsu_2018_16-00087',
            'aml_ohsu_2022_aml_ohsu_2022_2090_BA2911'
        )
    )
```
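The discrepancy is a result of `AGE_AT_DIAGNOSIS` being a patient attribute for `aml_ohsu_2018` and a sample attribute for `aml_ohsu_2022`. The 2018 study therefore contributes one row per patient (4 patients, even though `aml_ohsu_2018_1561` and `aml_ohsu_2018_2420` each have two samples), while the 2022 study contributes one row per sample (3 samples), giving 4 + 3 = 7.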
In the legacy implementation, when an attribute is defined at different levels (patient vs. sample) across studies, we count samples. The ClickHouse implementation, on the other hand, returns a mix of sample and patient counts.
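A minimal sketch of how the ClickHouse side could produce sample-based counts like the legacy implementation: patient-level values are fanned out to each of the patient's samples before counting. This assumes (not confirmed in this issue) that patient-level rows in `clinical_data_derived` carry an empty `sample_unique_id`, and that a `cgds_public_v5.sample_derived` table maps `sample_unique_id` to `patient_unique_id` using the same ID format; both should be checked against the actual schema.

```sql
-- Sketch only: count AGE_AT_DIAGNOSIS per sample, as the legacy implementation does.
-- ASSUMPTIONS: patient-level rows have sample_unique_id = '', and
-- cgds_public_v5.sample_derived (hypothetical here) links samples to patients.
SELECT attribute_value, count(DISTINCT sample_unique_id) AS sample_count
FROM (
    -- sample-level rows (e.g. aml_ohsu_2022) are already one row per sample
    SELECT sample_unique_id, attribute_value
    FROM cgds_public_v5.clinical_data_derived
    WHERE attribute_name = 'AGE_AT_DIAGNOSIS'
      AND sample_unique_id != ''
    UNION ALL
    -- patient-level rows (e.g. aml_ohsu_2018) are expanded to every sample of that patient
    SELECT s.sample_unique_id, c.attribute_value
    FROM cgds_public_v5.clinical_data_derived AS c
    INNER JOIN cgds_public_v5.sample_derived AS s
        ON s.patient_unique_id = c.patient_unique_id
    WHERE c.attribute_name = 'AGE_AT_DIAGNOSIS'
      AND c.sample_unique_id = ''
)
GROUP BY attribute_value
ORDER BY attribute_value
```

With this approach a patient-level value for a patient with two samples (e.g. `aml_ohsu_2018_1561`) is counted twice, once per sample, which is how the legacy table above lists it.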
Conflicting sample/patient level attributes
Some discussion here (private): https://cbioportal.slack.com/archives/C04RQ1PV7R8/p1730738224028119