Closed by alisman 6 days ago
For the given filter, the CH implementation returns two additional samples that the legacy implementation filters out:
{
"uniqueSampleKey": "TURBLVBDYS0xNDQtNDpwcmFkX21za19tZGFuZGVyc29uXzIwMjM",
"uniquePatientKey": "TURBLVBDYS1QYXQtMTQ0OnByYWRfbXNrX21kYW5kZXJzb25fMjAyMw",
"sampleId": "MDA-PCa-144-4",
"patientId": "MDA-PCa-Pat-144",
"studyId": "prad_msk_mdanderson_2023"
},
{
"uniqueSampleKey": "TURBLVBDYS0xNDQtNC1UMjAwOnByYWRfbXNrX21kYW5kZXJzb25fMjAyMw",
"uniquePatientKey": "TURBLVBDYS1QYXQtMTQ0OnByYWRfbXNrX21kYW5kZXJzb25fMjAyMw",
"sampleId": "MDA-PCa-144-4-T200",
"patientId": "MDA-PCa-Pat-144",
"studyId": "prad_msk_mdanderson_2023"
},
When we look at the genomic data for these samples (by running the query below), we see that the alteration value for both samples is 0.2:
SELECT * FROM
cgds_public_v5.genetic_alteration_derived
WHERE
sample_unique_id LIKE '%MDA-PCa-144-4%'
AND
hugo_gene_symbol = 'STAC2'
AND
profile_type = 'rna_seq_v2_mrna'
The legacy implementation treats the filter's start value as exclusive, so a sample whose value equals the start value (0.2) is filtered out.
CH, however, treats the start value as inclusive, so these two samples, whose value is exactly 0.2, pass the filter and are returned.
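
To make the boundary difference concrete, here is a minimal sketch in Python. It is not the actual legacy or CH code: the function names are hypothetical, and only the start bound is modelled (how either implementation handles the end value is not covered in this issue). The sample IDs and the value 0.2 are taken from the report above.

```python
# Illustrative sketch only; function names are hypothetical and do not
# correspond to identifiers in either implementation.

def passes_start_exclusive(value: float, start: float) -> bool:
    """Legacy behaviour as described above: the start value itself is excluded."""
    return value > start


def passes_start_inclusive(value: float, start: float) -> bool:
    """CH behaviour as described above: the start value itself is included."""
    return value >= start


# Alteration values reported for the two samples in this issue.
samples = {
    "MDA-PCa-144-4": 0.2,
    "MDA-PCa-144-4-T200": 0.2,
}
start = 0.2  # start value of the filter

legacy_result = [s for s, v in samples.items() if passes_start_exclusive(v, start)]
ch_result = [s for s, v in samples.items() if passes_start_inclusive(v, start)]

print("legacy:", legacy_result)
print("CH:", ch_result)
```

With an exclusive start bound neither sample is selected, while with an inclusive bound both are, which matches the discrepancy reported here.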