Currently, the new (ClickHouse) endpoint for fetching AlterationsCountByGenes for Mutations (/api/mutated-gens/fetch) returns a totalProfiledCases count that is 4 below the legacy count.
Difference found at TFRC.numberOfProfiledCases: (Legacy) 13638 != (New) 13634
After some initial research, I found that 4 samples are not profiled at all. (I do not know whether it makes sense to have samples in a study that are not profiled at all.)
select count(distinct sample_id)
from sample_profile
INNER JOIN sample on sample_profile.sample_id = sample.internal_id
INNER JOIN patient AS p ON sample.patient_id = p.internal_id
INNER JOIN cancer_study AS cs ON p.cancer_study_id = cs.cancer_study_id
where cancer_study_identifier = 'genie_public';
Returns 197976
select count(distinct sample_unique_id) from sample_view where cancer_study_identifier = 'genie_public';
Returns 197976
Query I used to determine which samples were not profiled:
select distinct sample_stable_id
from sample_view
where cancer_study_identifier = 'genie_public'
  and sample_stable_id not in (
    SELECT DISTINCT s.stable_id
    FROM sample_profile sp
    INNER JOIN sample s ON sp.sample_id = s.internal_id
    INNER JOIN patient p ON s.patient_id = p.internal_id
    INNER JOIN cancer_study cs ON p.cancer_study_id = cs.cancer_study_id
    WHERE cs.cancer_study_identifier = 'genie_public'
  );
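For anyone who wants to reproduce the idea without access to the database, here is a minimal sketch of the same check using Python's built-in sqlite3 module. Everything in it is an assumption for illustration: the two tables are trimmed-down stand-ins for sample and sample_profile, the rows are made up, and the patient / cancer_study joins are left out. It also uses NOT EXISTS in place of the NOT IN from the query above, since NOT EXISTS keeps working if the subquery ever yields a NULL.

```python
import sqlite3

# Toy, in-memory stand-ins for the tables referenced above. Columns are
# trimmed to what the check needs and every row is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (internal_id INTEGER PRIMARY KEY, stable_id TEXT);
    CREATE TABLE sample_profile (sample_id INTEGER, genetic_profile_id INTEGER);

    INSERT INTO sample VALUES (1, 'S-1'), (2, 'S-2'), (3, 'S-3'), (4, 'S-4');
    -- S-1 is in two profiles, S-2 in one; S-3 and S-4 have no profile rows.
    INSERT INTO sample_profile VALUES (1, 10), (1, 11), (2, 10);
""")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# All samples in the (toy) study vs. samples with at least one profile row.
total_samples = scalar("SELECT COUNT(DISTINCT stable_id) FROM sample")
profiled_samples = scalar("SELECT COUNT(DISTINCT sample_id) FROM sample_profile")

# Anti-join: samples for which no sample_profile row exists.
unprofiled = [
    row[0]
    for row in conn.execute("""
        SELECT s.stable_id
        FROM sample s
        WHERE NOT EXISTS (
            SELECT 1 FROM sample_profile sp WHERE sp.sample_id = s.internal_id
        )
        ORDER BY s.stable_id
    """)
]

print(total_samples, profiled_samples, unprofiled)
```

In this toy data, the gap between the two counts equals the number of rows returned by the anti-join. The same relationship can be checked against the real tables to see whether the unprofiled samples account for the difference of 4.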
List of samples missing.