cBioPortal / rfc80-team

Repository to hold issues for the rfc80 development/deployment team.

Range filtering on some clinical data causing invalid result #44

Status: Open. Opened by alisman 3 weeks ago.

alisman commented 3 weeks ago

Range filtering on clinical data is failing to validate against legacy. @onursumer, could this be because of the specialTypes?
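One way to frame "validate against legacy": send the same filter to the legacy endpoint and to the column-store endpoint, then compare the returned sample sets. The helper below is a hypothetical sketch, not code from cBioPortal; it assumes each response is an array of sample objects carrying `studyId` and `sampleId` fields.

```javascript
// Hypothetical helper: compares the sample lists returned by two endpoints.
// Assumes each sample object carries `studyId` and `sampleId` fields.
function diffSampleLists(legacySamples, columnStoreSamples) {
  // Key each sample by study and sample ID so the lists compare as sets
  const key = (s) => `${s.studyId}:${s.sampleId}`;
  const legacyKeys = new Set(legacySamples.map(key));
  const columnStoreKeys = new Set(columnStoreSamples.map(key));

  const onlyInLegacy = [...legacyKeys].filter((k) => !columnStoreKeys.has(k));
  const onlyInColumnStore = [...columnStoreKeys].filter((k) => !legacyKeys.has(k));

  return {
    onlyInLegacy,
    onlyInColumnStore,
    // True only when both endpoints return exactly the same samples
    matches: onlyInLegacy.length === 0 && onlyInColumnStore.length === 0,
  };
}
```

Comparing keyed sets rather than raw arrays ignores ordering differences between the two backends, so only genuinely missing or extra samples are reported. The legacy endpoint URL is not given in this thread, so it is left out here.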

```javascript
// Column-store range filter on BUFFA_HYPOXIA_SCORE (nine 5-wide bins from -15 to 30)
fetch("https://genie-public-beta1.cbioportal.org/api/column-store/filtered-samples/fetch?", {
  "headers": {
    "accept": "*/*",
    "accept-language": "en-US,en;q=0.9",
    "cache-control": "no-cache",
    "content-type": "application/json",
    "pragma": "no-cache",
    "priority": "u=1, i",
    "sec-ch-ua": "\"Not)A;Brand\";v=\"99\", \"Google Chrome\";v=\"127\", \"Chromium\";v=\"127\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"macOS\"",
    "sec-fetch-dest": "empty",
    "sec-fetch-mode": "cors",
    "sec-fetch-site": "same-site"
  },
  "referrer": "https://genie-public-beta.cbioportal.org/",
  "referrerPolicy": "strict-origin-when-cross-origin",
  "body": "{\"clinicalDataFilters\":[{\"attributeId\":\"BUFFA_HYPOXIA_SCORE\",\"values\":[{\"start\":-15,\"end\":-10},{\"start\":-10,\"end\":-5},{\"start\":-5,\"end\":0},{\"start\":0,\"end\":5},{\"start\":5,\"end\":10},{\"start\":10,\"end\":15},{\"start\":15,\"end\":20},{\"start\":20,\"end\":25},{\"start\":25,\"end\":30}]}],\"studyIds\":[\"skcm_tcga_pan_can_atlas_2018\"],\"alterationFilter\":{\"copyNumberAlterationEventTypes\":{\"AMP\":true,\"HOMDEL\":true},\"mutationEventTypes\":{\"any\":true},\"structuralVariants\":null,\"includeDriver\":true,\"includeVUS\":true,\"includeUnknownOncogenicity\":true,\"includeUnknownTier\":true,\"includeGermline\":true,\"includeSomatic\":true,\"includeUnknownStatus\":true,\"tiersBooleanMap\":{}}}",
  "method": "POST",
  "mode": "cors",
  "credentials": "omit"
});
```
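The `values` array in the request body above is a run of contiguous `{ start, end }` bins. The hypothetical helper below (not part of the cBioPortal codebase) rebuilds that filter from a list of bin edges, which makes it easy to see that three of the nine bins lie entirely below zero.

```javascript
// Hypothetical helper: turns a sorted list of bin edges into the
// { attributeId, values } shape used inside clinicalDataFilters.
function makeRangeFilter(attributeId, edges) {
  const values = [];
  for (let i = 0; i < edges.length - 1; i++) {
    // Each adjacent pair of edges becomes one { start, end } bin
    values.push({ start: edges[i], end: edges[i + 1] });
  }
  return { attributeId, values };
}

// Rebuild the filter from the request: edges -15, -10, ..., 30 (width 5)
const edges = [];
for (let x = -15; x <= 30; x += 5) edges.push(x);
const filter = makeRangeFilter("BUFFA_HYPOXIA_SCORE", edges);

// Bins whose lower bound is negative
const negativeBins = filter.values.filter((v) => v.start < 0);
console.log(filter.values.length, negativeBins.length);
```

`JSON.stringify(filter.values)` produces the same `values` string that appears in the request body, so the helper can be used to generate variants of this reproduction (for example, only non-negative bins) when narrowing down the discrepancy.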
onursumer commented 3 weeks ago

Hmm, there is no clinical data for the BUFFA_HYPOXIA_SCORE attribute in either the ClickHouse or the MySQL database, and there is no BUFFA_HYPOXIA_SCORE attribute in the clinical_attribute_meta table either. This is based on the db_2024_06_24_original and sling_db_2024_06_24_original databases. Those are the ones used by the beta instance, right?

onursumer commented 3 weeks ago

https://github.com/cBioPortal/cbioportal/blob/956fb3379fb4d03559da1ecb355b0ee711fc0693/src/main/resources/org/cbioportal/persistence/mybatisclickhouse/StudyViewFilterMapper.xml#L248

It looks like our numerical clinical data filter only matches positive integers: any negative number, or any number with a decimal point, is ignored by this regex. This is probably the cause of the discrepancy for this filter.
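To illustrate the failure mode, the sketch below contrasts a digits-only pattern with a broader signed-decimal pattern. Both patterns are hypothetical stand-ins: the real regex lives in the linked `StudyViewFilterMapper.xml` and runs inside the database query, not in JavaScript.

```javascript
// Hypothetical stand-in for a "positive integers only" pattern:
// accepts "5" and "30" but rejects "-15" and "2.5".
const DIGITS_ONLY = /^[0-9]+$/;

// Broader pattern: optional sign, optional decimal part, optional exponent
// (accepts "-15", "+3", "2.5", ".5", "1e-3"; rejects "NA" and "").
const SIGNED_DECIMAL = /^[-+]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][-+]?[0-9]+)?$/;

// Trim surrounding whitespace before testing the value against a pattern
function isNumericValue(text, pattern) {
  return pattern.test(text.trim());
}

const candidates = ["5", "30", "-15", "2.5", "-0.75", ".5", "1e-3", "NA", ""];
for (const value of candidates) {
  console.log(
    JSON.stringify(value),
    isNumericValue(value, DIGITS_ONLY),
    isNumericValue(value, SIGNED_DECIMAL),
  );
}
```

The broader pattern uses no lookarounds or backreferences, so an equivalent should be expressible in RE2-style engines such as the one ClickHouse uses for `match()`; whether the mapper should use a regex at all, rather than a numeric cast, is a separate design choice.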