Closed by alisman 4 months ago.
Profiled the api/column-store/clinical-data-bin-counts/fetch endpoint using the columnar-clinical-data-binner branch and a cloud ClickHouse instance that uses regular views only.
Fetch call used for profiling:
```javascript
// Request sent from the study summary page for genie_public.
// The sec-ch-ua* and sec-fetch-* headers of the captured request are omitted:
// header names starting with "Sec-" are forbidden request headers that the browser sets itself.
fetch("http://localhost:8080/api/column-store/clinical-data-bin-counts/fetch?dataBinMethod=STATIC", {
  method: "POST",
  mode: "cors",
  credentials: "include",
  referrer: "http://localhost:8080/study/summary?id=genie_public",
  referrerPolicy: "strict-origin-when-cross-origin",
  headers: {
    "accept": "application/json",
    "accept-language": "en-US,en;q=0.9,tr;q=0.8,fi;q=0.7",
    "content-type": "application/json",
  },
  body: JSON.stringify({
    // Every attribute is requested with the same options.
    attributes: [
      "MUTATION_COUNT", "FRACTION_GENOME_ALTERED", "AGE_AT_SEQ_REPORT",
      "INT_CONTACT", "INT_DOD", "YEAR_DEATH", "YEAR_CONTACT",
    ].map((attributeId) => ({ attributeId, disableLogScale: false, showNA: true })),
    studyViewFilter: {
      studyIds: ["genie_public"],
      alterationFilter: {
        copyNumberAlterationEventTypes: { AMP: true, HOMDEL: true },
        mutationEventTypes: { any: true },
        structuralVariants: null,
        includeDriver: true,
        includeVUS: true,
        includeUnknownOncogenicity: true,
        includeUnknownTier: true,
        includeGermline: true,
        includeSomatic: true,
        includeUnknownStatus: true,
        tiersBooleanMap: {},
      },
    },
  }),
});
```
Overview
getFilteredSamples: about 31% of CPU time
calcuNaDataBin: about 5% of CPU time
countNAs: about 5% of CPU time as well
It should be relatively easy to count NAs with SQL queries instead of fetching the filtered sample data and processing it in Java. Given the CPU shares above, we may be able to improve the performance of this endpoint by 30 to 40 percent.
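A minimal sketch of what would move into the database, assuming a sample counts as NA for an attribute when it passes the study view filter but has no value (or the literal "NA") for that attribute. The countNAs function and its data shapes below are hypothetical and are not cBioPortal's actual countNAs; likewise, the clinical_data and filtered_samples names in the SQL comment are assumptions, not the real ClickHouse schema. The point is that the query returns one count per attribute instead of shipping every filtered sample back to Java.

```javascript
// Hypothetical in-memory version of the NA count (NOT cBioPortal's implementation).
// A filtered sample is NA for an attribute if it has no value, or the value "NA".
//
// Possible SQL equivalent, assuming a hypothetical table clinical_data(sample_id, attribute_id, value)
// and a hypothetical filtered_samples relation holding the samples that pass the filter:
//   SELECT attribute_id,
//          (SELECT count(*) FROM filtered_samples) - count(DISTINCT sample_id) AS na_count
//   FROM clinical_data
//   WHERE sample_id IN (SELECT sample_id FROM filtered_samples)
//     AND attribute_id IN ('MUTATION_COUNT', 'YEAR_DEATH', 'INT_DOD')
//     AND value != 'NA'
//   GROUP BY attribute_id
function countNAs(filteredSampleIds, clinicalRows, attributeIds) {
  const filtered = new Set(filteredSampleIds);
  // attributeId -> set of filtered samples that have a real (non-NA) value
  const withValue = new Map(attributeIds.map((id) => [id, new Set()]));
  for (const { sampleId, attributeId, value } of clinicalRows) {
    const seen = withValue.get(attributeId);
    if (seen && filtered.has(sampleId) && value != null && value !== "NA") {
      seen.add(sampleId);
    }
  }
  // NA count = number of filtered samples minus samples that have a value
  const result = {};
  for (const [attributeId, seen] of withValue) {
    result[attributeId] = filtered.size - seen.size;
  }
  return result;
}

// Usage with made-up data: s4 is not in the filtered set, so its row is ignored.
const naCounts = countNAs(
  ["s1", "s2", "s3"],
  [
    { sampleId: "s1", attributeId: "MUTATION_COUNT", value: "4" },
    { sampleId: "s2", attributeId: "MUTATION_COUNT", value: "NA" },
    { sampleId: "s4", attributeId: "MUTATION_COUNT", value: "7" },
    { sampleId: "s1", attributeId: "YEAR_DEATH", value: "2019" },
    { sampleId: "s3", attributeId: "YEAR_DEATH", value: "2020" },
  ],
  ["MUTATION_COUNT", "YEAR_DEATH", "INT_DOD"],
);
console.log(naCounts); // → { MUTATION_COUNT: 2, YEAR_DEATH: 1, INT_DOD: 3 }
```

One caveat for the SQL form: an attribute with no non-NA values produces no row at all, so the caller would need to treat a missing attribute as "all filtered samples are NA" (INT_DOD in the example).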
Working on https://github.com/cBioPortal/rfc80-team/issues/16 now, before doing more profiling.