bhklab / CoreGx

Shared code for both PharmacoGx and RadioGx
https://bhklab.github.io/CoreGx/
GNU General Public License v3.0

Bug in `assay<-` method for `LongTable` causing ballooning memory usage if an assay contains a subset of row and column keys #155

Closed: ChristopherEeles closed this issue 2 years ago

ChristopherEeles commented 2 years ago

Reprex:

    library(CoreGx)
    library(PharmacoGx)

    nci <- readRDS(file.path(".local_data", "NCI_ALMANAC_2017.rds"))
    tre <- treatmentResponse(nci)
    # Test adding a new assay (replicates averaged out)
    tre |>
        endoaggregate(
            assay="sensitivity",
            target="sensitivity_no_reps",
            mean(treatment1dose),
            mean(treatment2dose),
            mean(viability),
            by=c("treatment1id", "treatment1dose", "treatment2id", "treatment2dose", "sampleid")
        ) ->
        ntre
    # Reassigning the assay to itself crashes the R session (out of memory)!
    ntre$sensitivity_no_reps <- ntre$sensitivity_no_reps

This is likely caused by a bad join against `assayIndex` in the `assay<-` method.
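The suspected failure mode can be illustrated with plain data frames. This is a hypothetical toy sketch, not CoreGx's actual `assay<-` code. It assumes the assignment performs an inner join on the key columns shared by the index and the new assay. All names (`n_rep`, `index`, `assay`, `replicate`, `subset_join`, `keyed_join`) are made up for illustration, and `replicate` stands in for whatever key columns the new assay lacks. When the join columns do not uniquely identify rows on either side, every matching pair is kept, so the result grows multiplicatively.

```r
# Toy illustration (hypothetical data, NOT CoreGx internals) of how an inner
# join on key columns that do not uniquely identify rows explodes in size.
n_rep <- 1000L  # hypothetical number of rows per key on each side

# "Index" with an extra key column (replicate) that the new assay lacks
index <- data.frame(
    treatment1id = rep(c("A", "B"), each = n_rep),
    sampleid = "s1",
    replicate = rep(seq_len(n_rep), times = 2L)
)

# New assay carrying only a subset of the keys, still n_rep rows per key
assay <- data.frame(
    treatment1id = rep(c("A", "B"), each = n_rep),
    sampleid = "s1",
    value = seq_len(2L * n_rep)
)

# Joining on the subset keys pairs every index row with every assay row that
# shares (treatment1id, sampleid): 2 keys * n_rep * n_rep rows
subset_join <- merge(index, assay, by = c("treatment1id", "sampleid"))
nrow(subset_join)  # 2000000, from two 2000-row inputs

# When the assay also carries the missing key, each row matches exactly once
assay$replicate <- index$replicate
keyed_join <- merge(index, assay,
    by = c("treatment1id", "sampleid", "replicate"))
nrow(keyed_join)  # 2000
```

In this toy case, joining on the full key set keeps the output at one row per index row, which mirrors the much smaller memory footprint reported below for assays that contain all key columns.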

ChristopherEeles commented 2 years ago

This assignment temporarily uses over 20 GB of RAM, which is far too much.

By comparison, assigning an assay that contains all key columns uses about 1 GB of RAM.

ChristopherEeles commented 2 years ago

The RAM issue is resolved, but a redesign of the `assay<-,LongTable-method` would likely bring significant speed-ups.