bhklab / CoreGx

Shared code for both PharmacoGx and RadioGx
https://bhklab.github.io/CoreGx/
GNU General Public License v3.0

Subsetting LongTable by rows only produces nonexistent assay observations and key combinations #146

Closed ff98li closed 2 years ago

ff98li commented 2 years ago

Expected Behavior

Assays in a subset should contain observations only for the row data and column data used to create the subset.

Current Behavior

Nonexistent (rowKey, colKey) combinations are present in the subset when subsetting only by row. Nonexistent assay observations are present in the subset regardless of whether it is created by row, by column, or by both.

Possible Solution

When j is not supplied, include only the column data whose colKey has formed an assay key pair with the selected i in .intern. If j is supplied, take the set difference between the selected column data and the column data that have been paired with the selected i to be associated with an assay observation. The main difficulty with this approach is that some rowKey-colKey pairs could be associated with observations in one assay but not in another, so we might need to encode this information in .intern too. (Just a thought, don't take it seriously.)
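The row-only case of this idea can be sketched with data.table. This is only an illustration under stated assumptions: `assay_index` is a hypothetical stand-in for the key pairs kept in .intern (the real CoreGx layout may differ), and `subset_by_row` is a made-up helper, not part of the package. When `cols` is supplied, the sketch simply restricts the pairs to that selection, which is one possible reading of the proposal.

```r
library(data.table)

# Hypothetical stand-in for the (rowKey, colKey) pairs that have an assay
# observation; the actual .intern structure in CoreGx may look different.
assay_index <- data.table(
    rowKey = c(1L, 1L, 2L, 3L, 3L),
    colKey = c(1L, 2L, 2L, 1L, 3L)
)

# Hypothetical helper: keep only the key pairs for the selected rows and,
# when no columns are given, derive the columns from the pairs that exist.
subset_by_row <- function(index, rows, cols = NULL) {
    pairs <- index[rowKey %in% rows]
    if (!is.null(cols)) pairs <- pairs[colKey %in% cols]
    list(pairs = pairs, colKeys = sort(unique(pairs$colKey)))
}

res <- subset_by_row(assay_index, rows = c(1L, 2L))
# colKey 3 is never paired with rowKey 1 or 2, so it is not retained.
res$colKeys
```

Deriving the retained columns from the existing pairs, rather than keeping every column, is what would prevent key combinations that were never observed from appearing in the subset.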

Steps to Reproduce

  1. Run the script from DrugComboPSets/NCI-ALMANAC/scripts/nci_long_table_script.R to produce a LongTable object lt for the NCI-ALMANAC dataset.
  2. Subset the object by selecting some rows.
select_row <- seq.int(1, dim(lt)[1], by = 28)
small_lt <- subset(lt, i = select_row)
assay_ <- assay(lt, "sensitivity", key = TRUE, withDimnames = TRUE)
assay_subset <- assay(small_lt, "sensitivity", key = TRUE, withDimnames = TRUE)

Output of assay_subset:

> assay_subset
             drug1id           drug2id drug1dose drug2dose replicate_id   cellid rowKey colKey   NSC1
     1:         <NA>              <NA>        NA        NA           NA    786-0   2353      1     NA
     2:         <NA>              <NA>        NA        NA           NA    786-0   2381      1     NA
     3:         <NA>              <NA>        NA        NA           NA    786-0   2409      1     NA
     4:         <NA>              <NA>        NA        NA           NA    786-0   2437      1     NA
     5:         <NA>              <NA>        NA        NA           NA    786-0   2465      1     NA
    ---                                                                                              
131287: pralatrexate Imatinib mesylate     6e-11     5e-07            1    TK-10   2325     56 754230
131288: pralatrexate Imatinib mesylate     6e-11     5e-07            1     U251   2325     57 754230
131289: pralatrexate Imatinib mesylate     6e-11     5e-07            1 UACC-257   2325     58 754230
131290: pralatrexate Imatinib mesylate     6e-11     5e-07            1  UACC-62   2325     59 754230
131291: pralatrexate Imatinib mesylate     6e-11     5e-07            1    UO-31   2325     60 754230
          NSC2 CELLNBR        PANEL PANELNBR CONCINDEX2 SAMPLE1 SAMPLE2 viability PERCENTGROWTHNOTZ
     1:     NA      18 Renal Cancer        9          0       4      NA   100.820           100.620
     2:     NA      18 Renal Cancer        9          0       4      NA   -23.090            20.020
     3:     NA      18 Renal Cancer        9          0       4      NA    25.020            35.910
     4:     NA      18 Renal Cancer        9          0       4      NA    -1.190            22.910
     5:     NA      18 Renal Cancer        9          3       4       1    -7.113            12.556
    ---                                                                                            
131287: 743414      24 Renal Cancer        9          0       4      NA    99.080            99.270
131288: 743414       9   CNS Cancer       12          0       4      NA    68.180            72.520
131289: 743414      21     Melanoma       10          0       4      NA    96.010            97.900
131290: 743414      20     Melanoma       10          0       4      NA    91.690            93.880
131291: 743414       4 Renal Cancer        9          0       4      NA    93.500            95.320
        EXPECTEDGROWTH TESTVALUE CONTROLVALUE    TZVALUE
     1:             NA   6381040    6341700.0 1556683.00
     2:             NA   1643560    8209393.0 2137100.00
     3:             NA   2577200    7177366.0 1042043.00
     4:             NA   1538120    6714120.0 1556683.00
     5:          2.345     33688     268311.4   36267.81
    ---                                                 
131287:             NA   4065800    4095706.0  838545.00
131288:             NA   4472320    6167253.0  840651.00
131289:             NA   5619560    5739933.0 2723402.00
131290:             NA   6406240    6824033.0 1795235.00
131291:             NA   7711760    8090100.0 2269541.00

Many assay observations with no treatment information (NA drug IDs and doses) show up in the subset. In addition, the subset appears to contain nonexistent (rowKey, colKey) combinations, but only when subsetting by row:

assay_subset[!assay_, on = .(rowKey, colKey)]

Result:

> assay_subset[!assay_, on = .(rowKey, colKey)]
      drug1id     drug2id drug1dose drug2dose replicate_id    cellid rowKey colKey   NSC1   NSC2 CELLNBR
1: Amifostine   Sirolimus     1e-04     1e-07            1 SK-MEL-28    178     47 296961 226080       8
2:  Tretinoin        <NA>     2e-06        NA            2  MALME-3M   2085     24 122758     NA       2
3: Valrubicin Clofarabine     2e-07     1e-08            1  MALME-3M   2142     24 246131 606869       2
      PANEL PANELNBR CONCINDEX2 SAMPLE1 SAMPLE2 viability PERCENTGROWTHNOTZ EXPECTEDGROWTH   TESTVALUE
1: Melanoma       10          0      14      NA   100.250            100.16             NA 4443880.000
2: Melanoma       10          2       8     178    93.152                NA         87.128       1.098
3: Melanoma       10          3       8      69    31.661                NA         52.274       0.824
   CONTROLVALUE     TZVALUE
1:  4436760.000 1573675.000
2:        1.142       0.499
3:        1.174       0.661

Note that the observation mapped by key (178, 47) has a drug combination, but this record does not exist in the sensitivity assay of the original dataset.
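The check used above is a data.table anti-join: `x[!y, on = ...]` returns the rows of `x` whose key values have no match in `y`. A toy illustration with made-up tables (`original` and `subsetted` are hypothetical and are not NCI-ALMANAC data):

```r
library(data.table)

# Made-up key tables standing in for the full assay and its subset.
original  <- data.table(rowKey = c(1L, 2L, 3L), colKey = c(1L, 1L, 2L))
subsetted <- data.table(rowKey = c(1L, 3L, 3L), colKey = c(1L, 2L, 5L))

# Rows of the subset whose (rowKey, colKey) pair is absent from the original.
# A correct subset would give zero rows here.
spurious <- subsetted[!original, on = .(rowKey, colKey)]
spurious
```

In this toy case only the pair (3, 5) is returned, since it is the only key combination in `subsetted` that does not occur in `original`.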

ChristopherEeles commented 2 years ago

This appears to originate from the reindex call in the .subsetByIndex method on line 84 of LongTable-utils.R.

Likely an issue with the reindex implementation.

ChristopherEeles commented 2 years ago

@ff98li I think this should resolve the problem.

Please let me know when you have a chance to check and I will close this issue.

Best, Chris

ff98li commented 2 years ago

Thank you Chris, now it works like a charm!