MarioniLab / miloR

R package implementation of Milo for testing for differential abundance in KNN graphs
https://bioconductor.org/packages/release/bioc/html/miloR.html
GNU General Public License v3.0

Lack of Robustness to Missing Values in Dependent Variable #324

Closed DarioS closed 1 month ago

DarioS commented 1 month ago

While looping through a bunch of outcomes, miloR faltered on a column that has some missing values.

> testNhoods(MILOdata, design = ~ Recurrence, design.df = design, fdr.weighting = "graph-overlap")
  Error in dimnames(x) <- dn: length of 'dimnames' [1] not equal to array extent
> head(design)
             Sample Recurrence
HN118_P     HN118_P       <NA>
OSCC_12-P OSCC_12-P        Yes
OSCC_16-P OSCC_16-P        Yes
OSCC_16-M OSCC_16-M        Yes
OSCC_20-P OSCC_20-P       <NA>
OSCC_22-P OSCC_22-P        Yes

I expected it to handle this automatically by dropping any samples with a missing outcome value.

MikeDMorgan commented 1 month ago

Hi @DarioS - we leave it to the user to define how they wish to handle missing values.

DarioS commented 1 month ago

Ah. Do Milo objects have any special subsetting functions to remove certain samples?

MikeDMorgan commented 1 month ago

If you remove the unwanted samples from the data frame you pass to design.df, the subsetting of the nhood counts will happen automatically within testNhoods. As long as the row names of your design data frame are a proper subset of the column names of the nhood counts, it should work.
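For anyone landing here later, the suggestion above can be sketched in base R roughly as follows. This is a minimal illustration, not the package's own code: the toy design data frame only mirrors the `head(design)` output from this thread, the names `design_complete` and `count_samples` are made up for the example, and `count_samples` stands in for the column names of the nhood counts in a real Milo object.

```r
# Toy design data frame mirroring head(design) from this thread (illustrative values only).
design <- data.frame(
  Sample = c("HN118_P", "OSCC_12-P", "OSCC_16-P", "OSCC_16-M", "OSCC_20-P", "OSCC_22-P"),
  Recurrence = factor(c(NA, "Yes", "Yes", "Yes", NA, "Yes")),
  stringsAsFactors = FALSE
)
rownames(design) <- design$Sample

# Keep only the samples whose outcome is observed.
design_complete <- design[!is.na(design$Recurrence), , drop = FALSE]

# Assumption: stand-in for the column names of the nhood counts matrix,
# e.g. colnames(nhoodCounts(MILOdata)) when a real Milo object is available.
count_samples <- design$Sample

# Check the condition described above: every remaining row name of the
# design data frame must also be a column name of the nhood counts.
stopifnot(all(rownames(design_complete) %in% count_samples))

# With a real Milo object, the call from this thread would then become:
# testNhoods(MILOdata, design = ~ Recurrence, design.df = design_complete,
#            fdr.weighting = "graph-overlap")
```

When looping over several outcome columns, the same filter would be applied per column, so each test only sees the samples with a value for that particular outcome.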