RGLab / COMPASS

Combinatorial Polyfunctionality Analysis of Single Cells
6 stars 11 forks source link

Error when fitting the compass model #81

Open quinnpeters opened 11 months ago

quinnpeters commented 11 months ago

When I try to fit the COMPASS Model with a large compass container, I get the following error which I cannot figure out how to solve:

fit <- COMPASS(CC_comb, treatment = Stim == "SPIKE", control = Stim == "DMSO", iterations = 40000)

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 2, 7

As far as I can tell, there is no issue with the metadata or counts. The data structure of the compass container, however, must be causing some issue?

emjbishop commented 7 months ago

Myself and three other people are all getting the same error now when running the COMPASS function, using different datasets (including test data that used to work) and systems (linux, macOS, and windows). Is there a dependency that was updated this year causing this?

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 2, 7
gfinak commented 7 months ago

Can you share data / code so that I might reproduce the error?

emjbishop commented 7 months ago

@gfinak yes, we have a tutorial repo that uses COMPASS. You just need:

Here is a more complete error message with context:

> CC <- COMPASSContainerFromGatingSet(gs,
+                                     node = parent_node,
+                                     individual_id = id,
+                                     mp = markermap,
+                                     countFilterThreshold = 5000)
Extracting cell counts
Fetching 4+
Fetching child nodes
common markers are: 
Time FSC-A FSC-H SSC-A CD8b TNFa CD107a CD154 CD3 IL2 CD4 IL17a IL4_5_13 CD14_19 CCR7 CD38 LD IFNg CD45RA HLADR 
Extracting single cell data for 4+/IL2|4+/IL4513|4+/IFNG|4+/TNF|4+/IL17|4+/154|4+/107a
..............................Creating COMPASS Container
Filtering low counts
Filtering 0 samples due to low counts
Warning message:
In COMPASSContainer(data = sc_data, counts = counts, meta = pd,  :
  There appear to be negative intensities in the 'data' supplied.
> fit <- COMPASS(CC,
+                treatment = STIM == "Spike 1",
+                control = STIM == "DMSO",
+                iterations = 100)
There are a total of 5 samples from 5 individuals in the 'treatment' group.
There are a total of 5 samples from 5 individuals in the 'control' group.
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 2, 7

I've also heard from a colleague (edit: @quinnpeters) that they don't get the error with a much older version of R (v3.6.3).

malisas commented 3 months ago

Hi @emjbishop I got your separate private message to me about this issue. Thanks for providing the reproducible example. Just want to chime in that I was able to reproduce, and I think the issue has to do with NA values getting generated by COMPASSContainerFromGatingSet, which internally calls flowWorkspace::gs_get_singlecell_expression. I wonder if more recent R versions introduced different behavior in a dependency to this function, and caused NA values to be generated.

Here's some runtime output showing how COMPASS failed on CC but succeeded on CC2. NA values don't always get generated it seems: Both COMPASSContainers were generated by the same call to COMPASSContainerFromGatingSet, but CC contains NA values in the data used by COMPASS:

> CC <- COMPASSContainerFromGatingSet(gs,
+                                     node = parent_node,
+                                     individual_id = id,
+                                     mp = markermap,
+                                     countFilterThreshold = 5000)
Extracting cell counts
Fetching 4+
Fetching child nodes
common markers are: 
Time FSC-A FSC-H SSC-A CD8b TNFa CD107a CD154 CD3 IL2 CD4 IL17a IL4_5_13 CD14_19 CCR7 CD38 LD IFNg CD45RA HLADR 
Extracting single cell data for 4+/IL2|4+/IL4513|4+/IFNG|4+/TNF|4+/IL17|4+/154|4+/107a
..............................Creating COMPASS Container
Filtering low counts
Filtering 0 samples due to low counts
Warning message:
In COMPASSContainer(data = sc_data, counts = counts, meta = pd,  :
  There appear to be negative intensities in the 'data' supplied.

> CC2 <- COMPASSContainerFromGatingSet(gs,
+                                     node = parent_node,
+                                     individual_id = id,
+                                     mp = markermap,
+                                     countFilterThreshold = 5000)
Extracting cell counts
Fetching 4+
Fetching child nodes
common markers are: 
Time FSC-A FSC-H SSC-A CD8b TNFa CD107a CD154 CD3 IL2 CD4 IL17a IL4_5_13 CD14_19 CCR7 CD38 LD IFNg CD45RA HLADR 
Extracting single cell data for 4+/IL2|4+/IL4513|4+/IFNG|4+/TNF|4+/IL17|4+/154|4+/107a
..............................Creating COMPASS Container
Filtering low counts
Filtering 0 samples due to low counts
Warning message:
In COMPASSContainer(data = sc_data, counts = counts, meta = pd,  :
  There appear to be negative intensities in the 'data' supplied.

> CC$data$`114716.fcs_445737`[100,]
          IL2      IL4_5_13          IFNg          TNFa         IL17a         CD154        CD107a 
1.018558e-312 2.781342e-309 8.257046e-317  1.746838e+03 2.483463e-265 1.807873e-308 1.157082e-309 
> CC2$data$`114716.fcs_445737`[100,]
          IL2      IL4_5_13          IFNg          TNFa         IL17a         CD154        CD107a 
1.820499e-318 1.823898e-318 1.827297e-318  1.746838e+03 1.834095e-318 1.837494e-318 1.840894e-318 

> dmso_spike_1_sn <- CC2$meta %>% dplyr::filter(STIM %in% c("DMSO", "Spike 1")) %>% dplyr::pull(name)
> table(unlist((lapply(CC$data[dmso_spike_1_sn], function(x) {any(is.na(x))}))))

FALSE  TRUE 
    9     1 
> table(unlist((lapply(CC2$data[dmso_spike_1_sn], function(x) {any(is.na(x))}))))

FALSE 
   10 

> fit <- COMPASS(CC,
+                treatment = STIM == "Spike 1",
+                control = STIM == "DMSO",
+                iterations = 100)
There are a total of 5 samples from 5 individuals in the 'treatment' group.
There are a total of 5 samples from 5 individuals in the 'control' group.
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 2, 7

> any(is.na(CC$data$`114716.fcs_445737`))
[1] TRUE
> any(is.na(CC2$data$`114716.fcs_445737`))
[1] FALSE

> fit2 <- COMPASS(CC2,
+                treatment = STIM == "Spike 1",
+                control = STIM == "DMSO",
+                iterations = 100)
There are a total of 5 samples from 5 individuals in the 'treatment' group.
There are a total of 5 samples from 5 individuals in the 'control' group.
The model will be run on 5 paired samples.
The category filter has removed 123 of 125 categories.
There are a total of 2 categories to be tested.
Initializing parameters...
Computing initial parameter estimates...
Keeping 100 iterations. We'll thin every 8 iterations.
Burnin for 100 iterations...
Sampling 800 iterations...
Done!
Computing the posterior difference in proportions, posterior log ratio...
Done!
> fit2
A COMPASS model fit on 5 paired samples.
> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.16.0    COMPASS_1.38.1       flowWorkspace_4.12.2 flowCore_2.12.2      CytoML_2.12.0        here_1.0.1          

loaded via a namespace (and not attached):
 [1] tidyr_1.3.1         utf8_1.2.3          generics_0.1.3      lattice_0.21-8      digest_0.6.33       magrittr_2.0.3      evaluate_0.21       grid_4.3.1         
 [9] RColorBrewer_1.1-3  fastmap_1.1.1       rprojroot_2.0.3     ggcyto_1.28.1       plyr_1.8.9          jsonlite_1.8.7      graph_1.78.0        gridExtra_2.3      
[17] BiocManager_1.30.22 purrr_1.0.2         fansi_1.0.4         scales_1.3.0        XML_3.99-0.16       Rgraphviz_2.44.0    abind_1.4-5         cli_3.6.1          
[25] rlang_1.1.1         RProtoBufLib_2.12.1 Biobase_2.60.0      munsell_0.5.0       yaml_2.3.7          cytolib_2.12.1      tools_4.3.1         ncdfFlow_2.46.0    
[33] dplyr_1.1.4         colorspace_2.1-0    ggplot2_3.4.4       BiocGenerics_0.46.0 vctrs_0.6.4         R6_2.5.1            matrixStats_1.1.0   stats4_4.3.1       
[41] lifecycle_1.0.3     zlibbioc_1.46.0     S4Vectors_0.38.2    RBGL_1.76.0         clue_0.3-65         pdist_1.2.1         cluster_2.1.4       pkgconfig_2.0.3    
[49] pillar_1.9.0        hexbin_1.28.3       gtable_0.3.4        glue_1.6.2          Rcpp_1.0.11         xfun_0.40           tibble_3.2.1        tidyselect_1.2.0   
[57] knitr_1.43          rstudioapi_0.15.0   htmltools_0.5.6     rmarkdown_2.24      compiler_4.3.1     

Haven't gotten to the root of the problem but documenting the investigation so far.

gfinak commented 3 months ago

I suggest looking at the SimpleCOMPASS interface. CompassContainerFromGatingSet makes a lot of assumptions and we ended up abandoning it internally.

emjbishop commented 3 months ago

Thank you @malisas and @gfinak, we will switch to SimpleCOMPASS (which seems to work with R 4.4.1). We appreciate it and will pass this on to our collaborators.

emjbishop commented 1 week ago

Hello again, we appreciate the suggestion to use SimpleCOMPASS and indeed it works well. However, when we have dozens of subject samples with multiple stims and cell types it's not practical to generate the boolean combinations in FlowJo for each subject. Do you have a way to automatically generate the boolean combinations, or a method in FlowJo?

gfinak commented 1 week ago

Indeed, we would typically export the flowJo wsp file, read it in with flowWorskpace, then there are some internal functions that could generate the boolean counts from the marginal gated data at the single cell level.

The doc for SimpleCompass highlights the functions used: https://github.com/RGLab/COMPASS/blob/38dc98682426a6794dd652d2440f645ed8835abd/R/SimpleCOMPASS.R#L47-L48

CellCounts can be used generate the data you need for SimpleCompass, given inputs are a list of matrices, where each matrix is the single cell expression of cells from a cell population in a sample (e.g. CD4 T cells) for rows, and columns are each marker/protein measured and thresholded to be 0 if it's below the gate (i.e. negative). With some work on a GatingSet created from the FCS files and an imported flowJo workspace, you can generate these matrices, and the required data (left as an exercise for the reader).