Open quinnpeters opened 11 months ago
Myself and three other people are all getting the same error now when running the COMPASS
function, using different datasets (including test data that used to work) and systems (linux, macOS, and windows). Is there a dependency that was updated this year causing this?
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 2, 7
Can you share data / code so that I might reproduce the error?
@gfinak yes, we have a tutorial repo that uses COMPASS. You just need:
Here is a more complete error message with context:
> CC <- COMPASSContainerFromGatingSet(gs,
+ node = parent_node,
+ individual_id = id,
+ mp = markermap,
+ countFilterThreshold = 5000)
Extracting cell counts
Fetching 4+
Fetching child nodes
common markers are:
Time FSC-A FSC-H SSC-A CD8b TNFa CD107a CD154 CD3 IL2 CD4 IL17a IL4_5_13 CD14_19 CCR7 CD38 LD IFNg CD45RA HLADR
Extracting single cell data for 4+/IL2|4+/IL4513|4+/IFNG|4+/TNF|4+/IL17|4+/154|4+/107a
..............................Creating COMPASS Container
Filtering low counts
Filtering 0 samples due to low counts
Warning message:
In COMPASSContainer(data = sc_data, counts = counts, meta = pd, :
There appear to be negative intensities in the 'data' supplied.
> fit <- COMPASS(CC,
+ treatment = STIM == "Spike 1",
+ control = STIM == "DMSO",
+ iterations = 100)
There are a total of 5 samples from 5 individuals in the 'treatment' group.
There are a total of 5 samples from 5 individuals in the 'control' group.
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 2, 7
I've also heard from a colleague (edit: @quinnpeters) that they don't get the error with a much older version of R (v3.6.3).
Hi @emjbishop I got your separate private message to me about this issue.
Thanks for providing the reproducible example. Just want to chime in that I was able to reproduce, and I think the issue has to do with NA
values getting generated by COMPASSContainerFromGatingSet
, which internally calls flowWorkspace::gs_get_singlecell_expression
. I wonder if more recent R versions introduced different behavior in a dependency to this function, and caused NA
values to be generated.
Here's some runtime output showing how COMPASS
failed on CC
but succeeded on CC2
. NA
values don't always get generated it seems: Both COMPASSContainer
s were generated by the same call to COMPASSContainerFromGatingSet
, but CC
contains NA
values in the data used by COMPASS:
> CC <- COMPASSContainerFromGatingSet(gs,
+ node = parent_node,
+ individual_id = id,
+ mp = markermap,
+ countFilterThreshold = 5000)
Extracting cell counts
Fetching 4+
Fetching child nodes
common markers are:
Time FSC-A FSC-H SSC-A CD8b TNFa CD107a CD154 CD3 IL2 CD4 IL17a IL4_5_13 CD14_19 CCR7 CD38 LD IFNg CD45RA HLADR
Extracting single cell data for 4+/IL2|4+/IL4513|4+/IFNG|4+/TNF|4+/IL17|4+/154|4+/107a
..............................Creating COMPASS Container
Filtering low counts
Filtering 0 samples due to low counts
Warning message:
In COMPASSContainer(data = sc_data, counts = counts, meta = pd, :
There appear to be negative intensities in the 'data' supplied.
> CC2 <- COMPASSContainerFromGatingSet(gs,
+ node = parent_node,
+ individual_id = id,
+ mp = markermap,
+ countFilterThreshold = 5000)
Extracting cell counts
Fetching 4+
Fetching child nodes
common markers are:
Time FSC-A FSC-H SSC-A CD8b TNFa CD107a CD154 CD3 IL2 CD4 IL17a IL4_5_13 CD14_19 CCR7 CD38 LD IFNg CD45RA HLADR
Extracting single cell data for 4+/IL2|4+/IL4513|4+/IFNG|4+/TNF|4+/IL17|4+/154|4+/107a
..............................Creating COMPASS Container
Filtering low counts
Filtering 0 samples due to low counts
Warning message:
In COMPASSContainer(data = sc_data, counts = counts, meta = pd, :
There appear to be negative intensities in the 'data' supplied.
> CC$data$`114716.fcs_445737`[100,]
IL2 IL4_5_13 IFNg TNFa IL17a CD154 CD107a
1.018558e-312 2.781342e-309 8.257046e-317 1.746838e+03 2.483463e-265 1.807873e-308 1.157082e-309
> CC2$data$`114716.fcs_445737`[100,]
IL2 IL4_5_13 IFNg TNFa IL17a CD154 CD107a
1.820499e-318 1.823898e-318 1.827297e-318 1.746838e+03 1.834095e-318 1.837494e-318 1.840894e-318
> dmso_spike_1_sn <- CC2$meta %>% dplyr::filter(STIM %in% c("DMSO", "Spike 1")) %>% dplyr::pull(name)
> table(unlist((lapply(CC$data[dmso_spike_1_sn], function(x) {any(is.na(x))}))))
FALSE TRUE
9 1
> table(unlist((lapply(CC2$data[dmso_spike_1_sn], function(x) {any(is.na(x))}))))
FALSE
10
> fit <- COMPASS(CC,
+ treatment = STIM == "Spike 1",
+ control = STIM == "DMSO",
+ iterations = 100)
There are a total of 5 samples from 5 individuals in the 'treatment' group.
There are a total of 5 samples from 5 individuals in the 'control' group.
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 2, 7
> any(is.na(CC$data$`114716.fcs_445737`))
[1] TRUE
> any(is.na(CC2$data$`114716.fcs_445737`))
[1] FALSE
> fit2 <- COMPASS(CC2,
+ treatment = STIM == "Spike 1",
+ control = STIM == "DMSO",
+ iterations = 100)
There are a total of 5 samples from 5 individuals in the 'treatment' group.
There are a total of 5 samples from 5 individuals in the 'control' group.
The model will be run on 5 paired samples.
The category filter has removed 123 of 125 categories.
There are a total of 2 categories to be tested.
Initializing parameters...
Computing initial parameter estimates...
Keeping 100 iterations. We'll thin every 8 iterations.
Burnin for 100 iterations...
Sampling 800 iterations...
Done!
Computing the posterior difference in proportions, posterior log ratio...
Done!
> fit2
A COMPASS model fit on 5 paired samples.
> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.16.0 COMPASS_1.38.1 flowWorkspace_4.12.2 flowCore_2.12.2 CytoML_2.12.0 here_1.0.1
loaded via a namespace (and not attached):
[1] tidyr_1.3.1 utf8_1.2.3 generics_0.1.3 lattice_0.21-8 digest_0.6.33 magrittr_2.0.3 evaluate_0.21 grid_4.3.1
[9] RColorBrewer_1.1-3 fastmap_1.1.1 rprojroot_2.0.3 ggcyto_1.28.1 plyr_1.8.9 jsonlite_1.8.7 graph_1.78.0 gridExtra_2.3
[17] BiocManager_1.30.22 purrr_1.0.2 fansi_1.0.4 scales_1.3.0 XML_3.99-0.16 Rgraphviz_2.44.0 abind_1.4-5 cli_3.6.1
[25] rlang_1.1.1 RProtoBufLib_2.12.1 Biobase_2.60.0 munsell_0.5.0 yaml_2.3.7 cytolib_2.12.1 tools_4.3.1 ncdfFlow_2.46.0
[33] dplyr_1.1.4 colorspace_2.1-0 ggplot2_3.4.4 BiocGenerics_0.46.0 vctrs_0.6.4 R6_2.5.1 matrixStats_1.1.0 stats4_4.3.1
[41] lifecycle_1.0.3 zlibbioc_1.46.0 S4Vectors_0.38.2 RBGL_1.76.0 clue_0.3-65 pdist_1.2.1 cluster_2.1.4 pkgconfig_2.0.3
[49] pillar_1.9.0 hexbin_1.28.3 gtable_0.3.4 glue_1.6.2 Rcpp_1.0.11 xfun_0.40 tibble_3.2.1 tidyselect_1.2.0
[57] knitr_1.43 rstudioapi_0.15.0 htmltools_0.5.6 rmarkdown_2.24 compiler_4.3.1
Haven't gotten to the root of the problem but documenting the investigation so far.
I suggest looking at the SimpleCOMPASS interface. CompassContainerFromGatingSet makes a lot of assumptions and we ended up abandoning it internally.
Thank you @malisas and @gfinak, we will switch to SimpleCOMPASS (which seems to work with R 4.4.1). We appreciate it and will pass this on to our collaborators.
Hello again, we appreciate the suggestion to use SimpleCOMPASS and indeed it works well. However, when we have dozens of subject samples with multiple stims and cell types it's not practical to generate the boolean combinations in FlowJo for each subject. Do you have a way to automatically generate the boolean combinations, or a method in FlowJo?
Indeed, we would typically export the flowJo wsp file, read it in with flowWorskpace, then there are some internal functions that could generate the boolean counts from the marginal gated data at the single cell level.
The doc for SimpleCompass highlights the functions used: https://github.com/RGLab/COMPASS/blob/38dc98682426a6794dd652d2440f645ed8835abd/R/SimpleCOMPASS.R#L47-L48
CellCounts
can be used generate the data you need for SimpleCompass, given inputs are a list of matrices, where each matrix is the single cell expression of cells from a cell population in a sample (e.g. CD4 T cells) for rows, and columns are each marker/protein measured and thresholded to be 0 if it's below the gate (i.e. negative).
With some work on a GatingSet created from the FCS files and an imported flowJo workspace, you can generate these matrices, and the required data (left as an exercise for the reader).
When I try to fit the COMPASS Model with a large compass container, I get the following error which I cannot figure out how to solve:
fit <- COMPASS(CC_comb, treatment = Stim == "SPIKE", control = Stim == "DMSO", iterations = 40000)
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 2, 7
As far as I can tell, there is no issue with the metadata or counts. The data structure of the compass container, however, must be causing some issue?