RGLab / CytoML

A GatingML Interface for Cross Platform Cytometry Data Sharing
GNU Affero General Public License v3.0

Handling cytobank experiments with multiple panels #87

Closed: bjreisman closed this issue 3 years ago.

bjreisman commented 4 years ago

Hi CytoML team,

I came across the following error when attempting to import a Cytobank experiment that has FCS files associated with multiple panels. This is a use case I usually try to avoid, as it's clumsy to handle within Cytobank, but I know many people like to do this when staining the same sample with multiple panels, as in public dataset #8938 from this tutorial.

Here's what the experiment panel looks like in Cytobank: [image]

and here's the error I get when I try to import it using the ACS importing function:

> gs <- cytobank_to_gatingset(ce)
intact cells
singlets
750 lo
750hi
paco 0
paco 1
paco é
unstim1
ifn
il4
pma
lps
unstim2
All FCS files have the same following channels:
FSC-A
FSC-H
FSC-W
SSC-A
SSC-H
SSC-W
FITC-A
PE-A
PerCP-Cy5-5-A
PE-Cy7-A
Pacific Blue-A
Pacific Orange-A
Alexa 647-A
Alexa 700-A
Time
write Barcoding Only_Pan3.fcs to empty cdf slot...
write Compensation Controls_Alexa 647 Stained Control.fcs to empty cdf slot...
write Compensation Controls_Alexa 700 Stained Control.fcs to empty cdf slot...
write Compensation Controls_FITC Stained Control.fcs to empty cdf slot...
write Compensation Controls_Pacific Blue Stained Control.fcs to empty cdf slot...
write Compensation Controls_Pacific Orange Stained Control.fcs to empty cdf slot...
write Compensation Controls_PE-Cy7 Stained Control.fcs to empty cdf slot...
write Compensation Controls_PE Stained Control.fcs to empty cdf slot...
write Compensation Controls_PerCP-Cy5-5 Stained Control.fcs to empty cdf slot...
write Compensation Controls_Unstained Control.fcs to empty cdf slot...
write Panel 1 - pERK pp38_Pan1.fcs to empty cdf slot...
write Panel 2 - pSTAT1 pSTAT6_Pan2.fcs to empty cdf slot...
done!
............done!
intact cells
singlets
paco é
paco 1
paco 0
750hi
unstim2
lps
pma
750 lo
il4
ifn
unstim1
............done!
Error in colnames(ce) : colnames are not consistent across samples!
In addition: Warning message:
In .local(object, ...) : markers are not consistent across samples!

Whereas the direct FCS/gatingML import seems to work just fine:

> xmlfile <-ce$gatingML
> fcsFiles <- list.files(ce$fcsdir, full.names = TRUE)
> gs <- cytobank_to_gatingset(xmlfile, fcsFiles)
intact cells
singlets
750 lo
750hi
paco 0
paco 1
paco é
unstim1
ifn
il4
pma
lps
unstim2
All FCS files have the same following channels:
FSC-A
FSC-H
FSC-W
SSC-A
SSC-H
SSC-W
FITC-A
PE-A
PerCP-Cy5-5-A
PE-Cy7-A
Pacific Blue-A
Pacific Orange-A
Alexa 647-A
Alexa 700-A
Time
write Barcoding Only_Pan3.fcs to empty cdf slot...
write Compensation Controls_Alexa 647 Stained Control.fcs to empty cdf slot...
write Compensation Controls_Alexa 700 Stained Control.fcs to empty cdf slot...
write Compensation Controls_FITC Stained Control.fcs to empty cdf slot...
write Compensation Controls_Pacific Blue Stained Control.fcs to empty cdf slot...
write Compensation Controls_Pacific Orange Stained Control.fcs to empty cdf slot...
write Compensation Controls_PE-Cy7 Stained Control.fcs to empty cdf slot...
write Compensation Controls_PE Stained Control.fcs to empty cdf slot...
write Compensation Controls_PerCP-Cy5-5 Stained Control.fcs to empty cdf slot...
write Compensation Controls_Unstained Control.fcs to empty cdf slot...
write Panel 1 - pERK pp38_Pan1.fcs to empty cdf slot...
write Panel 2 - pSTAT1 pSTAT6_Pan2.fcs to empty cdf slot...
done!
............done!
intact cells
singlets
paco é
paco 1
paco 0
750hi
unstim2
lps
pma
750 lo
il4
ifn
unstim1
............done!

Interestingly, I think LongNames are appropriately assigned for each FCS file...

> markernames(gs)
[[1]]
character(0)

[[2]]
[1] "p-p38-Ax488"    "CD3-PE"         "CD20-PerCPCy55" "CD33-PECy7"     "CD4-PacBlue"    "PacO BC"        "p-ERK-Ax647"   
[8] "Ax750 BC"      

[[3]]
[1] "Ax488"     "PE"        "PerCPCy55" "PECy7"     "BV"        "PacO BC"   "Ax647"     "Ax750 BC" 

[[4]]
[1] "p-STAT1-Ax488"  "CD3-PE"         "CD20-PerCPCy55" "CD33-PECy7"     "CD4-PacBlue"    "PacO BC"        "p-STAT6-Ax647" 
[8] "Ax750 BC"      

Warning message:
In .local(object, ...) :
  marker names are not consistent across samples within flowSet
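To illustrate why gating can still work here while the marker check warns: the channel (short) names agree across these files, but the marker (long) names differ by panel. The following base-R sketch uses a hypothetical, abbreviated sample list taken from the output above and a hypothetical `is_consistent()` helper; it is not CytoML's internal check.

```r
# Hypothetical per-sample annotation: channel (short) names and marker
# (long) names for two stained samples, abbreviated from the markernames()
# output above. Illustrative data only.
samples <- list(
  pan1 = list(channels = c("FITC-A", "PE-A", "Alexa 647-A"),
              markers  = c("p-p38-Ax488", "CD3-PE", "p-ERK-Ax647")),
  pan2 = list(channels = c("FITC-A", "PE-A", "Alexa 647-A"),
              markers  = c("p-STAT1-Ax488", "CD3-PE", "p-STAT6-Ax647"))
)

# Hypothetical helper: TRUE when every sample carries an identical vector
# for the given field ("channels" or "markers").
is_consistent <- function(samples, field) {
  values <- lapply(samples, `[[`, field)
  all(vapply(values, identical, logical(1), values[[1]]))
}

is_consistent(samples, "channels")  # TRUE: gates defined on channels still apply
is_consistent(samples, "markers")   # FALSE: marker names differ by panel
```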

I'm not sure what the best solution is: you wouldn't want to facet datasets with different markers, but gating should still work, since the gates are defined on short names (channels). A few solutions I thought of are:

The FCS/gatingML import solution should work fine for now, but I thought I'd bring it to your attention.

Best, -Ben

Link to the ACS file.


> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] CytoML_1.12.1         cytotidyr_0.0.1.100   forcats_0.5.0         stringr_1.4.0         dplyr_0.8.5          
 [6] purrr_0.3.3           readr_1.3.1           tidyr_1.0.2           tibble_3.0.0          ggplot2_3.3.0        
[11] tidyverse_1.3.0       debarcoder_0.0.0.9000 flowCore_1.52.1      

loaded via a namespace (and not attached):
  [1] colorspace_1.4-1     ellipsis_0.3.0       class_7.3-15         mclust_5.4.5         corpcor_1.6.9       
  [6] base64enc_0.1-3      fs_1.4.0             clue_0.3-57          rstudioapi_0.11      farver_2.0.3        
 [11] hexbin_1.28.1        IDPmisc_1.1.20       earth_5.1.2          fansi_0.4.1          mvtnorm_1.1-0       
 [16] lubridate_1.7.4      xml2_1.3.0           splines_3.6.2        R.methodsS3_1.8.0    mnormt_1.5-6        
 [21] robustbase_0.93-6    mixsmsn_1.1-6        knitr_1.28           Formula_1.2-3        jsonlite_1.6.1      
 [26] CytobankAPI_1.3.0    broom_0.5.5          cluster_2.1.0        dbplyr_1.4.2         png_0.1-7           
 [31] R.oo_1.23.0          graph_1.64.0         shiny_1.4.0.2        rrcov_1.5-2          compiler_3.6.2      
 [36] httr_1.4.1           backports_1.1.5      fastmap_1.0.1        Matrix_1.2-18        assertthat_0.2.1    
 [41] cli_2.0.2            later_1.0.0          htmltools_0.4.0      tools_3.6.2          ncdfFlow_2.32.0     
 [46] gtable_0.3.0         glue_1.3.2           flowWorkspace_3.34.1 reshape2_1.4.3       ggcyto_1.14.1       
 [51] Rcpp_1.0.4           Biobase_2.46.0       cellranger_1.1.0     vctrs_0.2.4          nlme_3.1-142        
 [56] xfun_0.12            rvest_0.3.5          mime_0.9             lifecycle_0.2.0      gtools_3.8.2        
 [61] XML_3.99-0.3         DEoptimR_1.0-8       zlibbioc_1.32.0      MASS_7.3-51.4        scales_1.1.0        
 [66] promises_1.1.0       hms_0.5.3            parallel_3.6.2       RBGL_1.62.1          RColorBrewer_1.1-2  
 [71] curl_4.3             yaml_2.2.1           gridExtra_2.3        TeachingDemos_2.10   latticeExtra_0.6-29 
 [76] stringi_1.4.6        pcaPP_1.9-73         plotrix_3.7-7        flowClust_3.24.0     e1071_1.7-3         
 [81] BiocGenerics_0.32.0  flowViz_1.50.0       rlang_0.4.5          pkgconfig_2.0.3      matrixStats_0.56.0  
 [86] fda_2.4.8.1          lattice_0.20-38      labeling_0.3         ks_1.11.7            tidyselect_1.0.0    
 [91] plyr_1.8.6           magrittr_1.5         R6_2.4.1             generics_0.0.2       DBI_1.1.0           
 [96] pillar_1.4.3         haven_2.2.0          withr_2.1.2          sn_1.6-1             janitor_1.2.1       
[101] modelr_0.1.6         crayon_1.3.4         KernSmooth_2.23-16   ellipse_0.4.1        jpeg_0.1-8.1        
[106] grid_3.6.2           readxl_1.3.1         data.table_1.12.8    Rgraphviz_2.30.0     plotmo_3.5.6        
[111] reprex_0.3.0         digest_0.6.25        classInt_0.4-2       xtable_1.8-4         httpuv_1.5.2        
[116] numDeriv_2016.8-1.1  R.utils_2.9.2        flowStats_3.44.0     RcppParallel_5.0.0   stats4_3.6.2        
[121] munsell_0.5.0        openCyto_1.24.0  

jacobpwagner commented 4 years ago

Hey @bjreisman, @mikejiang actually added support for handling multiple panels via an extra panel_id argument to cytobank_to_gatingset shortly after the last Bioconductor release (which looks like the version of CytoML you have). See this issue and this commit. While testing with this dataset, I found a few more minor subsetting issues, which should be fixed after this commit.

Now I can parse all 3 panels just fine; just keep in mind that you need to specify panel_id = 1 or panel_id = 2 in the cytobank_to_gatingset call (it defaults to 1). This is essentially your second solution ("Perhaps only allow importing one panel at a time if there's more than one?").

You can either pull those commits and rebuild CytoML, or wait 2 weeks for the next Bioconductor release. Keep in mind, however, that Bioconductor 3.11 will require R 4.0.

bjreisman commented 4 years ago

Thanks for highlighting that fix! I've upgraded to R 4.0.0 and the latest version of Bioconductor, and it now works as expected! :+1:

jacobpwagner commented 4 years ago

Good to hear. Glad to help.

bjreisman commented 3 years ago

Hi CytoML team, I'm bumping this older issue because I'm having trouble reading in Cytobank experiments with multiple panels again. Specifically, it's failing at this step:

> ce_get_transformations(ce)
Error in colnames(x) : colnames are not consistent across samples!

which in turn comes from this step failing:

> colnames(ce)
Error in colnames(ce) : colnames are not consistent across samples!

gfinak commented 3 years ago

Thanks, Benjamin. Unfortunately, Jake recently moved on to other things, but we'll try to address this as soon as we can. Is the data you shared earlier still suitable for testing, or do we need a different dataset to reproduce this? Greg

bjreisman commented 3 years ago

Hi Greg, the dataset from the original issue still appears to work, which may make this a 'new' issue. I've attached an experiment that reproduces the problem. I think the first example had the same channels (short names) with different markers (long names) assigned on each panel. In this case, the panels actually have different channels (short names). Hope that helps! cytoml_87.zip
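One way to picture the difference between the two datasets: in the first, every file shares a single channel set; in this one, the files fall into distinct channel sets. A minimal base-R sketch, with hypothetical sample names and channel lists (not CytoML code), groups samples by their exact channel set:

```r
# Hypothetical channel lists for three samples drawn from two panels whose
# instrument channels differ. Names are illustrative only.
channel_sets <- list(
  s1 = c("FSC-A", "SSC-A", "PE-A", "Alexa Fluor 647-A"),
  s2 = c("FSC-A", "SSC-A", "PE-A", "Alexa Fluor 647-A"),
  s3 = c("FSC-A", "SSC-A", "PE-A", "APC-H7-A")
)

# Build an order-insensitive key from each sample's sorted channel names,
# then split the sample names into groups sharing exactly the same channels.
keys <- vapply(channel_sets,
               function(ch) paste(sort(ch), collapse = "|"),
               character(1))
groups <- split(names(channel_sets), keys)

length(groups)  # 2: one group per distinct channel set
```

Each group could then be handled as a homogeneous set of samples, which is the constraint the "colnames are not consistent" check enforces.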

mikejiang commented 3 years ago

Looks like the original model of the cytobankExperiment class as a homogeneous data container doesn't fit the use cases you are running into. I've added an optional panel_name argument to the relevant ce APIs so that a query can be restricted to a single panel when the panels disagree, e.g.

>  ce <- open_cytobank_experiment(acsfile)
Unpacking ACS file...
>   pn <- "Panel 2"
> ce_get_channels(ce)
Error in ce_get_channels(ce) : 
  colnames are not consistent across samples!
> ce_get_channels(ce, pn)
 [1] "Time"              "FSC-A"             "FSC-H"             "FSC-W"             "SSC-A"            
 [6] "SSC-H"             "SSC-W"             "Alexa Fluor 647-A" "Alexa Fluor 700-A" "Alexa Fluor 488-A"
[11] "Pacific Blue-A"    "Pacific Orange-A"  "PE-A"              "APC-H7-A"          "row"              
[16] "col" 
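The idea behind the optional panel_name argument, narrowing the consistency check to the samples of one panel, can be sketched in base R. The sample table, the `get_channels()` helper and its `panel` argument are hypothetical stand-ins, not CytoML's implementation:

```r
# Hypothetical inputs: the panel each sample belongs to, and its channels.
sample_panel <- c(a = "Panel 1", b = "Panel 1", c = "Panel 2")
sample_channels <- list(
  a = c("FSC-A", "SSC-A", "PE-A"),
  b = c("FSC-A", "SSC-A", "PE-A"),
  c = c("FSC-A", "SSC-A", "APC-H7-A")
)

# Return the channel names shared by the selected samples. With panel = NULL
# all samples are checked; otherwise only those assigned to `panel`.
get_channels <- function(channels, panels, panel = NULL) {
  keep <- if (is.null(panel)) names(panels) else names(panels)[panels == panel]
  if (length(keep) == 0) stop("no samples found for panel: ", panel)
  selected <- channels[keep]
  same <- vapply(selected, identical, logical(1), selected[[1]])
  if (!all(same)) stop("colnames are not consistent across samples!")
  selected[[1]]
}

get_channels(sample_channels, sample_panel, "Panel 2")  # "FSC-A" "SSC-A" "APC-H7-A"
# get_channels(sample_channels, sample_panel)  # errors: the panels differ
```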

Now importing the Cytobank experiment should work:

>   gs <- cytobank_to_gatingset(ce, panel_id = 2)
intact
singles
barcoded
barcoded-dim
intact
singles
barcoded-dim
barcoded
done!