RGLab / CytoML

A GatingML Interface for Cross Platform Cytometry Data Sharing
GNU Affero General Public License v3.0
30 stars 14 forks source link

CytoML: Cross-Platform Cytometry Data Sharing.

This package is designed to import/export the hierarchical gated cytometry data to and from R (specifically the openCyto framework) using the gatingML2.0 and FCS3.0 cytometry data standards. This package makes use of the GatingSet R object and data model so that imported data can easily be manipulated and visualized in R using tools like openCyto and ggCyto.

What problems does CytoML solve?

CytoML allows you to:

Reporting Bugs or Issues

Otherwise, we may close your issue without responding.

INSTALLATION

CytoML can be installed in several ways:

For all versions:

For all versions, you must have dependencies installed

library(BiocManager)
# This should pull all dependencies.
BiocManager::install("openCyto") 

# Then install latest dependencies from github, using devtools.
install.packages("devtools") 
library(devtools) #load it

install_github("RGLab/flowWorkspace")
install_github("RGLab/openCyto")

Installing from BioConductor.

library(BiocManager)
#this should pull all dependencies.
BiocManager::install("CytoML", version = "devel") 
library(BiocManager)
#this should pull all dependencies.
BiocManager::install("CytoML", version = "devel") 

Installing from GitHub

install.packges("devtools")
devtools::install_github("RGLab/CytoML")
install.packges("devtools")
devtools::install_github("RGLab/CytoML@*release")

Reproducible examples from the CytoML paper

# We recomend using R version 3.5.0
devtools::install_github("RGLab/RProtoBufLib@v1.3.7")
devtools::install_github("RGLab/cytolib@v1.3.2")
devtools::install_github("RGLab/flowCore@v1.47.7")
devtools::install_github("RGLab/flowWorkspace@v3.29.7")
devtools::install_github("RGLab/openCyto@v1.19.2")
devtools::install_github("RGLab/CytoML@v1.7.10")
devtools::install_github("RGLab/ggcyto@v1.9.12")

Examples

Import data

To import data you need the xml workspace and the raw FCS files.

Import gatingML generated from Cytobank.

library(CytoML)
acsfile <- system.file("extdata/cytobank_experiment.acs", package = "CytoML")
ce <- open_cytobank_experiment(acsfile)
xmlfile <- ce$gatingML
fcsFiles <- list.files(ce$fcsdir, full.names = TRUE)
gs <- cytobank_to_gatingset(xmlfile, fcsFiles)

Import a Diva workspace.

ws <- open_diva_xml(system.file('extdata/diva/PE_2.xml', package = "flowWorkspaceData"))
# The path to the FCS files is stored in ws@path.
# It can also be passed in to parseWorksapce via the `path` argument.
gs <- diva_to_gatingset(ws, name = 2, subset = 1, swap_cols = FALSE)

Interact with the gated data (GatingSet)

We need flowWorkspace to interact with the imported data.

library(flowWorkspace)

We can visualize the gating tree as follows:

#get the first sample
gh <- gs[[1]]

#plot the hierarchy tree
plot(gh)

For more information see the flowWorkspace package.

We can print all the cell populations defined in the gating tree.

#show all the cell populations(/nodes)
gs_get_pop_paths(gh)
## [1] "root"            "/P1"             "/P1/P2"          "/P1/P2/P3"      
## [5] "/P1/P2/P3/P4"    "/P1/P2/P3/P4/P5"

We can extract the cell population statistics.

#show the population statistics
gh_pop_compare_stats(gh)
##    openCyto.freq   xml.freq openCyto.count xml.count node
## 1:    1.00000000 1.00000000          19090     19090 root
## 2:    0.93609219 0.93776847          17870     17902   P1
## 3:    0.97991046 0.97994637          17511     17543   P2
## 4:    0.70327223 0.70307245          12315     12334   P3
## 5:    0.09378806 0.09404897           1155      1160   P4
## 6:    0.95151515 0.94827586           1099      1100   P5

The openCyto.count column shows the cell counts computed via the import. The xml.count column shows the cell counts computed by FlowJo (note not all platforms report cell counts in the workspace). It is normal for these to differ by a few cells due to numerical differences in the implementation of data transformations. CytoML and openCyto are reproducing the data analysis from the raw data based on the information in the workspace.

We can plot all the gates defined in the workspace.

#plot the gates
plotGate(gh) 

Access information about cells in a specific population.

Because CytoML and flowWorkspace reproduce the entire analysis in a workspace in R, we have access to information about which cells are part of which cell popualtions.

flowWorkspace has convenience methods to extract the cells from specific cell populations:

gh_pop_get_data(gh,"P3")
## flowFrame object '9a1897d7-ebc9-4077-aa34-6d9e1367fa67'
## with 12315 cells and 15 observables:
##                   name desc  range  minRange maxRange
## $P1               Time <NA> 262144 0.0000000 262144.0
## $P2              FSC-A <NA> 262144 0.0000000 262144.0
## $P3              FSC-H <NA> 262144 0.0000000 262144.0
## $P4              FSC-W <NA> 262144 0.0000000 262144.0
## $P5              SSC-A <NA> 262144 0.0000000 262144.0
## $P6              SSC-H <NA> 262144 0.0000000 262144.0
## $P7              SSC-W <NA> 262144 0.0000000 262144.0
## $P8             FITC-A <NA> 262144 0.1516347      4.5
## $P9               PE-A  CD3 262144 0.2953046      4.5
## $P10     PerCP-Cy5-5-A <NA> 262144 0.4697134      4.5
## $P11          PE-Cy7-A <NA> 262144 0.5638024      4.5
## $P12             APC-A  bob 262144 0.7838544      4.5
## $P13         APC-Cy7-A Viab 262144 0.6886181      4.5
## $P14 Bd Horizon V450-A CD44 262144 0.6413334      4.5
## $P15  Pacific Orange-A  CD8 262144 0.3376040      4.5
## 231 keywords are stored in the 'description' slot

This returns a flowFrame with the cells in gate P3 (70% of the cells according to the plot).

The matrix of expression can be extracted from a flowFrame using the exprs() method from the flowCore package:

library(flowCore)
e <- exprs(gh_pop_get_data(gh,"P3"))
class(e)
## [1] "matrix"
dim(e)
## [1] 12315    15
colnames(e)
##  [1] "Time"              "FSC-A"             "FSC-H"            
##  [4] "FSC-W"             "SSC-A"             "SSC-H"            
##  [7] "SSC-W"             "FITC-A"            "PE-A"             
## [10] "PerCP-Cy5-5-A"     "PE-Cy7-A"          "APC-A"            
## [13] "APC-Cy7-A"         "Bd Horizon V450-A" "Pacific Orange-A"
#compute the MFI of the fluorescence channels.
colMeans(e[,8:15])
##            FITC-A              PE-A     PerCP-Cy5-5-A          PE-Cy7-A 
##         0.8305630         1.3162132         0.7743459         0.8017827 
##             APC-A         APC-Cy7-A Bd Horizon V450-A  Pacific Orange-A 
##         1.0482663         1.1636818         2.2960554         1.3684453

Export gated data to other platforms.

In order to export gated data, it must be in GatingSet format.

Export a GatingSet from R to Cytobank or FlowJo

Load something to export.

dataDir <- system.file("extdata",package="flowWorkspaceData")
gs <- load_gs(list.files(dataDir, pattern = "gs_manual",full = TRUE))
## loading R object...

## loading tree object...

## Done
Export to Cytobank
#Cytobank
outFile <- tempfile(fileext = ".xml")
gatingset_to_cytobank(gs, outFile)
## Warning in gatingset_to_cytobank(gs, outFile): With
## 'cytobank.default.scale' set to 'TRUE', data and gates will be re-
## transformed with cytobank's default scaling settings, which may affect how
## gates look like.

## [1] "/tmp/RtmpV1ZasG/file4b9f7e4e25c1.xml"
Export to FlowJo
#flowJo
outFile <- tempfile(fileext = ".wsp")
gatingset_to_flowjo(gs, outFile)
## [1] "/tmp/RtmpV1ZasG/file4b9f18da0869.wsp"

Next Steps

See the flowWorskspace and openCyto packages to learn more about what can be done with GatingSet objects.

Code of conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.