RGLab / CytoML

A GatingML Interface for Cross Platform Cytometry Data Sharing
GNU Affero General Public License v3.0

Updating channel names in compensation matrices, transformations, and gates when passing channel_alias argument in parseWorkspace #61

Open juyeongkim opened 5 years ago

juyeongkim commented 5 years ago

When I pass the channel_alias argument to parseWorkspace, I expect it to update the channel names in compensation matrices, transformations, and gates; however, I think it only changes the channel names in the flow data.

For example,

> ws <- CytoML::openWorkspace("/shared/silo_researcher/Gottardo_R/jkim2345_working/SDY820/data/All TB mem samples centralised New gating-1.605759.wsp")
> ws
FlowJo Workspace Version  20.0 
File location:  /shared/silo_researcher/Gottardo_R/jkim2345_working/SDY820/data 
File name:  All TB mem samples centralised New gating-1.605759.wsp 
Workspace is open. 

Groups in Workspace
Name Num.Samples
1 All Samples          60
2 CD4 group 1          40
3 CD4 group 2          20
> map <- data.frame(
+     alias = c("APC-eFluor 780-A", "eFluor 450-A"),
+     channels = c("APC-eFluor 780-A, APC-eFluor780-A", "eFluor 450-A, eFluor450-A"),
+     stringsAsFactors = FALSE
+ )
> gs <- CytoML::parseWorkspace(ws, name = 1, channel_alias = map)
...
loading data: /share/files/Studies/SDY820/@files/rawdata/flow_cytometry/Specimen_001_TU0002_006.605696.fcs
Compensating
Error in FUN(newX[, i], ...) : 
  channels mismatched between compensation and flow data!

The only difference between the two groups (CD4 group 1 and CD4 group 2) is the channel names, and I'd like to merge them into one gating set by passing a channel alias map when parsing the workspace. Instead, we get an error because the channel names are not updated in the compensation matrix.
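To make the expected behavior concrete, here is a minimal base R sketch of what applying the alias map to a compensation matrix could look like. The helpers `expand_alias_map` and `rename_matrix_channels` are hypothetical names made up for illustration, not CytoML or flowWorkspace functions, and the 2x2 matrix is a toy stand-in for a real spillover matrix.

```r
# Hypothetical helper: expand a channel_alias-style map (one row per alias,
# comma-separated variants in `channels`) into a named lookup vector
# whose names are the variants and whose values are the aliases.
expand_alias_map <- function(map) {
  variants <- strsplit(map$channels, ",", fixed = TRUE)
  lookup <- rep(map$alias, lengths(variants))
  names(lookup) <- trimws(unlist(variants))
  lookup
}

# Hypothetical helper: rename the columns of a plain numeric matrix,
# leaving any column that is not in the lookup untouched.
rename_matrix_channels <- function(mat, lookup) {
  hit <- colnames(mat) %in% names(lookup)
  colnames(mat)[hit] <- unname(lookup[colnames(mat)[hit]])
  mat
}

map <- data.frame(
  alias = c("APC-eFluor 780-A", "eFluor 450-A"),
  channels = c("APC-eFluor 780-A, APC-eFluor780-A", "eFluor 450-A, eFluor450-A"),
  stringsAsFactors = FALSE
)
lookup <- expand_alias_map(map)

# Toy matrix using the spelling without spaces.
comp <- diag(2)
colnames(comp) <- c("APC-eFluor780-A", "eFluor450-A")
comp <- rename_matrix_channels(comp, lookup)
colnames(comp)
```

The same lookup would also have to be applied to the transformations and gate dimensions for the parsed gating set to be consistent, which is the part that parseWorkspace does not do here.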

Alternatively (the more traditional way of handling this issue), we can parse the workspace into two gating sets by group, change the channel names in one group to match the other, create a gating set list, and merge it into a single gating set.

> gs1 <- CytoML::parseWorkspace(ws, name = 2)
> gs2 <- CytoML::parseWorkspace(ws, name = 3)
> gs2 <- flowWorkspace::updateChannels(gs2, map = data.frame(
  old = c("APC-eFluor780-A", "eFluor450-A"),
  new = c("APC-eFluor 780-A", "eFluor 450-A"),
  stringsAsFactors = FALSE
))
> gsl <- flowWorkspace::GatingSetList(list(gs1, gs2))
> gs_merged <- rbind2(gsl)
A GatingSet with 60 samples
> colnames(gs_merged)
 [1] "FSC-A"                  "FSC-H"                  "FSC-W"                  "SSC-A"                  "SSC-H"                 
 [6] "SSC-W"                  "Comp-APC-A"             "Comp-Alexa Fluor 700-A" "Comp-APC-eFluor 780-A"  "Comp-V500-A"           
[11] "Comp-FITC-A"            "Comp-PE-A"              "Comp-PerCP-Cy5-5-A"     "Comp-PE-Cy7-A"          "Comp-eFluor 450-A"     
[16] "Comp-BV650-A"           "Time"        

The latter solution is fine, but having parseWorkspace update channel names in compensation matrices, transformations, and gates when channel_alias is passed would be more user friendly and is the behavior users (or just me) would expect.

I understand that what I proposed might require big changes to the C-level code (passing aliases and updating channel names on the fly). An easier solution might be to pre-process the workspace file (search and replace channel names) and pass the pre-processed file to parseWorkspace. This could be part of CytoML or another utility package.
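A rough sketch of the pre-processing idea in base R follows. The function `rename_channels_in_file` is a hypothetical helper, not an existing CytoML function, and the XML fragment is made up for the demonstration; a real .wsp file has a different structure.

```r
# Hypothetical helper (not part of CytoML): copy a workspace file, replacing
# each old channel name with its new spelling as a literal (non-regex) string.
rename_channels_in_file <- function(in_path, out_path, old, new) {
  stopifnot(length(old) == length(new))
  lines <- readLines(in_path, warn = FALSE)
  for (i in seq_along(old)) {
    lines <- gsub(old[i], new[i], lines, fixed = TRUE)
  }
  writeLines(lines, out_path)
  invisible(out_path)
}

# Toy demonstration on a made-up XML fragment rather than a real workspace.
src <- tempfile(fileext = ".wsp")
dst <- tempfile(fileext = ".wsp")
writeLines(c(
  '<Parameter name="Comp-APC-eFluor780-A"/>',
  '<Parameter name="eFluor450-A"/>',
  '<Parameter name="FSC-A"/>'
), src)

rename_channels_in_file(
  src, dst,
  old = c("APC-eFluor780-A", "eFluor450-A"),
  new = c("APC-eFluor 780-A", "eFluor 450-A")
)
cleaned <- readLines(dst)
```

The cleaned file could then be opened with openWorkspace and parsed as in the example above. Note that a literal search and replace touches every occurrence of the string, so it would be worth checking that the old names do not appear in unrelated text, and that the file encoding survives the round trip.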

mikejiang commented 5 years ago

@juyeongkim Thank you for the detailed report of the issue and the good workarounds provided!

Yes, channel_alias is handled at the flowCore::read.FCS level to address discrepancies among FCS files that presumably belong to the same group in FlowJo. To solve inter-group discrepancies, I prefer your post-parsing merging of gating sets or a pre-parsing cleanup of the XML file; these are really QC steps that are better kept separate from the parser itself.