RGLab / flowWorkspace

flowWorkspace
GNU Affero General Public License v3.0
45 stars 21 forks source link

Additional construction/coercion methods for cytoframe #348

Open jacobpwagner opened 4 years ago

jacobpwagner commented 4 years ago

First: It would be good to support R-level construction of cytoframe objects just taking a matrix as is possible for flowFrame. For example:

Works

fr <- flowFrame(matrix(1:4, nrow = 1, ncol = 4, dimnames = list(NULL, c("A","B","C","D"))))

Not yet supported:

cf <- cytoframe(matrix(1:4, nrow = 1, ncol = 4, dimnames = list(NULL, c("A","B","C","D"))))

The current workaround is to construct the flowFrame and then convert it to cytoframe using flowFrame_to_cytoframe, but that is a little inefficient as it goes through an I/O intermediate. Even if that is the quick solution we go with under the hood, the usage of cytoframe shouldn't require the user falling back to a flowFrame.

Second: It would be good to make a cytoset_to_cytoframe method along the lines of the flowCore method for coercing a flowSet to a single flowFrame by doing some consistency checks before binding event rows into a single matrix. This pooling of events is often performed before running dimension reduction or clustering algorithms. You can see the need for this in other packages where users have made their own functions, like flowSOM::AggregateFlowFrames. @DillonHammill also gave a good summary of how it would be helpful. This will require some extra care to do consistency checks properly for the resulting cytoframe and we should discuss how to handle any potential extra/missing channels. Perhaps those consistency checks should just be handled before this by cytoqc and if cytoset_to_cytoframe detects an inconsistency should just stop and direct users to cytoqc.

The current workaround here is a chain of coercions (cytoset->flowSet->flowFrame->cytoframe), which is inefficient and a lot to ask of users to hack together each time.

DillonHammill commented 4 years ago

@jacobpwagner, sorry for bugging you again. I have been experiencing some issues with current coercion workaround, I seem to be getting very different results depending on how many cytoframes are in the cytoset.

The first issue occurs when you try to coerce a transformed cytoset (containing a single cytoframe) to a cytoframe. The coercion from cytoset to flowFrame works as expected, but when you try to convert the flowFrame to a cytoframe with flowFrame_to_cytoframe() the parameter ranges change. At first I thought this was due to linearisation introduced by load_cytoframe_from_fcs() but it persists even when setting transformation = FALSE as below:

# require latest CytoExploreRData v1.0.3
devtools::install_github("DillonHammill/CytoExploreRData")
library(CytoExploreRData)
library(flowWorkspace)

# load compensated/transformed/gated GatingSet
gs <- load_gs(system.file("extdata/Activation-GatingSet", package = "CytoExploreRData"),
              backend_readonly = FALSE)

# extract root cytoset
cs <- gs_pop_get_data(gs, "root")

# coerce cytoset containing first cytoframe to flowFrame
fr <- as(cs[1], "flowFrame")
range(fr, type = "instrument")

     FSC-A  FSC-H  FSC-W  SSC-A  SSC-H  SSC-W Alexa Fluor 488-A      PE-A
min      0      0      0      0      0      0         0.3440274 0.5447587
max 262143 262143 262143 262143 262143 262143         4.5000000 4.5000000
    PE-Texas Red-A   7-AAD-A  PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A
min      0.7766721 0.7676063 0.7410693         0.2184961        -0.1468131
max      4.5000000 4.5000000 4.5000000         4.5000000         4.5000000
    Qdot 605-A Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A   Time
min  0.3265916         0.3060549         0.5694999 0.5042382      0
max  4.5000000         4.5000000         4.5000000 4.5000000 262143

# coerce flowFrame to cytoframe
cf <- flowFrame_to_cytoframe(fr)
range(cf, type = "instrument")

     FSC-A  FSC-H  FSC-W  SSC-A  SSC-H  SSC-W Alexa Fluor 488-A
min      0      0      0      0      0      0        -0.9284211
max 262143 262143 262143 262143 262143 262143    262143.0000000
              PE-A PE-Texas Red-A        7-AAD-A      PE-Cy7-A
min     -0.2008877     -0.4126023     -0.2468764     -1.286891
max 262143.0000000 262143.0000000 262143.0000000 262143.000000
    Alexa Fluor 405-A Alexa Fluor 430-A     Qdot 605-A Alexa Fluor 647-A
min         -1.011764         -1.249842     -0.7027928        -0.1020277
max     262143.000000     262143.000000 262143.0000000    262143.0000000
    Alexa Fluor 700-A    APC-Cy7-A   Time
min         -1.193219     -0.74819      0
max     262143.000000 262143.00000 262143

# try to prevent linearisation
cf <- flowFrame_to_cytoframe(fr, transformation = FALSE)
range(cf, type = "instrument")

     FSC-A  FSC-H  FSC-W  SSC-A  SSC-H  SSC-W Alexa Fluor 488-A
min      0      0      0      0      0      0        -0.9284211
max 262143 262143 262143 262143 262143 262143    262143.0000000
              PE-A PE-Texas Red-A        7-AAD-A      PE-Cy7-A
min     -0.2008877     -0.4126023     -0.2468764     -1.286891
max 262143.0000000 262143.0000000 262143.0000000 262143.000000
    Alexa Fluor 405-A Alexa Fluor 430-A     Qdot 605-A Alexa Fluor 647-A
min         -1.011764         -1.249842     -0.7027928        -0.1020277
max     262143.000000     262143.000000 262143.0000000    262143.0000000
    Alexa Fluor 700-A    APC-Cy7-A   Time
min         -1.193219     -0.74819      0
max     262143.000000 262143.00000 262143

Interestingly, the same is not observed when you try to coerce a cytoset containing multiple cytoframes to a cytoframe. The data remains on the transformed scale, however the parameter ranges do change:

# coerce cytoset containing some cytoframes to flowFrame
fr <- as(cs[1:4], "flowFrame")
range(fr, type = "instrument")

     FSC-A  FSC-H  FSC-W  SSC-A  SSC-H  SSC-W Alexa Fluor 488-A      PE-A
min      0      0      0      0      0      0         0.3440274 0.5447587
max 262143 262143 262143 262143 262143 262143         4.5000000 4.5000000
    PE-Texas Red-A   7-AAD-A  PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A
min      0.7766721 0.7676063 0.7410693         0.2184961        -0.1468131
max      4.5000000 4.5000000 4.5000000         4.5000000         4.5000000
    Qdot 605-A Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A   Time
min  0.3265916         0.3060549         0.5694999 0.5042382      0
max  4.5000000         4.5000000         4.5000000 4.5000000 262143

# coerce flowFrame to cytoframe
cf <- flowFrame_to_cytoframe(fr)
range(cf, type = "instrument")

     FSC-A  FSC-H  FSC-W     SSC-A  SSC-H  SSC-W Alexa Fluor 488-A
min      0      0      0    -35.64      0      0        -0.9284211
max 262143 258475 229935 262143.00 257573 257523         5.0000000
          PE-A PE-Texas Red-A    7-AAD-A  PE-Cy7-A Alexa Fluor 405-A
min -0.2224025     -0.6973049 -0.3512947 -1.548259         -1.011764
max  5.0000000      5.0000000  5.0000000  5.000000          5.000000
    Alexa Fluor 430-A Qdot 605-A Alexa Fluor 647-A Alexa Fluor 700-A
min         -1.249842  -1.485213        -0.2153718         -1.193219
max          5.000000   5.000000         5.0000000          5.000000
    APC-Cy7-A Time
min  -0.74819    0
max   5.00000 6568

I know that you working on this at the moment, but is there a way that I can get consistent results with the current workaround?

jacobpwagner commented 4 years ago

Thanks @DillonHammill . It does appear that something is wrong in handling the annotated parameter ranges, particularly during flowFrame->cytoframe coercion. I'm looking in to it:

> library(flowCore)
> library(flowWorkspace)
> library(CytoExploreRData)
> 
> gs <- load_gs(system.file("extdata/Activation-GatingSet", package = "CytoExploreRData"),
+               backend_readonly = FALSE)
> 
> cf <- gh_pop_get_data(gs[[1]])
> 
> range(cf, "data")
       FSC-A  FSC-H    FSC-W     SSC-A  SSC-H     SSC-W Alexa Fluor 488-A       PE-A PE-Texas Red-A    7-AAD-A  PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A
min   3840.9   5008  48962.7    812.43    992  48231.07        -0.9284211 -0.2008877     -0.4126023 -0.2468764 -1.286891         -1.011764         -1.249842 -0.7027928
max 262143.0 258475 213015.5 262143.00 257528 240257.84         4.4974089  4.4947143      4.1895137  4.4887066  4.344767          4.499980          4.499548  4.4908552
    Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A   Time
min        -0.1020277         -1.193219 -0.748190   54.7
max         4.4999361          4.408533  4.378348 6312.3
> range(cf)
     FSC-A  FSC-H  FSC-W  SSC-A  SSC-H  SSC-W Alexa Fluor 488-A      PE-A PE-Texas Red-A   7-AAD-A  PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A
min      0      0      0      0      0      0         0.3440274 0.5447587      0.7766721 0.7676063 0.7410693         0.2184961        -0.1468131  0.3265916
max 262143 262143 262143 262143 262143 262143         4.5000000 4.5000000      4.5000000 4.5000000 4.5000000         4.5000000         4.5000000  4.5000000
    Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A   Time
min         0.3060549         0.5694999 0.5042382      0
max         4.5000000         4.5000000 4.5000000 262143
> parameters(cf)$maxRange
 [1] 262143.0 262143.0 262143.0 262143.0 262143.0 262143.0      4.5      4.5      4.5      4.5      4.5      4.5      4.5      4.5      4.5      4.5      4.5 262143.0
> 
> fr <- cytoframe_to_flowFrame(cf)
> range(fr, "data")
       FSC-A  FSC-H    FSC-W     SSC-A  SSC-H     SSC-W Alexa Fluor 488-A       PE-A PE-Texas Red-A    7-AAD-A  PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A
min   3840.9   5008  48962.7    812.43    992  48231.07        -0.9284211 -0.2008877     -0.4126023 -0.2468764 -1.286891         -1.011764         -1.249842 -0.7027928
max 262143.0 258475 213015.5 262143.00 257528 240257.84         4.4974089  4.4947143      4.1895137  4.4887066  4.344767          4.499980          4.499548  4.4908552
    Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A   Time
min        -0.1020277         -1.193219 -0.748190   54.7
max         4.4999361          4.408533  4.378348 6312.3
> range(fr)
     FSC-A  FSC-H  FSC-W  SSC-A  SSC-H  SSC-W Alexa Fluor 488-A      PE-A PE-Texas Red-A   7-AAD-A  PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A
min      0      0      0      0      0      0         0.3440274 0.5447587      0.7766721 0.7676063 0.7410693         0.2184961        -0.1468131  0.3265916
max 262143 262143 262143 262143 262143 262143         4.5000000 4.5000000      4.5000000 4.5000000 4.5000000         4.5000000         4.5000000  4.5000000
    Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A   Time
min         0.3060549         0.5694999 0.5042382      0
max         4.5000000         4.5000000 4.5000000 262143
> parameters(fr)$maxRange
 [1] 262143.0 262143.0 262143.0 262143.0 262143.0 262143.0      4.5      4.5      4.5      4.5      4.5      4.5      4.5      4.5      4.5      4.5      4.5 262143.0
> 
> cf_back <- flowFrame_to_cytoframe(fr)
> range(cf_back, "data")
       FSC-A  FSC-H    FSC-W     SSC-A  SSC-H     SSC-W Alexa Fluor 488-A       PE-A PE-Texas Red-A    7-AAD-A  PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A
min   3840.9   5008  48962.7    812.43    992  48231.07        -0.9284211 -0.2008877     -0.4126023 -0.2468764 -1.286891         -1.011764         -1.249842 -0.7027928
max 262143.0 258475 213015.5 262143.00 257528 240257.84         4.4974089  4.4947143      4.1895137  4.4887066  4.344767          4.499980          4.499548  4.4908552
    Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A   Time
min        -0.1020277         -1.193219 -0.748190   54.7
max         4.4999361          4.408533  4.378348 6312.3
> range(cf_back)
     FSC-A  FSC-H  FSC-W  SSC-A  SSC-H  SSC-W Alexa Fluor 488-A          PE-A PE-Texas Red-A       7-AAD-A      PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A    Qdot 605-A
min      0      0      0      0      0      0     -9.284211e-01 -2.008877e-01  -4.126023e-01 -2.468764e-01     -1.286891         -1.011764         -1.249842 -7.027928e-01
max 262143 262143 262143 262143 262143 262143      2.621430e+05  2.621430e+05   2.621430e+05  2.621430e+05 262143.000000     262143.000000     262143.000000  2.621430e+05
    Alexa Fluor 647-A Alexa Fluor 700-A    APC-Cy7-A   Time
min     -1.020277e-01         -1.193219     -0.74819      0
max      2.621430e+05     262143.000000 262143.00000 262143
> parameters(cf_back)$maxRange
 [1] 262143 262143 262143 262143 262143 262143 262143 262143 262143 262143 262143 262143 262143 262143 262143 262143 262143 262143
jacobpwagner commented 4 years ago

It also appears to be a problem particular to reading back in via load_cytoframe_from_fcs. flowFrame_to_cytoframe involves simply writing the flowFrame out to an FCS and reading back in to a cytoframe: https://github.com/RGLab/flowWorkspace/blob/4a6883af4d9671edee9b10bca4f56113e5598a12/R/cytoframe.R#L726-L730

But if we follow that some logic but read back in to another flowFrame using read.FCS, the parameter ranges are more intact:

> tmp <- tempfile()
> write.FCS(fr, tmp)
[1] "/tmp/RtmpNTwMiE/file7a5a3b2b1991"
> fr_back <- read.FCS(tmp)
> range(fr_back, "data")
       FSC-A  FSC-H    FSC-W     SSC-A  SSC-H     SSC-W Alexa Fluor 488-A       PE-A PE-Texas Red-A    7-AAD-A  PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A
min   3840.9   5008  48962.7    812.43    992  48231.07        -0.9284211 -0.2008877     -0.4126023 -0.2468764 -1.286891         -1.011764         -1.249842
max 262143.0 258475 213015.5 262143.00 257528 240257.84         4.4974089  4.4947143      4.1895137  4.4887066  4.344767          4.499980          4.499548
    Qdot 605-A Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A   Time
min -0.7027928        -0.1020277         -1.193219 -0.748190   54.7
max  4.4908552         4.4999361          4.408533  4.378348 6312.3
> range(fr_back)
     FSC-A  FSC-H  FSC-W  SSC-A  SSC-H  SSC-W Alexa Fluor 488-A       PE-A PE-Texas Red-A    7-AAD-A  PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A
min      0      0      0      0      0      0        -0.9284211 -0.2008877     -0.4126023 -0.2468764 -1.286891         -1.011764         -1.249842 -0.7027928
max 262143 262143 262143 262143 262143 262143         4.5000000  4.5000000      4.5000000  4.5000000  4.500000          4.500000          4.500000  4.5000000
    Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A   Time
min        -0.1020277         -1.193219  -0.74819      0
max         4.5000000          4.500000   4.50000 262143
> parameters(fr_back)$maxRange
 [1] 262143.0 262143.0 262143.0 262143.0 262143.0 262143.0      4.5      4.5      4.5      4.5      4.5      4.5      4.5      4.5      4.5      4.5      4.5
[18] 262143.0
jacobpwagner commented 4 years ago

Regarding the range discrepancy between flowCore::read.FCS and flowWorkspace::load_cytoframe_from_fcs, I believe it is due to a difference in how each utilizes the flowCore_PnRmax values to set the max of the parameter range.

For flowCore::read.FCS, it looks like the preference will be for flowCore_PnRMax and only if it is not present will it then default to PnR for the channel: https://github.com/RGLab/flowCore/blob/10e68d2a343b39a5a484bbc2c4eae7fbef0e2c57/R/IO.R#L469-L476

Whereas for flowWorkspace::load_cytoframe_from_fcs-->-->MemCytoFrame::read_fcs, it will only prefer flowCore_PnRmax if the "transformation" keyword is "custom": https://github.com/RGLab/cytolib/blob/5afbec8cabdf012b08f19857d440ed139f0c5d82/src/MemCytoFrame.cpp#L928-L938

This is similar to the logic flowCore::read.FCS uses when reading the data: https://github.com/RGLab/flowCore/blob/10e68d2a343b39a5a484bbc2c4eae7fbef0e2c57/R/IO.R#L1028-L1034

@mikejiang , if you have time, a second set of eyes here could be good to determine what the appropriate behavior should be to synchronize both parsers to. It also underscores the need to switch over to having read.FCS just wrap the cytolib parser so we can't run in to parsing logic discrepancies.