Open jacobpwagner opened 4 years ago
@jacobpwagner, sorry for bugging you again. I have been experiencing some issues with current coercion workaround, I seem to be getting very different results depending on how many cytoframes
are in the cytoset
.
The first issue occurs when you try to coerce a transformed cytoset
(containing a single cytoframe) to a cytoframe
. The coercion from cytoset
to flowFrame
works as expected, but when you try to convert the flowFrame
to a cytoframe
with flowFrame_to_cytoframe()
the parameter ranges change. At first I thought this was due to linearisation introduced by load_cytoframe_from_fcs()
but it persists even when setting transformation = FALSE
as below:
# require latest CytoExploreRData v1.0.3
devtools::install_github("DillonHammill/CytoExploreRData")
library(CytoExploreRData)
library(flowWorkspace)
# load compensated/transformed/gated GatingSet
gs <- load_gs(system.file("extdata/Activation-GatingSet", package = "CytoExploreRData"),
backend_readonly = FALSE)
# extract root cytoset
cs <- gs_pop_get_data(gs, "root")
# coerce cytoset containing first cytoframe to flowFrame
fr <- as(cs[1], "flowFrame")
range(fr, type = "instrument")
FSC-A FSC-H FSC-W SSC-A SSC-H SSC-W Alexa Fluor 488-A PE-A
min 0 0 0 0 0 0 0.3440274 0.5447587
max 262143 262143 262143 262143 262143 262143 4.5000000 4.5000000
PE-Texas Red-A 7-AAD-A PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A
min 0.7766721 0.7676063 0.7410693 0.2184961 -0.1468131
max 4.5000000 4.5000000 4.5000000 4.5000000 4.5000000
Qdot 605-A Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A Time
min 0.3265916 0.3060549 0.5694999 0.5042382 0
max 4.5000000 4.5000000 4.5000000 4.5000000 262143
# coerce flowFrame to cytoframe
cf <- flowFrame_to_cytoframe(fr)
range(cf, type = "instrument")
FSC-A FSC-H FSC-W SSC-A SSC-H SSC-W Alexa Fluor 488-A
min 0 0 0 0 0 0 -0.9284211
max 262143 262143 262143 262143 262143 262143 262143.0000000
PE-A PE-Texas Red-A 7-AAD-A PE-Cy7-A
min -0.2008877 -0.4126023 -0.2468764 -1.286891
max 262143.0000000 262143.0000000 262143.0000000 262143.000000
Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A Alexa Fluor 647-A
min -1.011764 -1.249842 -0.7027928 -0.1020277
max 262143.000000 262143.000000 262143.0000000 262143.0000000
Alexa Fluor 700-A APC-Cy7-A Time
min -1.193219 -0.74819 0
max 262143.000000 262143.00000 262143
# try to prevent linearisation
cf <- flowFrame_to_cytoframe(fr, transformation = FALSE)
range(cf, type = "instrument")
FSC-A FSC-H FSC-W SSC-A SSC-H SSC-W Alexa Fluor 488-A
min 0 0 0 0 0 0 -0.9284211
max 262143 262143 262143 262143 262143 262143 262143.0000000
PE-A PE-Texas Red-A 7-AAD-A PE-Cy7-A
min -0.2008877 -0.4126023 -0.2468764 -1.286891
max 262143.0000000 262143.0000000 262143.0000000 262143.000000
Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A Alexa Fluor 647-A
min -1.011764 -1.249842 -0.7027928 -0.1020277
max 262143.000000 262143.000000 262143.0000000 262143.0000000
Alexa Fluor 700-A APC-Cy7-A Time
min -1.193219 -0.74819 0
max 262143.000000 262143.00000 262143
Interestingly, the same is not observed when you try to coerce a cytoset
containing multiple cytoframes
to a cytoframe
. The data remains on the transformed scale, however the parameter ranges do change:
# coerce cytoset containing some cytoframes to flowFrame
fr <- as(cs[1:4], "flowFrame")
range(fr, type = "instrument")
FSC-A FSC-H FSC-W SSC-A SSC-H SSC-W Alexa Fluor 488-A PE-A
min 0 0 0 0 0 0 0.3440274 0.5447587
max 262143 262143 262143 262143 262143 262143 4.5000000 4.5000000
PE-Texas Red-A 7-AAD-A PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A
min 0.7766721 0.7676063 0.7410693 0.2184961 -0.1468131
max 4.5000000 4.5000000 4.5000000 4.5000000 4.5000000
Qdot 605-A Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A Time
min 0.3265916 0.3060549 0.5694999 0.5042382 0
max 4.5000000 4.5000000 4.5000000 4.5000000 262143
# coerce flowFrame to cytoframe
cf <- flowFrame_to_cytoframe(fr)
range(cf, type = "instrument")
FSC-A FSC-H FSC-W SSC-A SSC-H SSC-W Alexa Fluor 488-A
min 0 0 0 -35.64 0 0 -0.9284211
max 262143 258475 229935 262143.00 257573 257523 5.0000000
PE-A PE-Texas Red-A 7-AAD-A PE-Cy7-A Alexa Fluor 405-A
min -0.2224025 -0.6973049 -0.3512947 -1.548259 -1.011764
max 5.0000000 5.0000000 5.0000000 5.000000 5.000000
Alexa Fluor 430-A Qdot 605-A Alexa Fluor 647-A Alexa Fluor 700-A
min -1.249842 -1.485213 -0.2153718 -1.193219
max 5.000000 5.000000 5.0000000 5.000000
APC-Cy7-A Time
min -0.74819 0
max 5.00000 6568
I know that you working on this at the moment, but is there a way that I can get consistent results with the current workaround?
Thanks @DillonHammill . It does appear that something is wrong in handling the annotated parameter ranges, particularly during flowFrame
->cytoframe
coercion. I'm looking in to it:
> library(flowCore)
> library(flowWorkspace)
> library(CytoExploreRData)
>
> gs <- load_gs(system.file("extdata/Activation-GatingSet", package = "CytoExploreRData"),
+ backend_readonly = FALSE)
>
> cf <- gh_pop_get_data(gs[[1]])
>
> range(cf, "data")
FSC-A FSC-H FSC-W SSC-A SSC-H SSC-W Alexa Fluor 488-A PE-A PE-Texas Red-A 7-AAD-A PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A
min 3840.9 5008 48962.7 812.43 992 48231.07 -0.9284211 -0.2008877 -0.4126023 -0.2468764 -1.286891 -1.011764 -1.249842 -0.7027928
max 262143.0 258475 213015.5 262143.00 257528 240257.84 4.4974089 4.4947143 4.1895137 4.4887066 4.344767 4.499980 4.499548 4.4908552
Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A Time
min -0.1020277 -1.193219 -0.748190 54.7
max 4.4999361 4.408533 4.378348 6312.3
> range(cf)
FSC-A FSC-H FSC-W SSC-A SSC-H SSC-W Alexa Fluor 488-A PE-A PE-Texas Red-A 7-AAD-A PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A
min 0 0 0 0 0 0 0.3440274 0.5447587 0.7766721 0.7676063 0.7410693 0.2184961 -0.1468131 0.3265916
max 262143 262143 262143 262143 262143 262143 4.5000000 4.5000000 4.5000000 4.5000000 4.5000000 4.5000000 4.5000000 4.5000000
Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A Time
min 0.3060549 0.5694999 0.5042382 0
max 4.5000000 4.5000000 4.5000000 262143
> parameters(cf)$maxRange
[1] 262143.0 262143.0 262143.0 262143.0 262143.0 262143.0 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 262143.0
>
> fr <- cytoframe_to_flowFrame(cf)
> range(fr, "data")
FSC-A FSC-H FSC-W SSC-A SSC-H SSC-W Alexa Fluor 488-A PE-A PE-Texas Red-A 7-AAD-A PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A
min 3840.9 5008 48962.7 812.43 992 48231.07 -0.9284211 -0.2008877 -0.4126023 -0.2468764 -1.286891 -1.011764 -1.249842 -0.7027928
max 262143.0 258475 213015.5 262143.00 257528 240257.84 4.4974089 4.4947143 4.1895137 4.4887066 4.344767 4.499980 4.499548 4.4908552
Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A Time
min -0.1020277 -1.193219 -0.748190 54.7
max 4.4999361 4.408533 4.378348 6312.3
> range(fr)
FSC-A FSC-H FSC-W SSC-A SSC-H SSC-W Alexa Fluor 488-A PE-A PE-Texas Red-A 7-AAD-A PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A
min 0 0 0 0 0 0 0.3440274 0.5447587 0.7766721 0.7676063 0.7410693 0.2184961 -0.1468131 0.3265916
max 262143 262143 262143 262143 262143 262143 4.5000000 4.5000000 4.5000000 4.5000000 4.5000000 4.5000000 4.5000000 4.5000000
Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A Time
min 0.3060549 0.5694999 0.5042382 0
max 4.5000000 4.5000000 4.5000000 262143
> parameters(fr)$maxRange
[1] 262143.0 262143.0 262143.0 262143.0 262143.0 262143.0 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 262143.0
>
> cf_back <- flowFrame_to_cytoframe(fr)
> range(cf_back, "data")
FSC-A FSC-H FSC-W SSC-A SSC-H SSC-W Alexa Fluor 488-A PE-A PE-Texas Red-A 7-AAD-A PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A
min 3840.9 5008 48962.7 812.43 992 48231.07 -0.9284211 -0.2008877 -0.4126023 -0.2468764 -1.286891 -1.011764 -1.249842 -0.7027928
max 262143.0 258475 213015.5 262143.00 257528 240257.84 4.4974089 4.4947143 4.1895137 4.4887066 4.344767 4.499980 4.499548 4.4908552
Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A Time
min -0.1020277 -1.193219 -0.748190 54.7
max 4.4999361 4.408533 4.378348 6312.3
> range(cf_back)
FSC-A FSC-H FSC-W SSC-A SSC-H SSC-W Alexa Fluor 488-A PE-A PE-Texas Red-A 7-AAD-A PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A
min 0 0 0 0 0 0 -9.284211e-01 -2.008877e-01 -4.126023e-01 -2.468764e-01 -1.286891 -1.011764 -1.249842 -7.027928e-01
max 262143 262143 262143 262143 262143 262143 2.621430e+05 2.621430e+05 2.621430e+05 2.621430e+05 262143.000000 262143.000000 262143.000000 2.621430e+05
Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A Time
min -1.020277e-01 -1.193219 -0.74819 0
max 2.621430e+05 262143.000000 262143.00000 262143
> parameters(cf_back)$maxRange
[1] 262143 262143 262143 262143 262143 262143 262143 262143 262143 262143 262143 262143 262143 262143 262143 262143 262143 262143
It also appears to be a problem particular to reading back in via load_cytoframe_from_fcs
. flowFrame_to_cytoframe
involves simply writing the flowFrame
out to an FCS and reading back in to a cytoframe
:
https://github.com/RGLab/flowWorkspace/blob/4a6883af4d9671edee9b10bca4f56113e5598a12/R/cytoframe.R#L726-L730
But if we follow that some logic but read back in to another flowFrame
using read.FCS
, the parameter ranges are more intact:
> tmp <- tempfile()
> write.FCS(fr, tmp)
[1] "/tmp/RtmpNTwMiE/file7a5a3b2b1991"
> fr_back <- read.FCS(tmp)
> range(fr_back, "data")
FSC-A FSC-H FSC-W SSC-A SSC-H SSC-W Alexa Fluor 488-A PE-A PE-Texas Red-A 7-AAD-A PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A
min 3840.9 5008 48962.7 812.43 992 48231.07 -0.9284211 -0.2008877 -0.4126023 -0.2468764 -1.286891 -1.011764 -1.249842
max 262143.0 258475 213015.5 262143.00 257528 240257.84 4.4974089 4.4947143 4.1895137 4.4887066 4.344767 4.499980 4.499548
Qdot 605-A Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A Time
min -0.7027928 -0.1020277 -1.193219 -0.748190 54.7
max 4.4908552 4.4999361 4.408533 4.378348 6312.3
> range(fr_back)
FSC-A FSC-H FSC-W SSC-A SSC-H SSC-W Alexa Fluor 488-A PE-A PE-Texas Red-A 7-AAD-A PE-Cy7-A Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A
min 0 0 0 0 0 0 -0.9284211 -0.2008877 -0.4126023 -0.2468764 -1.286891 -1.011764 -1.249842 -0.7027928
max 262143 262143 262143 262143 262143 262143 4.5000000 4.5000000 4.5000000 4.5000000 4.500000 4.500000 4.500000 4.5000000
Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A Time
min -0.1020277 -1.193219 -0.74819 0
max 4.5000000 4.500000 4.50000 262143
> parameters(fr_back)$maxRange
[1] 262143.0 262143.0 262143.0 262143.0 262143.0 262143.0 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5
[18] 262143.0
Regarding the range discrepancy between flowCore::read.FCS
and flowWorkspace::load_cytoframe_from_fcs
, I believe it is due to a difference in how each utilizes the flowCore_PnRmax
values to set the max of the parameter range.
For flowCore::read.FCS
, it looks like the preference will be for flowCore_PnRMax
and only if it is not present will it then default to PnR
for the channel:
https://github.com/RGLab/flowCore/blob/10e68d2a343b39a5a484bbc2c4eae7fbef0e2c57/R/IO.R#L469-L476
Whereas for flowWorkspace::load_cytoframe_from_fcs-->-->MemCytoFrame::read_fcs
, it will only prefer flowCore_PnRmax
if the "transformation" keyword is "custom": https://github.com/RGLab/cytolib/blob/5afbec8cabdf012b08f19857d440ed139f0c5d82/src/MemCytoFrame.cpp#L928-L938
This is similar to the logic flowCore::read.FCS
uses when reading the data:
https://github.com/RGLab/flowCore/blob/10e68d2a343b39a5a484bbc2c4eae7fbef0e2c57/R/IO.R#L1028-L1034
@mikejiang , if you have time, a second set of eyes here could be good to determine what the appropriate behavior should be to synchronize both parsers to. It also underscores the need to switch over to having read.FCS
just wrap the cytolib
parser so we can't run in to parsing logic discrepancies.
First: It would be good to support R-level construction of
cytoframe
objects just taking a matrix as is possible forflowFrame
. For example:Works
Not yet supported:
The current workaround is to construct the
flowFrame
and then convert it tocytoframe
usingflowFrame_to_cytoframe
, but that is a little inefficient as it goes through an I/O intermediate. Even if that is the quick solution we go with under the hood, the usage ofcytoframe
shouldn't require the user falling back to aflowFrame
.Second: It would be good to make a
cytoset_to_cytoframe
method along the lines of theflowCore
method for coercing aflowSet
to a singleflowFrame
by doing some consistency checks before binding event rows into a single matrix. This pooling of events is often performed before running dimension reduction or clustering algorithms. You can see the need for this in other packages where users have made their own functions, likeflowSOM::AggregateFlowFrames
. @DillonHammill also gave a good summary of how it would be helpful. This will require some extra care to do consistency checks properly for the resultingcytoframe
and we should discuss how to handle any potential extra/missing channels. Perhaps those consistency checks should just be handled before this bycytoqc
and ifcytoset_to_cytoframe
detects an inconsistency should just stop and direct users tocytoqc
.The current workaround here is a chain of coercions (
cytoset
->flowSet
->flowFrame
->cytoframe
), which is inefficient and a lot to ask of users to hack together each time.