RGLab / flowWorkspace

GNU Affero General Public License v3.0

`flin` transformation is missing from `getTransformation` #198

Closed mikejiang closed 8 years ago

mikejiang commented 8 years ago

@JauntyJJS, I moved your question here as a separate issue

On 02/25/2016 09:19 PM, JauntyJJS wrote:

I was referring to the columns of the raw data as read by flowCore. The columns are usually channels such as FSC-H, SSC-H, and so on. There are also columns called "Width" and "Time". If I am not mistaken, Time usually goes through a linear transformation in FlowJo. "Time" is in data(GvHD) as well.

When I call getTransformation on a GatingSet parsed with flowWorkspace, it returns the transformation formulas for channels such as FSC-H and SSC-H, but not for "Time".

Looking at the transformed summary data,

GT@data@frames[[fcs_file_name]]@parameters@data

I saw that Time has indeed been transformed: it has a range of 455, a minValue of 154, and a maxValue of 609. The raw Time data should run from a minValue of 0 to a high number (like 1023).

However, getTransformation does not show this Time transformation at all. Hence I suspect that getTransformation only gives formulas for columns that correspond to channels and may not give anything for arbitrary column names in the FCS file. I could be wrong about that.

I also realized that transform(GT, transList) does not update the range of the flowFrame data in GT@data@frames[[fcs_file_name]]@parameters@data. The values are still the same as in the raw FCS data, though the minValue and maxValue have changed due to the transformation.

mikejiang commented 8 years ago

@JauntyJJS, can you share your example xml (and FCS) for troubleshooting? We have never had to deal with linear transformations from FlowJo before, so it would be good to have an example to work with.

JauntyJJS commented 8 years ago

A01 NC.zip

Attached is an example.

Open the workspace with

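library(flowWorkspace)
# Location of the workspace (forward slashes, since "\U" is not a valid escape in an R string)
wsfile <- "C:/Users/selvaje1/Desktop/FlowCytoStuff/A01_Workspace.wsp"
ws <- openWorkspace(wsfile)
# path points to the folder that holds the FCS files
GT <- parseWorkspace(ws, name = 1, path = "C:/Users/selvaje1/Desktop/FlowCytoStuff/GS99003218", isNcdf = TRUE, ignore.text.offset = TRUE, emptyValue = FALSE)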

mikejiang commented 8 years ago

The Time and Width channels are not used in any of the gates in your workspace. It would help if you could draw gates on these channels so that I can verify that the linear transformation works as it should.

JauntyJJS commented 8 years ago

A01_Workspace.zip

Added gates for Time and Width.

mikejiang commented 8 years ago

Apparently those linearly transformed gates are not parsed correctly.

> gs <- parseWorkspace(ws, name = 1, subset = 1, execute = T, emptyValue = F, ignore.text.offset = T)
Parsing 1 samples
version X
...
Warning messages:
1: In checkOffset(offsets, txt, ...) :
  The HEADER and the TEXT segment define different starting point (2958:2534) to read the data. The values in TEXT are ignored!
2: In checkOffset(offsets, txt, ...) :
  The HEADER and the TEXT segment define different ending point (5602958:5602534) to read the data. The values in TEXT are ignored!
> getPopStats(gs[[1]])
    flowCore.freq flowJo.freq flowJo.count flowCore.count               node
 1:    1.00000000  1.00000000       100000         100000               root
 2:    0.00000000  0.52588000        52588              0              Gate1
 3:    0.00000000  0.27857000        27857              0              Gate2
 4:    0.73084000  0.73016000        73016          73084        Lymphocytes
 5:    0.96227628  0.96183028        70229          70327        SingleCells
 6:    0.06508169  0.06477381         4549           4577              CD14+
 7:    0.96766441  0.96746538         4401           4429                P8+
 8:    0.04809212  0.04794365          211            213            P8-High
 9:    0.95190788  0.95205635         4190           4216             P8-Low
10:    0.00000000  0.72517000        72517              0    TestLogvsLinear
11:    0.00000000  0.66376160        48134              0 TestLinearvsLinear
12:    0.00000000  0.23480284        11302              0        Population1
13:    0.00000000  0.66397972        31960              0        Population2
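The mismatches are easiest to spot by comparing the two count columns directly. Below is a minimal base-R sketch that uses a few rows copied from the table above; the data frame `stats`, the `rel.diff` column, and the 5% threshold are illustrative assumptions, not part of flowWorkspace.

```r
# A few rows copied from the getPopStats output above (illustrative subset).
stats <- data.frame(
  node           = c("Gate1", "Lymphocytes", "SingleCells", "TestLogvsLinear"),
  flowJo.count   = c(52588, 73016, 70229, 72517),
  flowCore.count = c(0, 73084, 70327, 0),
  stringsAsFactors = FALSE
)

# Relative difference between the recomputed counts and the FlowJo counts.
stats$rel.diff <- abs(stats$flowCore.count - stats$flowJo.count) / stats$flowJo.count

# Flag nodes that differ by more than an arbitrary 5% threshold.
flagged <- stats$node[stats$rel.diff > 0.05]
flagged
```

In this subset only the gates drawn on the linearly displayed channels are flagged; the log-scale gates agree with FlowJo to well under 1%.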

Here is the standard flin that I am using, based on Gating-ML 2.0.

> trans.Time
function (x) 
{
    (x + A)/(T + A)
}
<environment: 0x813c6c0>
attr(,"type")
[1] "flin"

Before I ask FlowJo about their internal implementation of flin, I'd like to point out some other issues, which may or may not contribute to the incorrect counts:

mikejiang commented 8 years ago

@JauntyJJS, can you also explain why you chose a linear transform for the Time and Width channels, and why you preferred log to the more commonly used logicle transform? The reason I ask is that some use cases may exist only in theory, and we may never need to tackle them if no one uses them in practice.

JauntyJJS commented 8 years ago

Sure, I will provide another FCS example that should not produce this warning.

I have looked into why this warning occurred. It comes from how the samples are exported from the BD Accuri C6 Flow Cytometer. When I export using the option "Export ALL Samples as FCS", the FCS file does not trigger this warning. However, when I export using the option "Export ALL Samples to Third Party", the FCS file gives the warning mentioned above.

More info is in this manual: https://www.bdbiosciences.com/documents/BD_Accuri_C6_Software_User_Guide.pdf

I can post the FlowJo plots as well to verify the result with the new examples. Give me some time.

We actually do not use "Width" in our analysis. It is just produced by the BD Accuri C6 Flow Cytometer by default, and it may not even be present on other flow cytometers. We did not use it at all, and I guess it is all right to leave it untransformed.

As for Time, we usually use it for QC purposes: to check when the cells run out, if that occurs, and to filter accordingly by gating. We also use it for kinetic-analysis experiments. Hence, we sometimes gate the Time data in linear form (x-axis) against another channel in log form (y-axis), because we only want samples within a certain time range.

Below are links to some public experiments that use the Time channel extensively:
http://bitesizebio.com/webinar/20605/time-the-forgotten-parameter-in-flow-cytometry/
http://jornades.uab.cat/workshopmrama/sites/jornades.uab.cat.workshopmrama/files/Assessing_water_quality_with_the_BD_Accuri_C6_flow_cytometer.pdf

As for why log over the logicle (biexponential) transformation, I will need to ask the biologist. I have read up on FlowJo a bit and kind of understand why logicle is favoured over other transforms. I also understand that FlowJo uses the logicle transform by default. I am working with a biologist who is new to flow cytometry (so am I, actually), so she may not be aware of this, and we may change accordingly if necessary.

Give us some time

JauntyJJS commented 8 years ago

A01.zip

Attached is a new workspace and FCS file which should not produce the warnings. The gating pictures from FlowJo are included in the zip file as well for verification.

mikejiang commented 8 years ago

OK. The FCS warnings are gone, but I still couldn't visualize the gates. Here is the lymph gate plot from your image:

And here is what ggcyto sees:

library(ggcyto)
autoplot(gs[[1]], "Lymphocytes") 

image

Since the counts match, I suspect the plot is skewed by some extreme outliers.

mikejiang commented 8 years ago

The 1-D plots indicate that these outlier events at the low end of SSC-A could be the reason the lymph gate is not displayed properly:

fs <- getData(gs)
autoplot(fs, "FSC-A")
autoplot(fs, "SSC-A")

image

By overlaying these events on the Time vs FSC plot,

ssc.low.gate  <- rectangleGate(`SSC-A` = c(-Inf, 150))
ggcyto(fs, aes(y= `FSC-A`, x = Time)) +geom_hex(alpha = 0.4) + geom_point(data = Subset(fs, ssc.low.gate))

image

We can clearly see that they spread across the positive side of FSC-A, which means a 1-D gate on FSC-A (shown below) won't be sufficient:

nondebris.g <- rectangleGate(`FSC-A` = c(31622.8, Inf))
autoplot(fs, "FSC-A") + geom_gate(nondebris.g)

image

The only way to eliminate these outliers is to create a marginal-events filter directly on SSC-A prior to the lymph gate:

margin.filter <- boundaryFilter("SSC-A", side = "lower")
ggcyto(gs, aes(x= `FSC-A`, y = `SSC-A`), filter = margin.filter) +geom_hex() + geom_gate("Lymphocytes")

image

I guess this is also what FlowJo does behind the scenes when you plot the first gate.
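The idea behind a lower-boundary filter can be illustrated without flowCore. The base-R sketch below is a simplification under stated assumptions: the toy matrix `events` stands in for a flowFrame's expression data, the helper `keep_above_floor` is hypothetical, and "marginal" is taken to mean "sitting exactly at the smallest observed value", which may not match how `boundaryFilter` decides.

```r
# Toy event matrix standing in for a flowFrame's expression data.
events <- cbind(`FSC-A` = c(40000, 52000, 61000, 45000, 70000),
                `SSC-A` = c(0, 1200, 0, 3400, 2100))

# Simplified lower-boundary filter (hypothetical helper): keep only events
# strictly above the smallest observed value of the chosen channel.
keep_above_floor <- function(m, channel) {
  floor_value <- min(m[, channel])
  m[m[, channel] > floor_value, , drop = FALSE]
}

filtered <- keep_above_floor(events, "SSC-A")
nrow(filtered)
```

Note that the two dropped events have perfectly ordinary FSC-A values, which is why a gate on FSC-A alone cannot remove them.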

mikejiang commented 8 years ago

@JauntyJJS, see #199 for the latest fix for the Time channel. However, it is a simple time unit conversion, so you won't be able to use getTransformation to extract a transformation function for it.
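As a rough illustration of what a time unit conversion looks like, here is a base-R sketch that assumes the conversion is a multiplication of the raw Time values by the `$TIMESTEP` keyword (seconds per tick in the FCS standard). The values of `raw_time` and `timestep` are made up, and the actual change in #199 may differ.

```r
# Stand-in values; in practice both would come from the FCS file.
raw_time <- c(0, 250, 500, 1023)  # raw Time channel values (ticks)
timestep <- 0.1                   # hypothetical $TIMESTEP, seconds per tick

# Assumed conversion: ticks times seconds-per-tick gives seconds.
time_sec <- raw_time * timestep
time_sec
```

Because this is a plain rescaling by a per-file constant, there is no per-channel transformation object to hand back.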

JauntyJJS commented 8 years ago

Sure, I understand. I will use the first approach for now, since TIMESTEP is always available in my FCS data.

Thank you for the help.

Can you close this case?