RGLab / flowStats

flowStats: algorithms for flow cytometry data analysis using BioConductor tools
15 stars 10 forks source link

denisty1d bug? #21

Closed bc2zb closed 7 years ago

bc2zb commented 7 years ago

I've been trying to get spillover to work with the pregate = TRUE and I kept getting

Error in range(exprs(x), na.rm = TRUE)[, stain] : 
  incorrect number of dimensions 

I believe the issue is that with spillover(..., pregate = T) is that when it calls rangeGate(..., absolute = F), that calls density1d(..., absolute = F). I looked as the code for density1d and saw

if (absolute) {
        vrange <- range(x, na.rm = TRUE)[, stain] ### Called if absolute = T
        vrange[is.nan(vrange)] <- 0
        vrange[1] <- min(vrange[1], min(exprs(x)[, stain], na.rm = TRUE))
    }
    else vrange <- range(exprs(x), na.rm = TRUE)[, stain] ### Called if absolute = F

Shouldn't the absolute = F call be the same as the absolute = T call? At least, as near as I can figure, that is where the code errors out. range(exprs(x)) returns the range of all channels in x, whereas range(x) returns the range of each channel of x, and then we extract the specific channel with the subset command. If this isn't the issue, I'll go back to the drawing board.

gfinak commented 7 years ago

The doc for density1d states:

absolute: Logical controling whether to classify a population (positive
          or negative) relative to the theoretical measurment range of
          the instrument or the actual range of the data. This can be
          set to ‘TRUE’ if the alignment of the measurment range is not
          optimal and the bulk of the data is on one end of the
          theoretical range.

So range(x,na.rm=TRUE)[,stain] gives the theoretical measurement range. range(exprs(x),na.rm=TRUE)[,stain] gives the actual data range for the channel.

My guess, given your error, is that range(exprs(x), na.rm = TRUE) doesn't return a matrix so the indexing on [,stain] is where the error is thrown. Have you tried setting debug() on density1d and stepping through to see what the data look like?

bc2zb commented 7 years ago

Hey Greg, thanks for replying so promptly.

So that's my thought exactly. The indexing on [,stain] is indeed failing. When I stepped through the code, passing exprs(x) into range() returns a min and max for the entire matrix, whereas passing a flowFrame into range() returns a min and a max for each channel individually. Maybe I have some weird thing going on with my R session, but range() is in base and is a flowFrame method. I guess I'm not understanding why density1d calls range on a the exprs matrix rather than use the range method with the type set to "data".

range
Get instrument or actual data range of the flowFame. Note that instrument dynamic range is not necessarily the same as the range of the actual data values, but the theoretical range of values the measurement instrument was able to capture. The values of the dynamic range will be transformed when using the transformation methods forflowFrames.

parameters:

x: flowFrame object.

type: Range type. either "instrument" or "data". Default is "instrument"

Usage:

range(x, type = "data")
gfinak commented 7 years ago

Because range called on a flowFrame returns the instrument range for the channel, which can be different from the range of the data collected. Even if a channel has all 0 intensities, the instrument range will be fixed. So the code is correct in that it calls range on different objects depending on the parameter. The range method is overridden in flowCore for flowFrames. The question is what is returned by exprs(x, na.omit=TRUE) for your failing sample? It should be a matrix with named columns. But it seems it doesn't have more than one dimension based on the error. You'll need to check that.

bc2zb commented 7 years ago

flowSet.Rdata.zip

I've attached subSet, the flowSet I'm trying to run through the spillover function. Stepping through density1d, here's what I get:

x <- as(subSet, "flowFrame")
x

flowFrame object 'anonymous'
with 90494 cells and 13 observables:
           name           desc        range     minRange     maxRange
FSC-A     FSC-A          FSC-A 4.788725e+06 4352.5000000 4.788725e+06
SSC-A     SSC-A          SSC-A 8.388607e+06 3199.3000488 8.388607e+06
FL1-A     FL1-A          FL1-A 8.150410e+00   -5.5416135 8.150410e+00
FL2-A     FL2-A          FL2-A 9.993291e+00   -1.4054258 9.993291e+00
FL3-A     FL3-A          FL3-A 7.899356e+00   -0.8804305 7.899356e+00
FL4-A     FL4-A          FL4-A 8.653146e+00   -1.2340494 8.653146e+00
FL5-A     FL5-A          FL5-A 7.876921e+00   -1.5949830 7.876921e+00
FL6-A     FL6-A          FL6-A 9.220082e+00    0.2202158 9.220082e+00
FL7-A     FL7-A          FL7-A 1.026232e+01    0.5515996 1.026232e+01
FL8-A     FL8-A          FL8-A 9.728629e+00   -0.6028207 9.728629e+00
FL10-A   FL10-A         FL10-A 6.769483e+00   -1.4782727 6.769483e+00
FL13-A   FL13-A         FL13-A 6.541717e+00   -1.7787063 6.541717e+00
Original   <NA> Original Frame           NA    1.0000000 1.100000e+01
2 keywords are stored in the 'description' slot

range(x, na.rm = TRUE)
        FSC-A     SSC-A     FL1-A     FL2-A      FL3-A     FL4-A     FL5-A     FL6-A      FL7-A
min    4352.5    3199.3 -5.541614 -1.405426 -0.8804305 -1.234049 -1.594983 0.2202158  0.5515996
max 4788725.0 8388607.0  8.150410  9.993291  7.8993564  8.653146  7.876921 9.2200819 10.2623175
         FL8-A    FL10-A    FL13-A NA
min -0.6028207 -1.478273 -1.778706  1
max  9.7286293  6.769483  6.541717 11

range(x, na.rm = TRUE)[,"FL1-A"]
[1] -5.541614  8.150410

head(exprs(x))
       FSC-A    SSC-A    FL1-A    FL2-A      FL3-A     FL4-A      FL5-A    FL6-A    FL7-A    FL8-A
[1,] 24509.9 159910.5 6.074824 3.606561 1.03335461 1.2542714  0.4895442 2.482089 3.701122 1.793704
[2,] 24664.4 154406.6 6.064782 3.662031 0.87143933 0.5608052  0.1283143 2.527317 3.688904 1.652172
[3,] 23253.2 138234.3 6.035953 3.609594 1.38989645 0.7823721 -0.0299955 2.357702 3.678322 1.908307
[4,] 24066.0 139128.9 6.099286 3.711043 1.37795003 0.7778268 -0.3350305 2.506145 3.743391 1.920428
[5,]  5351.6  13310.1 1.999771 3.850572 0.06262572 1.1457075 -0.1862546 2.265346 3.518219 3.195550
[6,] 32744.7 216342.2 6.482014 4.075858 1.18548251 1.4325494 -0.1302977 3.297304 4.349601 2.693611
         FL10-A     FL13-A Original
[1,] -0.1170658 -0.2058759        1
[2,] -0.3570329 -0.6262542        1
[3,]  0.7253635 -0.4907320        1
[4,]  0.6011354 -0.1071283        1
[5,]  0.8623911  0.1869098        1
[6,]  0.1764168  0.2117474        1

range(exprs(x), na.rm = T)
[1] -7.237435e+00  8.388607e+06

range(exprs(x), na.rm = TRUE)[,"FL1-A"]
Error in range(exprs(x), na.rm = TRUE)[, "FL1-A"] : 
  incorrect number of dimensions

So a matrix of named columns is returned by exprs(x, na.rm = T), but range() called on a matrix as returned by exprs() will not invoke the range() method for a flowSet because exprs(x) is not a flowSet.

gfinak commented 7 years ago

I've pushed a fix. The column selection should happen after the exprs() call but before the range() call. Range returns a vector, not a matrix, so the stain selection was failing.

bc2zb commented 7 years ago

Appreciate it!

gfinak commented 7 years ago

I'll get these synced with BioC, but for now just pull from the github trunk branch