RGLab / flowStats

flowStats: algorithms for flow cytometry data analysis using BioConductor tools
15 stars 10 forks source link

gaussNorm fails on subset of a larger flowset #20

Closed Andrew-InTheBox closed 7 years ago

Andrew-InTheBox commented 7 years ago

I'm able to run gaussNorm on a flowset of 54 files specifying only the flowset and channel.names args, but if I run it on elements 1:5 of the same set to test conditions, it fails with this error, even with using the minimal set of args:

Error in extract.base.landmarks(lms$filter, channel.names, max.lms) :

Not sure whether this is a bug or not, but it's difficult to troubleshoot and hard to imagine how it will fail on a subset of files where it runs successfully on the full set. All files have 10,000 cell events.

I can provide additional details or test files if needed, these files are from an image cytometer (Imagestream), but they seem to follow fcs file format specifications and I've typically not had trouble in other packages, for example using flowCore or flowClust to operate on them.

mikejiang commented 7 years ago

Could you provide the link to the data and reproducible code?

Andrew-InTheBox commented 7 years ago

FCS files and code to reproduce error can be found here:

https://drive.google.com/open?id=0B1ppj6LgpY6sUTNvSDlHTXdZQTQ

Thanks!

mikejiang commented 7 years ago

Here I reproduced your error

> fs_scaled1 <- gaussNorm(smallset, channel.names=mycolnames) 
No curve with  2  landmarks was found for channel  Intensity_AdaptiveErode.M01..BF..80._DHR . Decrease max.lms[ Intensity_AdaptiveErode.M01..BF..80._DHR ] and try again.
Error in extract.base.landmarks(lms$filter, channel.names, max.lms) : 

As it signals, something is wrong with the channel Intensity_AdaptiveErode.M01..BF..80._DHR . And it is usually helpful to plot that channel with the stacked density plot (using ggcyto feature mentioned here)

test.chnl <- "Intensity_AdaptiveErode.M01..BF..80._DHR"
ggcyto(smallset, aes_string(x = test.chnl)) + geom_joy(aes(y = name)) + facet_null()

image This shows me that your data has not been transformed and I think normalization provided by flowStats typically does not work with raw scaled signal (@gfinak will correct me if I am wrong on this). So I would transform the data first, here I only test on one channel, and apparently you can transform multiple channels all at once

trans <- estimateLogicle(smallset[[1]], channels = test.chnl) 
smallset.trans <- transform(smallset, trans)
ggcyto(smallset.trans, aes_string(x = test.chnl)) + geom_joy(aes(y = name)) + facet_null()

image now you can see the transformed data looks easier for gaussNorm to work with, and it did succeed this time (again I am only demonstrate it on one channel)

> fs_scaled1 <- gaussNorm(smallset.trans, channel.names=test.chnl)
Adjusting the distance between landmarks
....

Now the data is normalized (it doesn't change too much but you can still see the difference after the peak alignment especially the second sample from the top)

ggcyto(fs_scaled1[["flowset"]], aes_string(x = test.chnl)) + geom_joy(aes(y = name)) + facet_null()

image

Andrew-InTheBox commented 7 years ago

Thanks Mike. You're correct, I've not been using a transformation with these files but it seems that's the problem. Thanks again for having a look at this.

Edit - I can't get the geom_joy() approach to stacking histograms to work. How/where is y = name defined for the argument to geom_joy's aes call? I don't quite follow this line in the example you linked, but it seems key and I need to subset my flowset in a manner that defines names for each individual frame within the set?

fs <- GvHD[subset(pData(GvHD), Patient %in%5:7 & Visit %in% c(5:6))[["name"]]]

Edit - I figured out a workaround. It seems geom_joy is expecting a numeric for the y argument, factoring 'name' allows it to plot and I no longer get the error Error in max(data$y) - min(data$y) : non-numeric argument to binary operator.

'p <- ggcyto(smallset, aes(x = 'Intensity_AdaptiveErode.M01..BF..80._Draq5')) p + geom_joy(aes(y = as.factor(name))) + facet_null()'