RGLab / flowCore

Core flow cytometry infrastructure

transformation and some suggestions #104

Closed joe-jhou2 closed 6 years ago

joe-jhou2 commented 6 years ago

I'm not sure whether this is a legitimate feature request, but from my side, I suspect some gating failures are due to incorrect or improper transformation. So far, the bi-transformation is based on one arbitrarily chosen sample. For some data sets, especially longitudinal studies, the first and last samples may differ, e.g. in machine voltage or compensation. Even if we try our best to stay consistent, unexpected changes still crop up sometimes. So my question is: is it appropriate to estimate the transformation from one sample and apply it to all samples? @mikejiang @gfinak

openCyto is an awesome package, but I have to say the tutorials need to improve if you want more people without any knowledge of R or coding to use this tool in their work. I suggest assuming that users have zero background and teaching them step by step from the very beginning, so that everybody can establish their own pipeline without frequently asking basic questions or requesting assistance from you. One more thing: the tutorials and workshop info are scattered everywhere. Would you be able to consolidate all those useful tutorials in ONE website? @raphg

gfinak commented 6 years ago

@mimisikai Using a single transformation is appropriate if users hope to compare data across samples. OpenCyto has always been put out there with the disclaimer that users need well-standardized data. In the absence of that, it's the user's job to deal with it. A single gating set is deliberately designed to be homogeneous. There are ways around your issue. The data can be batched and processed (i.e. transformed differently), but then you can't compare the MFI of cell populations across batches. There are some channel normalization tools out there (warpSet) that we and others have published about. Alternatively, using different templates (with slightly different parameters) for different batches of data is another way to handle this problem while maintaining the same transformation parameters. That said, openCyto does per-sample data-driven gating, so it should be pretty robust to drifts in the MFI over time (unlike using a single template gate for all samples).
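To make the comparability point concrete, here is a minimal arithmetic sketch in Python. It uses an arcsinh transform with a hypothetical `cofactor` parameter as a simple stand-in for the real transformation, and the batch values are made up; it is not openCyto or flowCore code.

```python
import math

def arcsinh_transform(value, cofactor):
    """Stand-in for a logicle-style transform (assumption, for illustration only)."""
    return math.asinh(value / cofactor)

raw_mfi = 1000.0  # the same raw intensity, measured in two batches

# Hypothetical per-batch transformation parameters
batch_a = arcsinh_transform(raw_mfi, cofactor=150.0)
batch_b = arcsinh_transform(raw_mfi, cofactor=500.0)

print(f"batch A: {batch_a:.3f}, batch B: {batch_b:.3f}")
```

The same raw intensity lands at a different position on the transformed scale in each batch, so MFIs computed on the transformed scale can no longer be compared directly.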

I agree with you that we need to put effort into developing updated workflows and documentation. The core tools have been in flux recently, so we haven't put enough effort there. But the tools are not targeted at people who have no R or programming experience. They are a programmatic API for building computational cytometry workflows, targeted primarily at bioinformaticians. We do have plans for simplified tools (see for example https://github.com/RGLab/opencytoCL, a command line interface that's focused on doing atomic manipulations of cytometry data sets).

The primary source of documentation is the package vignettes for the different openCyto components.

Greg

mikejiang commented 6 years ago

I'm still debating with myself about the idea of sample-specific transformation parameters. Technically it is fairly straightforward: we can extend the existing S3 method estimateLogicle.GatingHierarchy, introduced by #206, with estimateLogicle.GatingSet, which iterates through the samples to do sample-by-sample logicle parameter estimation and then returns a named list of transformerList. That list can be applied by transform(gs, listOfTrans) and stored at the R level (transformation slot). This would give the infrastructure the flexibility to optionally adjust the transformation at the sample level. However, as @gfinak said, data won't be fully comparable across samples (especially MFI). Imagine such a GatingSet is archived and delivered to another user without prior warning about the different scales within the data set: confusion is almost certain to occur down the road.
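The sample-by-sample idea can be sketched in plain Python, independent of the actual R implementation. Everything here is an assumption for illustration: `estimate_cofactor` is a hypothetical estimator (median absolute intensity), an arcsinh transform stands in for logicle, and the sample names and event values are made up.

```python
import math
import statistics

def estimate_cofactor(events):
    """Hypothetical per-sample estimator: median absolute intensity, floored at 1."""
    return max(1.0, statistics.median([abs(x) for x in events]))

def estimate_per_sample(samples):
    """Iterate through the samples and return {sample_name: parameter}."""
    return {name: estimate_cofactor(events) for name, events in samples.items()}

def transform_samples(samples, params):
    """Apply each sample's own parameter, looked up by sample name."""
    return {
        name: [math.asinh(x / params[name]) for x in events]
        for name, events in samples.items()
    }

# Toy data: day_90 is day_0 with every intensity multiplied by 4
samples = {
    "day_0": [10.0, 120.0, 300.0, 2500.0],
    "day_90": [40.0, 480.0, 1200.0, 10000.0],
}

params = estimate_per_sample(samples)
transformed = transform_samples(samples, params)

for name in samples:
    print(name, params[name], [round(v, 3) for v in transformed[name]])
```

In this toy data the per-sample parameters absorb the fourfold shift, so the two samples look the same after transformation even though their raw intensities differ. That is the comparability caveat in miniature: each sample ends up on its own scale.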

gfinak commented 6 years ago

One possibility is to update the print and summary methods for a GatingSet, to output something like:

When the same transformation parameters are used: "MFIs are comparable across samples."

When different transformation parameters are used: "Uses a heterogeneous transformation. MFIs are not comparable across samples."
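A rough sketch of the check behind such a message, written in Python rather than as the actual GatingSet print method. The function name `comparability_message` and the parameter dictionaries are hypothetical; the keys `w`, `t`, `m`, `a` are used only as illustrative logicle-style parameter names.

```python
def comparability_message(params_by_sample):
    """Return the summary line for a set of per-sample transformation parameters."""
    # Collapse each sample's parameters into a hashable, order-independent form
    distinct = {tuple(sorted(p.items())) for p in params_by_sample.values()}
    if len(distinct) <= 1:
        return "MFIs are comparable across samples."
    return ("Uses a heterogeneous transformation. "
            "MFIs are not comparable across samples.")

# Illustrative parameter sets (values are made up)
shared = {"w": 0.5, "t": 262144, "m": 4.5, "a": 0}
homogeneous = {"s1": dict(shared), "s2": dict(shared)}
heterogeneous = {"s1": dict(shared), "s2": {**shared, "w": 0.8}}

print(comparability_message(homogeneous))
print(comparability_message(heterogeneous))
```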

mikejiang commented 6 years ago

@mimisikai, I ended up adding support for this feature by allowing a list of transformerList to be passed to the transform(gs, trans) call. Check out the latest flowWorkspace, ncdfFlow and flowCore from GitHub trunk.

gfinak commented 6 years ago

Can this be closed?

mikejiang commented 6 years ago

sure