transformation functions generated by eval method for transformation objects are incompatible with 'transformList'

mikejiang commented 9 years ago

The eval methods (https://github.com/RGLab/flowCore/blob/trunk/R/eval-methods.R) parse the various transformation objects and convert each one to a unary function that can be used for general-purpose transforming. The single argument of this unary function should therefore be a numeric vector, which is what the transform method with a transformList assumes.

However, these eval methods currently generate functions that take a flowFrame as input, which breaks the flowCore::transform method.

For now I am going to make another version of eval so that it can be used within the openCyto framework. In the long run, though, we'd like to keep just one interface. @jspidlen, do you have any idea why a flowFrame must be used as input here?
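
A minimal sketch of the mismatch, in base R only. The list-based `frame` and the `apply_transform_list` helper are hypothetical stand-ins for a flowFrame and for the transform method with a transformList, not flowCore's real implementation. The only point is that a column-wise caller hands each function a numeric vector, so a function that expects the whole frame fails.

```r
# Hypothetical stand-in for a flowFrame: an expression matrix plus keywords.
frame <- list(
  exprs = cbind(FL1 = c(1, 10, 100), FL2 = c(2, 20, 200)),
  keywords = list()
)

# Hypothetical stand-in for transform() with a transformList:
# each function is applied to one column, i.e. to a numeric vector.
apply_transform_list <- function(frame, funs) {
  for (channel in names(funs)) {
    frame$exprs[, channel] <- funs[[channel]](frame$exprs[, channel])
  }
  frame
}

# A unary function on a numeric vector works as expected.
vector_fun <- function(x) log10(x)
ok <- apply_transform_list(frame, list(FL1 = vector_fun))

# A function that expects the whole frame fails when it is given a column.
frame_fun <- function(fr) log10(fr$exprs[, "FL1"])
failed <- inherits(
  try(apply_transform_list(frame, list(FL1 = frame_fun)), silent = TRUE),
  "try-error"
)
```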

jspidlen commented 9 years ago

This was done so that the compensatedParameter works. Gating-ML 2.0 includes the option of specifying "FCS" as the spillover matrix source, which means that parameters are supposed to be compensated as per the compensation description in the FCS file. The whole flowFrame is passed along so that the spillover matrix can be extracted from the FCS keywords in those cases.

Josef

mikejiang commented 9 years ago

What is this compensatedParameter? I don't see it being used anywhere in the entire flow tool set. I propose to replace these two lines

 parameter <- resolve(expr@parameters, df)
 parameter <- flowFrameToMatrix(parameter)

with

 parameter <- df

in the rest of the eval methods, so that the flowCore::transform method works out of the box, and to keep the eval method for compensatedParameter unchanged.

It just doesn't seem worthwhile to me to tailor all the eval methods (for dozens of transformation types) to this single class.

Thoughts, @jspidlen?

jspidlen commented 9 years ago

1) I am not sure how much flowCore uses the compensatedParameter, but it is certainly imported by flowUtils and used with Gating-ML.

2) If I recall correctly, there was some kind of issue with the approach you are suggesting. It's been years, so the details are fuzzy, but I think it was related to nesting the transformations and losing the flowFrame if it is not passed to all the eval methods. Essentially, if you do something like logicle(compensate(log2linscale(df))), then you won't have the flowFrame when you need to extract the spillover from the keywords. Anyway, there is an easy way of testing this: just run testGatingMLCompliance("ComplianceReport_v2.0.html", version=2.0) from flowUtils (requires the gatingMLData package) to make sure you didn't break anything and that it still passes all the tests.
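
The nesting concern can be sketched in base R under stated assumptions. The list-based `frame`, the `SPILL` keyword name, and all three functions are hypothetical illustrations, not flowCore or flowUtils code. The compensation step multiplies by the inverse of the spillover matrix, which it can only find if the step before it passed the whole frame along.

```r
# Hypothetical stand-in for a flowFrame: data plus FCS-style keywords.
spill <- matrix(c(1, 0.1, 0.2, 1), nrow = 2,
                dimnames = list(c("FL1", "FL2"), c("FL1", "FL2")))
frame <- list(
  exprs = cbind(FL1 = c(100, 200), FL2 = c(50, 80)),
  keywords = list(SPILL = spill)
)

# Steps that return the whole frame keep the keywords available downstream.
scale_frame <- function(fr) {
  fr$exprs <- fr$exprs / 2
  fr
}
compensate_frame <- function(fr) {
  # Compensation: multiply by the inverse of the spillover matrix.
  fr$exprs <- fr$exprs %*% solve(fr$keywords$SPILL)
  fr
}
result <- compensate_frame(scale_frame(frame))

# A step that returns only a matrix drops the keywords, so a compensation
# step in the middle of the chain has nothing to read the spillover from.
scale_matrix <- function(fr) fr$exprs / 2
lost <- inherits(
  try(compensate_frame(scale_matrix(frame)), silent = TRUE),
  "try-error"
)
```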

mikejiang commented 9 years ago

The compensate method must take the spillover as its second argument, so I don't fully understand the issue of losing the flowFrame for extracting the spillover from keywords.

mikejiang commented 9 years ago

And I don't think the proposed change will break anything in flowUtils, since eval never gets invoked by it. But I will make sure it passes the flowCore checks and tests. Thanks!

jspidlen commented 9 years ago

The compensate method is not used when reading Gating-ML. You can read in a Gating-ML file that contains gates applicable to some FCS parameters, and those parameters may ask to be compensated based on information in the FCS file. Such Gating-ML can then be applied to different flowFrame objects, and the FCS parameters will get compensated differently if the flowFrames carry different spillover matrices. Applying the Gating-ML triggers an eval chain with compensated parameters potentially sitting somewhere in the middle and needing the spillover from the flowFrame.

Anyway, it may be best if you just try and see where/if the Gating-ML unit tests break. Then you could have a look at those and maybe find a better way to design or fix things? It's been 2 years since I worked on this and I don't use R on a daily basis, so your R skills are certainly superior to mine :-).

mikejiang commented 9 years ago

I did find a scenario (not exactly the one you described, but similar) where eval is indirectly invoked by flowUtils. Here is the call stack:

test.ScaleRange6c -> flowUtils:::performGateTest -> flowCore::filter(fcs, gate) -> flowCore:::resolveTransforms -> flowCore:::resolveTransformReference -> flowCore::eval

where eval expects the transform function to take a flowFrame as input.

So basically a flowCore filter read from Gating-ML can carry transformation information, which causes the data to be implicitly transformed on the fly during the filter(fr, gate) call (the original data is kept and a new column is appended). The gate coordinates are not changed in this process, since they should already be stored on the transformed scale in Gating-ML. Only the gate parameter name is changed, to that temporary column name.
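
The behavior described above can be illustrated with a small base R sketch. The `gate` list, the `apply_gate` function, and the `_tmp_` column prefix are hypothetical; this is not how flowCore's filter method is actually written, only the same idea of appending a transformed column and redirecting the gate to it.

```r
# Hypothetical stand-ins for a flowFrame's data and a Gating-ML range gate.
exprs <- cbind(FL1 = c(10, 100, 1000, 10000))
gate <- list(
  parameter = "FL1",
  transform = log10,      # transformation reference carried by the gate
  min = 1.5, max = 3.5    # coordinates already on the transformed scale
)

apply_gate <- function(exprs, gate) {
  # Append the transformed values as a temporary column; keep the original.
  tmp_name <- paste0("_tmp_", gate$parameter)
  exprs <- cbind(exprs, gate$transform(exprs[, gate$parameter]))
  colnames(exprs)[ncol(exprs)] <- tmp_name
  # Point the gate at the temporary column; its coordinates do not change.
  gate$parameter <- tmp_name
  values <- exprs[, gate$parameter]
  values >= gate$min & values <= gate$max
}

inside <- apply_gate(exprs, gate)
```

If `exprs` had already been log-transformed before the call, the gate would transform it a second time, which is the conflict with a compensate-and-transform-first workflow.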

It is way more complicated than I originally thought (and more than it should be, IMHO), because in our workflow we never have to deal with transforming or compensating the flow data at the gate level. The data is simply transformed and compensated once, prior to any gating.

Anyway, I will have to think it over more before making any change.

jspidlen commented 9 years ago

Yes, that sounds about right... the bigger picture design of this dates back to the code written by Nishant Gopalakrishnan (he used to be at FHCRC before Raphael took all this over from Robert Gentleman). I more or less just adapted this to work with Gating-ML 2.0.

mikejiang commented 9 years ago

I think this whole idea of performing compensation/transformation (https://github.com/RGLab/flowCore/blob/trunk/R/flowFrame-accessors.R#L523-L525) during the gating process does not fit into the common flow data analysis workflow. Whether in the current openCyto framework, the old flowCore::workFlow (now deprecated in favor of GatingSet), or even FlowJo, data has always been compensated and transformed first, before any gates are applied. The current behavior of the filter method would break any of these workflows by transforming already scaled data.

And this makes me wonder whether it is necessary for Gating-ML to store the comp/trans information (even if just as a reference) at each gate level in the first place. (At least I don't see this in FlowJo, which probably has a good reason for that.) It seems to me not only redundant, but it also creates this awkward situation where the gates parsed from Gating-ML carry these transformation references and somehow expect the filter method to act on them.

jspidlen commented 9 years ago

Thanks Mike. Hmm... I think there are two separate things, though: (i) implementation of the gating process and (ii) description of the gating process.

As far as implementation goes, everybody's welcome to compensate and transform FCS parameters ahead of time, before gating (and it totally makes sense in terms of performance).

As far as an unambiguous description of the gating process goes, you definitely need the compensation and transformation details in order to allow for reproducibility (in terms of getting the right set of events when applying a gate to an FCS file). You could argue that having compensation and transformation details (attached as a reference) for each gate is unnecessary and that one could just define the same compensation and transformation details for the whole Gating-ML file. Well, the ISAC DSTF members have been discussing the Gating-ML design over the past 9 years. For this particular aspect, it was concluded that referencing a compensation/transformation from each gate provides more flexibility and has hardly any downsides. Implementors can pre-calculate all those transformations, "add appropriate columns" to the data matrix, and use those when filtering events. I believe FlowJo does something like that internally as well (I just wrote a FlowJo plugin last week, and from the plugin I had access to both compensated and uncompensated parameters, which are normally just hidden from regular end users).
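
One way to read the "add appropriate columns" suggestion, sketched in base R. The `transforms` and `gates` lists and their field names are hypothetical and do not follow the Gating-ML schema or any flowCore class; the sketch only shows each referenced transformation being computed once and shared by every gate that references it.

```r
# Hypothetical stand-ins: named transformations and gates that reference them.
transforms <- list(
  logFL1 = list(parameter = "FL1", fun = log10),
  sqrtFL2 = list(parameter = "FL2", fun = sqrt)
)
gates <- list(
  g1 = list(ref = "logFL1", min = 1.5, max = 3.5),
  g2 = list(ref = "logFL1", min = 2.5, max = 4.5),
  g3 = list(ref = "sqrtFL2", min = 0, max = 2.5)
)
exprs <- cbind(FL1 = c(10, 100, 1000, 10000), FL2 = c(1, 4, 9, 16))

# Compute each referenced transformation once and append it as a column.
for (ref in unique(vapply(gates, function(g) g$ref, character(1)))) {
  tr <- transforms[[ref]]
  exprs <- cbind(exprs, tr$fun(exprs[, tr$parameter]))
  colnames(exprs)[ncol(exprs)] <- ref
}

# Every gate then filters on its precomputed column (one column per gate
# in the resulting logical matrix, one row per event).
membership <- vapply(gates, function(g) {
  values <- exprs[, g$ref]
  values >= g$min & values <= g$max
}, logical(nrow(exprs)))
```

Gates g1 and g2 share the `logFL1` column, so per-gate references cost nothing extra once the columns exist.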

RGLab / flowCore

transformation functions generated by eval method for transformation objects are incompatible with 'transformList' #45