RGLab / cytolib

c++ library for representing and interacting the gated cytometry data structure
GNU Affero General Public License v3.0
12 stars 11 forks source link

add c++ class for flowCore::logtGml2 #14

Closed mikejiang closed 4 years ago

mikejiang commented 5 years ago

Due to edeb097ad44c44019f681f1738930f09bfc9a45a, flowCore::logtGml2 is no longer convertible to the existing logTrans class (which is flowjo implementation).

We need to add a separate logtGml2 class to be able to store this type of transformation that was applied in R. (e.g. store the gs parsed from cytobank where logtGml2 is still pretty commonly used.

mikejiang commented 5 years ago

Here is a little more investigation on whether logtGml2 and flowjo_log are convertible to each other:

#logtGml2
x[x<0] <- min(x[x>0]) #impute negative values
log10(x/T)/M + 1

#flowjo_log
x[x<offset] <- offset #impute negative values
log10(x/offset)*scale/decade

impute negative values (first line) is not defined in GML2 standard (i.e. flog or logtGml2), right now we use min(x[x>0]), which is our arbitrary guess, but it could be really vendor specific, and it may or may not make big difference in the final gating results.

So if we ignore the difference in that line, the major log transform equations (second line) are quite comparable, when scale = 1, the only difference is extra +1 step in logtGml2, again this may or may not cause the issue depends on the transformed data scale.

So it is hard to conclude, to play safe, keep them separate (which means some extra work at c++ and pb level for adding new class), at least the current implementation of logtGml2 has been working for cytobank and diva.

jacobpwagner commented 5 years ago

Thanks for looking in to this Mike. I think that is sort of where my thinking was on it last week too, but I didn't want to jump to updating the protocol buffers if it wasn't necessary. We should also collectively decide how to properly handle the negative values, as GML2 just leaves them undefined.

mikejiang commented 5 years ago

I've tested and flowjo_log works for diva but fails on cytobank unfortunately. which leaves us no choice but adding the new c++ class.

mikejiang commented 4 years ago

addressed by 4b71c4e0d44440d7096db57bf9c56eb3bbe8acda