RGLab / openCyto

A package that provides data analysis pipeline for flow cytometry.
GNU Affero General Public License v3.0
77 stars 29 forks source link

reduce the default number of vertexes of polygonGate generated by flowClust #79

Closed mikejiang closed 10 years ago

mikejiang commented 10 years ago

currently npoints(the number of points on the ellipse) used by .getEllipse is 501 by default, which slows down the gate computing quite a bit, especially for upstream gates like lymph Here gs only has one sample that contains 119531 events,

> microbenchmark(suppressMessages(recompute(gs)))
Unit: milliseconds
                             expr      min       lq   median       uq      max
 suppressMessages(recompute(gs1)) 205.1622 208.2985 209.8095 211.0655 533.7127

> getTotal(gs[[1]], "lymph")
[1] 62582

If we change npoints to 101, computing time is reduced from 200ms to 80ms, and we just have 1 event off

> microbenchmark(suppressMessages(recompute(gs)))
Unit: milliseconds
                             expr      min       lq   median       uq      max
 suppressMessages(recompute(gs1)) 76.91869 83.35187 84.46224 85.73913 430.7155
> getTotal(gs[[1]], "lymph")
[1] 62581

We may further speed it up by returning a flowCore::ellipsoidGate (defined by GatingML standard) when trans = 0.

thoughts, @gfinak ?

kevinushey commented 10 years ago

I think this is reasonable, depending on how much we want to trade off correctness for speed. The relative error in approximating a circle with an inscribed polygon will already be very close to zero, even for a number of points ~ 50.

For reference, we could compute the error in approximating a perfect circle with a polygon having n vertices with e.g. (inspired by http://www.voltage.com/blog/math-2/approximating-a-circle-with-a-polygon/)

err <- function(n) {
  1 - (n / (2*pi)) * sin(2*pi / n)
}

and checking what the error looks like,

n <- 1:100
plot(err(n) ~ n, type="l")

err

Of course, this error may in fact be less if a smarter approximation is used.

gfinak commented 10 years ago

The real solution here would be to divide the ellipse up into arcs of even length rather than stepping through a fixed angle of rotation. Then we could get away with 50 to 100 points. Problem is when the scale of the major and minor axes differs drastically (e.g. when fitting something to a scatter and fluorescence dimension), all the point are concentrated along the sides of the ellipse with the larger scale. Unfortunately finding the arc length of an ellipse involves numerically solving an elliptic integral. We could probably get away with doing that only when the scales of the axes vary by more than some factor (not often). On Apr 1, 2014 4:16 PM, "Kevin Ushey" notifications@github.com wrote:

I think this is reasonable, depending on how much we want to trade off correctness for speed. The error in approximating a circle will already be very close to zero, even for a number of points ~ 50.

For reference, we could compute the error in approximating a perfect circle with a polygon with e.g. (inspired by http://www.voltage.com/blog/math-2/approximating-a-circle-with-a-polygon/)

err <- function(n) { 1 - (n / (2_pi)) * sin(2_pi / n)}

and checking what the error looks like,

n <- 1:100 plot(err(n) ~ n, type="l")

[image: err]https://cloud.githubusercontent.com/assets/1976582/2585796/9c455a5e-b9f3-11e3-9e04-4040bcf5a92c.png

Reply to this email directly or view it on GitHubhttps://github.com/RGLab/openCyto/issues/79#issuecomment-39272318 .

mikejiang commented 10 years ago

RGLab/flowClust@5830c117718a459759e8765106a85dbb71ebf946 pushed by @gfinak passed the tests on 4 different gating template. And by using 50 points suggested by @kevinushey, we are able to reduce the recompute time by 70% with 0.2% relative error.

mikejiang commented 10 years ago

It will be more efficient to construct the ellipsoidGate from tmixFilterResult when trans == 0