eyurtsev / FlowCytometryTools

A python package for visualization and analysis of high-throughput flow cytometry data
https://eyurtsev.github.io/FlowCytometryTools/
MIT License

IntervalGate bug(?) #15

Closed by alonyan 7 years ago

alonyan commented 7 years ago

I've been using your Flow package to do some batch processing. It's awesome. Thanks!

Anyway, I think there's a small bug in the _identify function for interval gates. At least in my hands, applying an interval gate always returned an error.

fix:

I edited gates.py, line 237, to:

def _identify(self, dataframe):

    idx = (dataframe[self.channels[0]] <= self.vert[1]) & (dataframe[self.channels[0]] >= self.vert[0])

    if self.region == 'out':
        idx = ~idx

    return idx

And then it works fine.

All the best, and thanks again!

eyurtsev commented 7 years ago

This code looks identical to the current code on master, so I'm not sure why you'd be seeing a difference. Would you be able to send a traceback with the error, so I could write a unit test for that?

eyurtsev commented 7 years ago

Added tests for interval gate:

https://github.com/eyurtsev/FlowCytometryTools/commit/d1e2dcbbae807abdebb228e161db90a7a4df7568

All edge cases seem to be handled correctly.
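For readers following along, here is a minimal sketch of the kind of edge cases an interval gate has to handle. It is not the test code from the commit linked above. `identify_interval` is a hypothetical free function that mirrors the `_identify` snippet posted earlier in this thread, and the channel name and values are made up for illustration.

```python
import pandas as pd


def identify_interval(dataframe, channel, vert, region="in"):
    # Same logic as the _identify snippet posted earlier in this thread,
    # rewritten as a standalone function (hypothetical name).
    idx = (dataframe[channel] <= vert[1]) & (dataframe[channel] >= vert[0])
    if region == "out":
        idx = ~idx
    return idx


# Made-up data: values below, on, inside, on, and above the interval [1, 3].
df = pd.DataFrame({"PE-A": [0.0, 1.0, 2.0, 3.0, 4.0]})

inside = identify_interval(df, "PE-A", (1.0, 3.0))
outside = identify_interval(df, "PE-A", (1.0, 3.0), region="out")
no_hits = identify_interval(df, "PE-A", (5.0, 6.0))

# Both bounds are inclusive, so events exactly on an edge are kept.
assert inside.tolist() == [False, True, True, True, False]
# region='out' is the exact complement of region='in'.
assert outside.tolist() == [True, False, False, False, True]
# An interval that contains no events selects nothing.
assert not no_hits.any()
```

The sketch treats both bounds as inclusive because that is what the `<=` and `>=` comparisons in the posted snippet do.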

alonyan commented 7 years ago

Traceback (most recent call last):
  File "", line 2, in
    MFItableCals = Standards.gate(gates[0]).geoMeans()['PE-A']
  File "FlowAnalysis/LoadFlowSamples_v2.py", line 109, in gate
    gated.samples = _gate(gated.samples, gate1)
  File "FlowAnalysis/LoadFlowSamples_v2.py", line 129, in _gate
    Gatedsamples[i] = samples[i].gate(gate1)
  File "", line 2, in gate
  File "/Users/usr/anaconda/envs/FCSenv/lib/python2.7/site-packages/FlowCytometryTools/core/bases.py", line 105, in queueable
    out = fun(*args, **kwargs)
  File "/Users/usr/anaconda/envs/FCSenv/lib/python2.7/site-packages/FlowCytometryTools/core/containers.py", line 394, in gate
    newdata = gate(data)
  File "/Users/usr/anaconda/envs/FCSenv/lib/python2.7/site-packages/FlowCytometryTools/core/gates.py", line 120, in __call__
    idx = self._identify(dataframe)
  File "/Users/usr/anaconda/envs/FCSenv/lib/python2.7/site-packages/FlowCytometryTools/core/gates.py", line 241, in _identify
    idx1 = self.vert[0] <= dataframe[self.channels[0]]
  File "/Users/usr/anaconda/envs/FCSenv/lib/python2.7/site-packages/pandas/core/ops.py", line 741, in wrapper
    if len(self) != len(other):
TypeError: len() of unsized object

It does seem weird that this raises an error while what I wrote, which looks like it does basically the same thing, doesn't. Not sure what's going on here...

alonyan commented 7 years ago

Changing

    idx1 = self.vert[0] <= dataframe[self.channels[0]]

to

    idx1 = dataframe[self.channels[0]] >= self.vert[0]

also (bizarrely) seems to solve my problem. I'm not exactly sure what's going on, though, since the operand order shouldn't matter. Weird.

alonyan commented 7 years ago

And it turns out to be a bug in pandas 0.18: a multiple dispatch problem when comparing a pandas Series against numpy scalars.
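For anyone hitting the same thing, here is a minimal sketch of a defensive way to write the comparison. It assumes, as reported above, that the failure is triggered when a numpy scalar sits on the left-hand side of the comparison. `interval_mask` is a hypothetical helper, not part of FlowCytometryTools, and the sample values are made up. The idea is to keep the Series as the left operand and to coerce the bounds to plain Python floats first.

```python
import numpy as np
import pandas as pd


def interval_mask(series, low, high, region="in"):
    """Boolean mask for low <= series <= high (hypothetical helper)."""
    # Coerce numpy scalars to built-in floats so no numpy scalar
    # takes part in the comparison at all.
    low, high = float(low), float(high)
    # Keep the Series on the left so its own comparison method is used.
    mask = (series >= low) & (series <= high)
    return ~mask if region == "out" else mask


values = pd.Series([0.5, 1.0, 2.0, 3.0, 3.5], name="PE-A")
vert = np.array([1.0, 3.0])  # indexing this yields numpy scalars

inside = interval_mask(values, vert[0], vert[1])
outside = interval_mask(values, vert[0], vert[1], region="out")
```

With these inputs, `inside` selects the three values in [1.0, 3.0] and `outside` selects the other two. On pandas versions without the bug, either operand order should give the same mask, so this form costs nothing.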

eyurtsev commented 7 years ago

Thanks for looking into this... Applied your patch; hopefully it solves the issue :)

https://github.com/eyurtsev/FlowCytometryTools/commit/a3c0b4a79d6352b8864caeda87a8340ba2f793a7