
eyurtsev / fcsparser

A python parser for reading fcs files supporting FCS 2.0, 3.0, 3.1

MIT License

74 stars, 45 forks

3 byte fields #32

Closed: bpteague closed this 3 years ago

bpteague commented 3 years ago

This PR supersedes #17 -- sorry it's taken me so long to get back to it! I've addressed the style, variable name and documentation issues you had, and I've added a unit test as well.

From the original PR: Some FCS files (such as the one generated by the Cytek xP5) store integers in 3-byte fields. This breaks numpy's parser, which only supports integer fields whose size is a power of two (1, 2, 4, or 8 bytes). So, I've updated fromfile() to parse FCS files as a table of 1-byte unsigned ints, expand the 3-byte fields to 4 bytes, and then re-view them as the proper dtype.
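For readers unfamiliar with the trick, here is a minimal sketch of the expand-and-re-view idea for a buffer holding only packed 3-byte unsigned integers. It is not the code from this PR: the helper name `expand_3byte_uints` and its `byteorder` parameter are made up for illustration, and a real FCS data segment can mix field widths, which this sketch ignores.

```python
import numpy as np


def expand_3byte_uints(raw: bytes, byteorder: str = "little") -> np.ndarray:
    """Decode packed 3-byte unsigned ints into a uint32 array (illustrative helper)."""
    if len(raw) % 3:
        raise ValueError("buffer length must be a multiple of 3")
    # Read the buffer as single bytes, one row of three bytes per integer.
    triples = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 3)
    # Widen each row to four bytes; the extra zero byte must land on the
    # most-significant side so the value is unchanged.
    padded = np.zeros((triples.shape[0], 4), dtype=np.uint8)
    if byteorder == "little":
        padded[:, :3] = triples
        dtype = "<u4"
    elif byteorder == "big":
        padded[:, 1:] = triples
        dtype = ">u4"
    else:
        raise ValueError("byteorder must be 'little' or 'big'")
    # Re-view each 4-byte row as one 32-bit unsigned integer.
    return padded.view(dtype).ravel()


raw = bytes([0x01, 0x00, 0x00, 0x00, 0x01, 0x00])
print(expand_3byte_uints(raw).tolist())  # [1, 256]
```

The padding step copies the data once, but the final `view` is free because it only reinterprets the same memory.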

eyurtsev commented 3 years ago

Sorry for the delay -- taking a look :)

eyurtsev commented 3 years ago

Verified that tests are passing w/ additional test

Verified no substantial impact on performance

Confirmed no effect on parsing performance for a larger (30 MB) FCS file -- although that file only contained a single dtype (so one wouldn't expect much of a difference).

For a smaller file w/ mixed dtypes:

from master: 1.22 ms ± 7.91 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

from branch: 1.26 ms ± 3.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
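The timings above appear to be in the format printed by IPython's `%timeit`. A comparable measurement using only the standard library could be sketched as follows. The helper `time_per_loop` is hypothetical, and the lambda is a stand-in workload; to reproduce the comparison you would time the actual parse call on each branch instead.

```python
import statistics
import timeit


def time_per_loop(func, repeat=7, number=1000):
    """Return (mean, std. dev.) of seconds per call over `repeat` runs (illustrative helper)."""
    # Each entry is the total time for `number` back-to-back calls of func.
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_loop = [total / number for total in totals]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


# Stand-in workload; swap in the function under test.
mean, std = time_per_loop(lambda: sum(range(100)), repeat=3, number=100)
print(f"{mean * 1e3:.3f} ms ± {std * 1e6:.2f} µs per loop")
```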

eyurtsev commented 3 years ago

Added unit-test coverage for the data segment portion here: https://github.com/eyurtsev/fcsparser/pull/34

eyurtsev commented 3 years ago

Released on PyPI as version 0.2.3. Thanks for the PR!