eyurtsev / fcsparser

A python parser for reading fcs files supporting FCS 2.0, 3.0, 3.1
MIT License
74 stars 45 forks source link

Bugfix: Handle repeated TEXT delimiter #18

Closed cericson closed 5 years ago

cericson commented 5 years ago

First, thanks for creating and maintaining this package! I use it a lot.

I discovered a bug in how the TEXT segment is parsed. A repeated delimiter is supposed to represent a single delimiter in the decoded strings (see item 3.2.7 of the 3.1 standard). For example, the key-value pair {'flow_speed': '3 m/s'} with the delimiter / should be encoded as: /flow_speed/3 m//s/. This string is currently decoded in fcsparser as: {'flow_speed': '3 m', '': 's'}

The code in this PR should handle the delimiter correctly. It's not quite ready to merge yet, though. It appears that the test FCS file with duplicate column names actually just had the delimiter in the names, causing them to be read incorrectly. This means the corresponding test is now failing. If there are other real examples of FCS files with duplicate column names, we should swap one out. If not, perhaps the test can be deleted. What do you think?

A couple notes: 1) The same delimiter logic is described in the FCS 2.0 standard, so this change theoretically shouldn't cause any compatibility issues 2) I moved some of the surrounding code into a function just to make it more easily testable - the actual addition is only a few lines.

eyurtsev commented 5 years ago

Thanks will review during the weekend :)

cericson commented 5 years ago

Have you found any other examples of FCS files with duplicate feature names?

cericson commented 5 years ago

Unless you object, I'll delete the failing test since its assumptions don't appear to be valid any more. The FCS spec ($PnN subsection of 3.2.20) forbids duplicate parameter names.