photocyte opened 4 years ago
Hello,
I had the same issue; thank you for this solution! I am using the cytoflow package to parse and analyze my FC data, and I wanted to raise the issue with them as well. Do you mind if I use your example?
@maaikesangster Please feel free
Do note that `fcsparser` supports choosing which data set in the file to parse out. You can use the `data_set` keyword argument to the `FCSParser` constructor. It's 0-indexed, so `data_set = 0` is the first data set, `data_set = 1` is the second, etc.
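As an example, a loop over `data_set` can read every data set in a multi-data-set file. This is a minimal sketch. It assumes `fcsparser.parse(path, data_set=i)` returns a `(meta, data)` tuple, as in the `fcsparser.parse(f, data_set=2)` call used later in this thread. The helper name `read_all_datasets` is hypothetical, and you have to know up front how many data sets the file holds.

```python
def read_all_datasets(path, n_datasets):
    """Hypothetical helper: parse data sets 0..n_datasets-1 from one FCS file.

    Returns a list of (meta, data) tuples, one per data set.
    """
    # Imported lazily so this helper can be defined without fcsparser installed.
    import fcsparser

    # data_set is 0-indexed: 0 is the first data set, 1 the second, and so on.
    return [fcsparser.parse(path, data_set=i) for i in range(n_datasets)]
```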
(And @maaikesangster, `cytoflow` exposes the same functionality in `ImportOp`.)
For me, with a file made of 4 concatenated FCS files, this works for `data_set=0` and `data_set=1`, but it fails for `data_set=2` and `data_set=3`:
meta, data = fcsparser.parse(f, data_set=2)
Encountered an illegal utf-8 byte in the header.
Illegal utf-8 characters will be ignored.
'utf-8' codec can't decode byte 0x8c in position 0: invalid start byte
20220112_files/ADM_12JAN2022_112816.VIA.FCS
All 4 files can be opened successfully when first separated via this approach (https://github.com/eyurtsev/fcsparser/issues/24#issue-697426922). Happy to share all 5 files (original + 4 split) if desired.
Edit: here is the full error message:
~/miniconda3/lib/python3.9/site-packages/fcsparser/api.py in parse(path, meta_data_only, compensate, channel_naming, reformat_meta, data_set, dtype)
538 read_data = not meta_data_only
539
--> 540 fcs_parser = FCSParser(path, read_data=read_data, channel_naming=channel_naming,
541 data_set=data_set)
542
~/miniconda3/lib/python3.9/site-packages/fcsparser/api.py in __init__(self, path, read_data, channel_naming, data_set)
105 if path:
106 with open(path, 'rb') as f:
--> 107 self.load_file(f, data_set=data_set, read_data=read_data)
108
109 def load_file(self, file_handle, data_set=0, read_data=True):
~/miniconda3/lib/python3.9/site-packages/fcsparser/api.py in load_file(self, file_handle, data_set, read_data)
117 while data_segments <= data_set:
118 self.read_header(file_handle, nextdata_offset)
--> 119 self.read_text(file_handle)
120 if '$NEXTDATA' in self.annotation:
121 data_segments += 1
~/miniconda3/lib/python3.9/site-packages/fcsparser/api.py in read_text(self, file_handle)
215 #####
216 # Parse the TEXT segment of the FCS file into a python dictionary
--> 217 delimiter = raw_text[0]
218
219 if raw_text[-1] != delimiter:
IndexError: string index out of range
It seems `data_set` relies on the `$NEXTDATA` keyword to split the file. The example FCS files I've uploaded are instead whole, separate files concatenated together, so they are separated by the FCS start bytes `FCS3.0`.
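One way to check this hypothesis is to scan the raw bytes for HEADER signatures. This is a heuristic sketch. It assumes each embedded file starts with a version string such as `FCS3.0` followed by four spaces, which are the first 10 bytes of an FCS HEADER. `find_header_offsets` is a hypothetical name. Event data could contain the same bytes by chance, so treat the hits as candidates only.

```python
import re

# An FCS HEADER begins with a version string like "FCS3.0" (bytes 0-5)
# followed by four ASCII spaces (bytes 6-9). Match that 10-byte signature.
_HEADER_SIGNATURE = re.compile(rb"FCS\d\.\d {4}")


def find_header_offsets(buf: bytes) -> list[int]:
    """Return byte offsets of every candidate FCS HEADER in buf.

    Heuristic: DATA segments may contain the same bytes by coincidence,
    so the result is a list of candidates, not proof of file boundaries.
    """
    return [m.start() for m in _HEADER_SIGNATURE.finditer(buf)]
```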
@photocyte I'd love to add it to my collection of weird FCS files (: And if I can figure out the fix, I'll submit a pull request to @eyurtsev.
Thanks @bpteague! See the linked zip file below. It contains the split-off FCS files `_1`, `_2`, `_3`, and `_4`, plus the original FCS file `ADM_12JAN2022_112816.VIA.FCS`.
I also realized I previously uploaded a file here (https://github.com/eyurtsev/fcsparser/issues/24#issue-697426922) that should show the same phenomenon, though it may not be split out already.
@photocyte Thanks for the file. I found the problem, and the fix is easy. In `fcsparser.api`, on line 125, replace
nextdata_offset = self.annotation['$NEXTDATA']
with
nextdata_offset += self.annotation['$NEXTDATA']
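Why would a one-character change fix `data_set=2` and `data_set=3`, while 0 and 1 already worked? The fix implies that `$NEXTDATA` is an offset relative to the start of the current data set. Under that reading, the absolute start of data set k is the running sum of the earlier values. The sketch below is a simplified model of the `load_file` loop from the traceback, not fcsparser's code. The function names and byte counts are made up for illustration.

```python
from itertools import accumulate


def dataset_starts_fixed(nextdata_values):
    """Absolute start of each data set when $NEXTDATA is relative (the `+=` fix).

    nextdata_values[i] is the $NEXTDATA value read from data set i; the last
    data set normally has $NEXTDATA = 0. Data set 0 always starts at byte 0.
    """
    return [0] + list(accumulate(nextdata_values[:-1]))


def dataset_starts_buggy(nextdata_values):
    """Where the old `=` code seeks: each value is treated as an absolute offset."""
    return [0] + list(nextdata_values[:-1])


# Made-up lengths of four concatenated files (1000, 1200, 900, ... bytes).
nextdata = [1000, 1200, 900, 0]
print(dataset_starts_fixed(nextdata))  # [0, 1000, 2200, 3100]
print(dataset_starts_buggy(nextdata))  # [0, 1000, 1200, 900]
```

Data sets 0 and 1 land on the same offsets either way. The old code, however, looks for data set 2 at byte 1200, in the middle of the second file. That is consistent with the garbled TEXT segment and the `IndexError` above.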
@eyurtsev, I'll put together a test case and a PR.
Hi there,
I've come across FCS files (from the Luminex Muse) that implement multi-FCS by simply concatenating single FCS files together. This was my solution to split them:
Once these multi-FCS files are split, fcsparser works perfectly, as far as I can tell. But it would be nice if the library could detect these files by default! See the attached example FCS file: ADM_09SEP2020_181310.VIA.FCS.zip