Closed: thijsdejong10 closed this issue 4 months ago
It looks like line 555 in chemstation.py should be num_times = struct.unpack("<H", f.read(2))[0] instead of num_times = struct.unpack("<I", f.read(4))[0]. This matches the description of the file structure given in the documentation, where the data type is a little-endian short.
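To illustrate why the format code matters, here is a minimal sketch using a made-up 4-byte buffer (not taken from a real .ms file). It assumes the count is stored as a 2-byte little-endian unsigned value followed by unrelated bytes; reading it with "<I" swallows the two following bytes and produces a very different number.

```python
import struct

# Hypothetical header fragment: a 2-byte count (3) followed by 2 unrelated bytes.
buf = bytes([0x03, 0x00, 0xAB, 0xCD])

# "<H" = little-endian unsigned short (2 bytes): reads only the count.
as_short = struct.unpack("<H", buf[:2])[0]

# "<I" = little-endian unsigned int (4 bytes): also consumes the next 2 bytes.
as_int = struct.unpack("<I", buf)[0]

print(as_short)     # the intended count
print(hex(as_int))  # the count with the unrelated bytes mixed into the high half
```

A wrongly inflated count like this would make any later loop try to read far more records than the file contains, which is consistent with the reader running off the end of the file.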
Hi! Thanks for looking into this! If you're confident you've identified the problem, would you please add a test, correct the code, and make a pull request? We can review it next week. Thanks!
Hi,
When reading Agilent MassHunter proprietary .D folders, some fail with "error: unpack requires a buffer of 4 bytes" (described above), while others fail with "error: unpack requires a buffer of 2 bytes" (traceback below). All of these folders come from GC-MS.
.D folders from GC-FID read without any issue.
error Traceback (most recent call last)
Cell In[26], line 3
1 sample_folder = 'raw_data.D'
----> 3 datadir = rb.read(data_path + sample_folder)
4 print(datadir)
File c:\users\user\deploy\venv_gcms\lib\site-packages\rainbow\__init__.py:50, in read(path, prec, hrms, requested_files)
48 ext = os.path.splitext(path)[1]
49 if ext.upper() == '.D':
---> 50 datadir = agilent.read(path, prec, hrms, requested_files)
51 elif ext.lower() == '.raw':
52 datadir = waters.read(path, prec, requested_files)
File c:\users\user\deploy\venv_gcms\lib\site-packages\rainbow\agilent\__init__.py:22, in read(path, prec, hrms, requested_files)
8 """
9 Reads an Agilent .D directory.
10
(...)
19
20 """
21 datafiles = []
---> 22 datafiles.extend(chemstation.parse_allfiles(path, prec, requested_files))
23 if hrms:
24 try:
File c:\users\user\deploy\venv_gcms\lib\site-packages\rainbow\agilent\chemstation.py:36, in parse_allfiles(path, prec, requested_files)
34 if requested_files and name.lower() not in requested_files:
35 continue
---> 36 datafile = parse_file(os.path.join(path, name), prec)
37 if datafile:
38 datafiles.append(datafile)
File c:\users\user\deploy\venv_gcms\lib\site-packages\rainbow\agilent\chemstation.py:62, in parse_file(path, prec)
60 return parse_uv(path)
61 elif ext == '.ms':
---> 62 return parse_ms(path, prec)
63 return None
File c:\users\user\deploy\venv_gcms\lib\site-packages\rainbow\agilent\chemstation.py:623, in parse_ms(path, prec)
621 times[i] = int_unpack(f.read(4))[0]
622 f.read(6)
--> 623 pair_counts[i] = short_unpack(f.read(2))[0]
624 f.read(4)
625 pair_bytes = f.read(pair_counts[i] * 4)
error: unpack requires a buffer of 2 bytes
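For context, this message usually means that f.read() returned fewer bytes than requested, which happens when the reader has already reached the end of the file. The sketch below reproduces that with an in-memory buffer; the one-byte input is purely illustrative and does not come from a real data file.

```python
import io
import struct

# An in-memory "file" with only one byte left to read.
f = io.BytesIO(b"\x01")

caught = False
message = ""
try:
    # f.read(2) silently returns just 1 byte at end of file,
    # so struct.unpack receives a buffer that is too short.
    struct.unpack("<H", f.read(2))
except struct.error as exc:
    caught = True
    message = str(exc)

print(message)
```

So the failing line in the traceback is likely a symptom rather than the cause: an earlier value (such as the number of scans) was probably misread, and the loop kept going past the real end of the data.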
It looks like you are right. For GC-MS files, num_times should be a short. I have updated the repo and uploaded a new package to PyPI. Please let me know if it fixes your problem.
@thijsdejong10 Do you mind if we add your data files as a test in the repo?
Nice! I got it working but haven't had the time to properly validate it yet. No problem adding the file to the test repo.
Hello, I am trying to use the package but am getting a "struct.error: unpack requires a buffer of 4 bytes" error in /rainbow/agilent/chemstation.py line 568. I expect something is going wrong in determining the right num_times earlier in the script but I am not sure. Any help appreciated :) An example .D file is here
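When debugging this kind of error, it can help to find the offset at which the file runs out. Below is a minimal sketch of a hypothetical helper, read_exact, which is not part of rainbow; it wraps f.read() and reports the offset when fewer bytes come back than were requested.

```python
import io
import struct


def read_exact(f, size):
    """Read exactly `size` bytes, or raise EOFError naming the offset.

    Hypothetical debugging helper; not part of the rainbow package.
    """
    offset = f.tell()
    data = f.read(size)
    if len(data) != size:
        raise EOFError(
            f"wanted {size} bytes at offset {offset}, got {len(data)}"
        )
    return data


# Illustrative 3-byte buffer: a little-endian short (2) plus one stray byte.
f = io.BytesIO(b"\x02\x00\xff")
count = struct.unpack("<H", read_exact(f, 2))[0]

try:
    read_exact(f, 4)  # only 1 byte remains, so this raises
except EOFError as exc:
    error_text = str(exc)

print(count, error_text)
```

Comparing the reported offset with the file size and the expected record layout can show whether the count read earlier in the header was too large.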