evanyeyeye / rainbow

Read chromatography and mass spectrometry binary files.
GNU General Public License v3.0

Issue opening GC data "struct.error: unpack requires a buffer of 4 bytes" #17

Closed thijsdejong10 closed 4 months ago

thijsdejong10 commented 5 months ago

Hello, I am trying to use the package but am getting a "struct.error: unpack requires a buffer of 4 bytes" error in /rainbow/agilent/chemstation.py, line 568. I suspect num_times is being determined incorrectly earlier in the script, but I am not sure. Any help appreciated :) An example .D file is here

thijsdejong10 commented 5 months ago

It looks like line 555 in chemstation.py should be num_times = struct.unpack("<H", f.read(2))[0] instead of num_times = struct.unpack("<I", f.read(4))[0]. This matches the description of the file structure given in the documentation, where the data type is a little-endian short.
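To illustrate why the width of the count field matters, here is a minimal sketch. It uses a toy layout (a 2-byte count followed by 4-byte values), which is an assumption for illustration and not the real Agilent format; `read_records` is a hypothetical helper, not part of rainbow. Reading the count as "<I" swallows two bytes of the next field, the count becomes far too large, and a later read runs out of data, which would be consistent with the error surfacing a few lines after the count is read.

```python
import io
import struct

# Toy layout (an assumption, not the real Agilent format):
# a 2-byte little-endian record count followed by 4-byte little-endian values.
values = [1000, 2000, 3000]
blob = struct.pack("<H", len(values)) + struct.pack("<3I", *values)


def read_records(data, count_fmt):
    """Read the count with count_fmt, then that many 4-byte values."""
    f = io.BytesIO(data)
    count = struct.unpack(count_fmt, f.read(struct.calcsize(count_fmt)))[0]
    return [struct.unpack("<I", f.read(4))[0] for _ in range(count)]


# Correct width: the count is 3 and all values are recovered.
good = read_records(blob, "<H")

# Wrong width: the count absorbs two bytes of the first value, becomes huge,
# and the loop eventually asks for 4 bytes when fewer remain.
try:
    read_records(blob, "<I")
    failure = None
except struct.error as exc:
    failure = str(exc)

print(good)
print(failure)
```

Note that the wrong format does not fail at the count itself, since 4 bytes are available there; it fails later, once the inflated count drives the loop past the end of the buffer.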

ekwan commented 5 months ago

Hi! Thanks for looking into this! If you're confident you've identified the problem, would you please add a test, correct the code, and make a pull request? We can review it next week. Thanks!

ChuaCheowHuan commented 4 months ago

Hi,

When reading some Agilent MassHunter proprietary .D folders, some fail with the "unpack requires a buffer of 4 bytes" error described above, while others fail with "unpack requires a buffer of 2 bytes", as shown below. These folders all come from GC-MS.

There is no issue when reading .D folders coming from GC-FID.


error                                     Traceback (most recent call last)
Cell In[26], line 3
      1 sample_folder = 'raw_data.D'
----> 3 datadir = rb.read(data_path + sample_folder)
      4 print(datadir)

File c:\users\user\deploy\venv_gcms\lib\site-packages\rainbow\__init__.py:50, in read(path, prec, hrms, requested_files)
     48 ext = os.path.splitext(path)[1]
     49 if ext.upper() == '.D':
---> 50     datadir = agilent.read(path, prec, hrms, requested_files)
     51 elif ext.lower() == '.raw':
     52     datadir = waters.read(path, prec, requested_files)

File c:\users\user\deploy\venv_gcms\lib\site-packages\rainbow\agilent\__init__.py:22, in read(path, prec, hrms, requested_files)
      8 """
      9 Reads an Agilent .D directory. 
     10 
   (...)
     19 
     20 """
     21 datafiles = []
---> 22 datafiles.extend(chemstation.parse_allfiles(path, prec, requested_files))
     23 if hrms:
     24     try:

File c:\users\user\deploy\venv_gcms\lib\site-packages\rainbow\agilent\chemstation.py:36, in parse_allfiles(path, prec, requested_files)
     34 if requested_files and name.lower() not in requested_files:
     35     continue
---> 36 datafile = parse_file(os.path.join(path, name), prec)
     37 if datafile:
     38     datafiles.append(datafile)

File c:\users\user\deploy\venv_gcms\lib\site-packages\rainbow\agilent\chemstation.py:62, in parse_file(path, prec)
     60     return parse_uv(path)
     61 elif ext == '.ms':
---> 62     return parse_ms(path, prec)
     63 return None

File c:\users\user\deploy\venv_gcms\lib\site-packages\rainbow\agilent\chemstation.py:623, in parse_ms(path, prec)
    621 times[i] = int_unpack(f.read(4))[0]
    622 f.read(6)
--> 623 pair_counts[i] = short_unpack(f.read(2))[0]
    624 f.read(4)
    625 pair_bytes = f.read(pair_counts[i] * 4)

error: unpack requires a buffer of 2 bytes

JMarvi3 commented 4 months ago

It looks like you are right. For GC-MS files, num_times should be a short. I have updated the repo and uploaded a new package to PyPI. Please let me know if it fixes your problems.

@thijsdejong10 Do you mind if we add your data files as a test in the repo?
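For anyone debugging similar tracebacks: both errors in this thread come from passing a short `f.read(n)` result straight to an unpack call, so the message only reports the expected size. A minimal sketch of a stricter read is below. `read_exact` is a hypothetical helper, not something rainbow provides, and the sample bytes are made up.

```python
import io
import struct


def read_exact(f, size):
    """Read exactly `size` bytes, or raise EOFError naming the offset."""
    offset = f.tell()
    data = f.read(size)
    if len(data) != size:
        raise EOFError(
            f"expected {size} bytes at offset {offset}, got {len(data)}"
        )
    return data


# Made-up data: one complete 2-byte field followed by a single stray byte.
f = io.BytesIO(b"\x01\x00\x02")

first = struct.unpack("<H", read_exact(f, 2))[0]

try:
    struct.unpack("<H", read_exact(f, 2))
    message = None
except EOFError as exc:
    message = str(exc)

print(first)
print(message)
```

Reporting the offset makes it easier to tell whether the file is truncated or whether an earlier field was read with the wrong width.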

thijsdejong10 commented 4 months ago

Nice (I got it working but didn't have the time to properly validate it yet). No problem to add the file to the test repo.