fhs / pyhdf

Python wrapper around the NCSA HDF version 4 library
https://pypi.org/project/pyhdf/
Other
46 stars 15 forks source link

Fix reads of 8-bit signed char SDSs #68

Closed greg9q closed 1 year ago

greg9q commented 1 year ago

This proposed fix addresses an issue encountered after a recent upgrade from pyhdf 0.10.5 to 0.11.3. When reading an SDS of type SDC.CHAR with version 0.11.3, I'd get the following error:

In [2]: SD("MYD03.A2023213.0000.061.2023213152719.hdf").select("Scan Type").get()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 SD("MYD03.A2023213.0000.061.2023213152719.hdf").select("Scan Type").get()

File ~/Code/pyhdf/pyhdf/SD.py:1920, in SDS.get(self, start, count, stride)
   1916 if not data_type in SDC.equivNumericTypes:
   1917     raise HDF4Error('get cannot currently deal with '\
   1918                      'the SDS data type')
-> 1920 return _C._SDreaddata_0(self._id, data_type, start, count, stride)

File ~/Code/pyhdf/pyhdf/hdfext.py:333, in _SDreaddata_0(sds_id, data_type, start, edges, stride)
    332 def _SDreaddata_0(sds_id, data_type, start, edges, stride):
--> 333     return _hdfext._SDreaddata_0(sds_id, data_type, start, edges, stride)

ValueError: data type must provide an itemsize

I tracked the problem down to use of PyArray_SimpleNew with typenum NPY_STRING, which fails because that type code requires a size specification. This change instead uses PyArray_New with an itemsize of 1, which reproduces the 0.10.5 SDC.CHAR read behavior. All the non-NPY_STRING type codes are fixed-size in which case PyArray_New ignores the itemsize argument so behavior for these type codes should be unaffected.

fhs commented 1 year ago

Thanks for working on this. I guess this fixes #67? Would it be easy to write a unit test or example script that reproduces the issue?

greg9q commented 1 year ago

Yes, this addresses exactly the issue described in #67. The following quick test works in 0.10.5, fails in 0.11.3, and works with the proposed patch:

import tempfile
from pathlib import Path

import numpy as np
import numpy.testing
from pyhdf.SD import SDC, SD

def test_pyhdf_sd_read_char():
    with tempfile.TemporaryDirectory() as temp_dir:
        hdf_file = str(Path(temp_dir) / "test.hdf")
        sd = SD(hdf_file, SDC.WRITE | SDC.CREATE)
        sds = sd.create("test_sds", SDC.CHAR, [5])
        sds[:] = "ABCDE"
        np.testing.assert_equal(sds[:], np.array(list("ABCDE"), "S1"))

if __name__ == "__main__":
    test_pyhdf_sd_read_char()
fhs commented 1 year ago

Cool, can you add that as a unit test?

greg9q commented 1 year ago

Yes, done - added as a unit test.