MIT-LCP / wfdb-python

Native Python WFDB package
MIT License
750 stars 303 forks source link

Change in type promotion. Fixes to annotation.py #506

Closed tompollard closed 1 month ago

tompollard commented 1 month ago

As discussed in https://github.com/MIT-LCP/wfdb-python/issues/493, numpy v2.0 introduced changes to type promotion rules: https://numpy.org/devdocs/numpy_2_0_migration_guide.html#changes-to-numpy-data-type-promotion

Running pytest with numpy==2.0.2 and NPY_PROMOTION_STATE=weak_and_warn raises the following warnings for wfdb/io/annotation.py:

  /Users/tompollard/projects/wfdb-python/wfdb/io/annotation.py:2222: UserWarning: result dtype changed due to the removal of value-based promotion from NumPy. Changed from int64 to uint8.
    while filebytes[bpi, 1] >> 2 == 59:

  /Users/tompollard/projects/wfdb-python/wfdb/io/annotation.py:2239: UserWarning: result dtype changed due to the removal of value-based promotion from NumPy. Changed from int64 to uint8.
    label_store = filebytes[bpi, 1] >> 2

tests/test_plot.py::TestPlotInternal::test_get_plot_dims
  /Users/tompollard/projects/wfdb-python/wfdb/io/annotation.py:2240: UserWarning: result dtype changed due to the removal of value-based promotion from NumPy. Changed from int64 to uint8.
    sample_diff += int(filebytes[bpi, 0] + 256 * (filebytes[bpi, 1] & 3))

The changes in this pull request address these issues by explicitly casting the type. I plan to follow up with several additional fixes to other modules.

bemoody commented 1 month ago

Thanks for working on this!

filebytes = filebytes.astype(np.int64)

Here, the original filebytes is an array of uint8. You're copying and converting them into an array of int64, which means multiplying memory usage by a factor of 9.

So I think this would probably work, but I think (since we're looping over the bytes in Python anyway) it'd be better to do the conversion one element at a time as needed.

That is to say, changing something like filebytes[bpi, 1] >> 2 to either int(filebytes[bpi, 1]) >> 2 if what we really want is a python integer, or np.int64(filebytes[bpi, 1]) >> 2 if what we want is a numpy integer.

tompollard commented 1 month ago

Thanks! Like this? I don't see any particular reason to use int64 on the while loop, so went for int.

bemoody commented 1 month ago

Thanks. Unless I missed something, though, it looks like there's still an issue in line 2239.

bemoody commented 1 month ago

In line 2241, I think we want np.int64(filebytes[bpi, 1]) & 3 instead of np.int64(filebytes[bpi, 1] & 3).

In addition to the ones you fixed, there's one more warning in line 2327 (aux_notelen & 1).

I'm not sure why not all of the warnings were showing up before.

tompollard commented 1 month ago

In line 2241, I think we want np.int64(filebytes[bpi, 1]) & 3 instead of np.int64(filebytes[bpi, 1] & 3).

Sorry, good catch.

In addition to the ones you fixed, there's one more warning in line 2327 (aux_notelen & 1).

Odd, I see this now too:

tests/test_annotation.py: 10 warnings
tests/test_plot.py: 2 warnings
  /Users/tompollard/projects/wfdb-python/wfdb/io/annotation.py:2327: UserWarning: result dtype changed due to the removal of value-based promotion from NumPy. Changed from int64 to uint8.
    if aux_notelen & 1:

I have cast to int64

tompollard commented 1 month ago

In line 2241, I think we want np.int64(filebytes[bpi, 1]) & 3 instead of np.int64(filebytes[bpi, 1] & 3).

Adding just np.int64(filebytes[bpi, 1]) & 3 causes the multiplication to precede the bitwise & operator, so I added an extra set of parentheses to maintain the order.

bemoody commented 1 month ago

I have cast to int64

Sorry, this (e7290629) does not fix it. "aux_notelen" is causing the warning. "aux_notebytes" is not a problem and there is no reason to change it.

bemoody commented 1 month ago

There is also something strange happening involving pandas in rdann

$ NPY_PROMOTION_STATE=legacy ./ve.np2/bin/python -c 'import wfdb, numpy, pandas; print(numpy.__version__, pandas.__version__); wfdb.rdann("sample-data/1003", "atr")'
2.1.2 2.2.3
$ NPY_PROMOTION_STATE=weak ./ve.np2/bin/python -c 'import wfdb, numpy, pandas; print(numpy.__version__, pandas.__version__); wfdb.rdann("sample-data/1003", "atr")'
2.1.2 2.2.3
$ NPY_PROMOTION_STATE=legacy ./ve.np1/bin/python -c 'import wfdb, numpy, pandas; print(numpy.__version__, pandas.__version__); wfdb.rdann("sample-data/1003", "atr")'
1.26.4 2.2.3
$ NPY_PROMOTION_STATE=weak ./ve.np1/bin/python -c 'import wfdb, numpy, pandas; print(numpy.__version__, pandas.__version__); wfdb.rdann("sample-data/1003", "atr")'
1.26.4 2.2.3
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/wfdb-python/wfdb/io/annotation.py", line 2019, in rdann
    annotation.set_label_elements(return_label_elements)
  File "/tmp/wfdb-python/wfdb/io/annotation.py", line 1405, in set_label_elements
    self.convert_label_attribute(contained_elements[0], e)
  File "/tmp/wfdb-python/wfdb/io/annotation.py", line 1468, in convert_label_attribute
    label_map = self.create_label_map(inplace=False)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/wfdb-python/wfdb/io/annotation.py", line 908, in create_label_map
    label_map.loc[i] = self.custom_labels.loc[i]
    ~~~~~~~~~~~~~^^^
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/indexing.py", line 911, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/indexing.py", line 1932, in _setitem_with_indexer
    self._setitem_with_indexer_missing(indexer, value)
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/indexing.py", line 2328, in _setitem_with_indexer_missing
    self.obj._mgr = self.obj._append(value)._mgr
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/frame.py", line 10554, in _append
    other = row_df.infer_objects(copy=False).rename_axis(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/generic.py", line 6888, in infer_objects
    new_mgr = self._mgr.convert(copy=copy)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 447, in convert
    return self.apply("convert", copy=copy, using_cow=using_copy_on_write())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 363, in apply
    applied = getattr(b, f)(**kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/internals/blocks.py", line 639, in convert
    blocks = self.split_and_operate(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/internals/blocks.py", line 471, in split_and_operate
    rbs = func(nb, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/internals/blocks.py", line 655, in convert
    res_values = lib.maybe_convert_objects(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "lib.pyx", line 2602, in pandas._libs.lib.maybe_convert_objects
OverflowError: Python int too large to convert to C long
bemoody commented 1 month ago

Looks like a straight up pandas bug:

$ NPY_PROMOTION_STATE=weak ./ve.np1/bin/python -c 'import numpy,pandas; print(numpy.__version__, pandas.__version__); print(pandas.DataFrame({"x":[1]}))'
1.26.4 2.2.3
...
OverflowError: Python int too large to convert to C long
bemoody commented 1 month ago

https://github.com/pandas-dev/pandas/issues/60023

tompollard commented 1 month ago

Thanks Benjamin. Interested in seeing what happens with the Pandas issue!