MIT-LCP / wfdb-python

Native Python WFDB package
MIT License
750 stars 303 forks source link

OverflowError: Python integer 256 out of bounds for uint8 #493

Open cve2022 opened 4 months ago

cve2022 commented 4 months ago

I am breaking the demos in the notebook in individual scripts to get acquainted with the package to learn how it works and adapt it to my needs.

Running demo4 as a separate script with python I get this error traceback:

D:\Users\xxx\Work\__WFDB\wfdb-python>python demo4.py
Traceback (most recent call last):
  File "D:\Users\xxx\Work\__WFDB\wfdb-python\demo4.py", line 12, in <module>
    annotation = wfdb.rdann('sample-data/100', 'atr', sampfrom=100000, sampto=110000)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Users\xxx\Work\__WFDB\wfdb-python\wfdb\io\annotation.py", line 1953, in rdann
    (sample, label_store, subtype, chan, num, aux_note) = proc_ann_bytes(
                                                          ^^^^^^^^^^^^^^^
  File "D:\Users\xxx\Work\__WFDB\wfdb-python\wfdb\io\annotation.py", line 2154, in proc_ann_bytes
    sample_diff, current_label_store, bpi = proc_core_fields(filebytes, bpi)
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Users\xxx\Work\__WFDB\wfdb-python\wfdb\io\annotation.py", line 2240, in proc_core_fields
    sample_diff += int(filebytes[bpi, 0] + 256 * (filebytes[bpi, 1] & 3))
                                           ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
OverflowError: Python integer 256 out of bounds for uint8

The same happens with Linux (Ubuntu 22.04).

After changing the highlighted line it works like the notebook.

$ git diff
diff --git a/wfdb/io/annotation.py b/wfdb/io/annotation.py
index b398fa0..4d451b1 100644
--- a/wfdb/io/annotation.py
+++ b/wfdb/io/annotation.py
@@ -2237,7 +2237,7 @@ def proc_core_fields(filebytes, bpi):

     # Not a skip - it is the actual sample number + annotation type store value
     label_store = filebytes[bpi, 1] >> 2
-    sample_diff += int(filebytes[bpi, 0] + 256 * (filebytes[bpi, 1] & 3))
+    sample_diff += int(filebytes[bpi, 0]) + 256 * int(filebytes[bpi, 1] & 3)
     bpi = bpi + 1

     return sample_diff, label_store, bpi

EDIT: Python versions Windows: 3.12.4 Linux: 3.10.12

tompollard commented 4 months ago

It looks like the package needs fixing. I assume this is a result of the following change to numpy: https://numpy.org/devdocs/release/1.24.0-notes.html#conversion-of-out-of-bound-python-integers

bemoody commented 4 months ago

No, it's not a 1.24 issue:

>>> import numpy
>>> numpy.__version__
'1.24.2'
>>> numpy.uint8(255) + 256 * numpy.uint8(255)
65535
>>> type(_)
<class 'numpy.int64'>
>>> import numpy
>>> numpy.__version__
'2.0.0'
>>> numpy.uint8(255) + 256 * numpy.uint8(255)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python integer 256 out of bounds for uint8
bemoody commented 4 months ago

https://numpy.org/devdocs/numpy_2_0_migration_guide.html#changes-to-numpy-data-type-promotion

This is a major change & worries me there are likely to be other things invisibly broken.

For the time being, there are two workarounds:

To find problems that need fixing, you can set NPY_PROMOTION_STATE=weak_and_warn. This appears to raise a warning whenever there is an incompatible behavior change (even if the specific test case doesn't cause an overflow), which is good but it means there will likely be false positives. It will also of course have false negatives, because we don't have 100% coverage.

Here are the warnings on the current test suite:

  /tmp/wfdb-python-build.t56g2c/wfdb-python/wfdb/io/annotation.py:2222: UserWarning: result dtype changed due to the removal of value-based promotion from NumPy. Changed from int64 to uint8.
  /tmp/wfdb-python-build.t56g2c/wfdb-python/wfdb/io/annotation.py:2239: UserWarning: result dtype changed due to the removal of value-based promotion from NumPy. Changed from int64 to uint8.
  /tmp/wfdb-python-build.t56g2c/wfdb-python/wfdb/io/annotation.py:2240: UserWarning: result dtype changed due to the removal of value-based promotion from NumPy. Changed from int64 to uint8.
  /tmp/wfdb-python-build.t56g2c/wfdb-python/wfdb/io/_signal.py:2374: UserWarning: result dtype changed due to the removal of value-based promotion from NumPy. Changed from int32 to int16.
  /tmp/wfdb-python-build.t56g2c/wfdb-python/wfdb/io/convert/edf.py:420: UserWarning: result dtype changed due to the removal of value-based promotion from NumPy. Changed from int16 to int64.
  /tmp/wfdb-python-build.t56g2c/wfdb-python/wfdb/io/convert/edf.py:414: UserWarning: result dtype changed due to the removal of value-based promotion from NumPy. Changed from int16 to int64.
  /tmp/wfdb-python-build.t56g2c/wfdb-python/wfdb/io/convert/edf.py:409: UserWarning: result dtype changed due to the removal of value-based promotion from NumPy. Changed from int16 to int64.

(there are some DeprecationWarnings too; that's another story...)

cve2022 commented 4 months ago

@bemoody and @tompollard , thanks for the heads up on numpy version, I have "downgraded" numpy and it works.

BR.

Ivorforce commented 4 months ago

I had the same issue and can confirm downgrading to numpy<2 avoids it.

cbrnr commented 2 months ago

May I ask if a more permanent fix (i.e. upgrading to NumPy >= 2) is planned in the near future? One of my packages depends on wfdb, and I cannot upgrade to NumPy >= 2 because of this pin.

bemoody commented 2 months ago

This is definitely an important issue that needs fixing. Help is always appreciated! But I'll try to devote some time to it this week, and at least get a better idea of how much work it'll be to fix.

cbrnr commented 2 months ago

Sure, I'm happy to help, but given you are already familiar with the code, and you already listed the warnings, this might not be too difficult. You'd have to specify which dtype you want in these cases, which is something I cannot help with.

cbrnr commented 2 months ago

I'm a little worried that the test suite doesn't contain the OverflowError from the demo, so that might mean it's probably not that straightforward. Or will running all tests and all demos be sufficient?

bemoody commented 2 months ago

You'd have to specify which dtype you want in these cases, which is something I cannot help with.

Yes, figuring that out could be tricky. I'm assuming that most of the code has (at some point) been tested using numpy 1.x, with its C-like promotion semantics, and therefore we should assume those semantics are what we want to preserve.

But, although I have a high-level understanding of the code, I didn't write most of it. :)

I'm a little worried that the test suite doesn't contain the OverflowError from the demo, so that might mean it's probably not that straightforward.

You're right, but I think using "weak_and_warn" mode should help. In my comment above, I saw that it reported a UserWarning on annotation.py lines 2239 and 2240, so it does tell us there's a potential problem there.

On the other hand, I'm not sure whether "weak_and_warn" mode is something that the numpy package intends to support in the long term.

GGChe commented 2 months ago

Hi there. My team is also using wfdb extensively and we are planing to bump to numpy 2.1.1 and we just encountered this issue. Is there any plan for a wfdb version package release with numpy upper bound so that we can manage in other tomls whether we use one or another numpy version?

Thanks for the work

cbrnr commented 1 month ago

I wanted to mention another point: with a NumPy < 2 pin, it is not (easily) possible to support Python 3.13 (which was released last week), since there are no 3.13 wheels for NumPy 1.x.

cbrnr commented 1 month ago

This should now be fixed by #511 (and of course all previous related PRs that fixed issues related to NumPy 2).