alugowski / fast_matrix_market

Fast and full-featured Matrix Market I/O library for C++, Python, and R
BSD 2-Clause "Simplified" License
75 stars 7 forks source link

`fmm.mmread` raises `ValueError` instead of `OverflowError` when reading integers that are too large #11

Closed eriknw closed 1 year ago

eriknw commented 1 year ago

This is yet another (small) difference between fmm.mmread and scipy.io.mmread, which I think you may like to know if you want fmm.mmread to be more-or-less be a drop-in replacement for scipy.io.mmread:

import fast_matrix_market as fmm
import scipy
from io import StringIO
from scipy.io.tests import test_mmio

try:
    scipy.io.mmread(StringIO(test_mmio._over64bit_integer_dense_example))
except OverflowError:
    pass
try:
    scipy.io.mmread(StringIO(test_mmio._over64bit_integer_sparse_example))
except OverflowError:
    pass

try:
    fmm.mmread(StringIO(test_mmio._over64bit_integer_dense_example))
except ValueError:  # <-- This isn't OverflowError
    pass
try:
    fmm.mmread(StringIO(test_mmio._over64bit_integer_sparse_example))
except ValueError:  # <-- This isn't OverflowError
    pass

See: https://github.com/scipy/scipy/blob/69e0c474f886990a85a88521cffb8b3978bdfd66/scipy/io/tests/test_mmio.py#L321-L335

alugowski commented 1 year ago

Good find.

Fixing this brought to light overflow handling in general. Integer overflow is now handled more explicitly.

Floating point under/overflow is a trickier issue as there are two valid interpretations. Consider 1e9999 or 1e-9999 being read into a double or Python float. Some would say those should throw an error, others insist that those should be inf or 0, respectively. Python (and SciPy) take the latter approach, as do JavaScript, Swift, and any code that ignores the ERANGE error from strtod() (which is how Python gets its behavior).

So I added a switch to choose best match or throw behavior for floating point overflow. That required contributing to some floating-point library dependencies :)