cchdo / hydro

The big ol CCHDO netCDF-CF project
https://hydro.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Test against numpy 2 #37

Closed: DocOtak closed this issue 1 month ago

DocOtak commented 2 months ago

Looks like the numpy 2.0 release has occurred (or is in progress). This library does a lot of string manipulation with functions that now live in the new numpy.strings module: https://numpy.org/devdocs/reference/routines.strings.html#module-numpy.strings. While not explicitly deprecated yet, the existing numpy.char module will not be getting updates.

Bonus: performance testing of both methods to see if there are any changes there.
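A comparison of the two modules could be sketched roughly as below. This is only an illustration, not the benchmark used for this library: the `make_values` and `time_find` helpers, the synthetic input, and the choice of `find` as the operation under test are all assumptions. On numpy older than 2.0, where `numpy.strings` does not exist, the sketch falls back to `numpy.char` so it still runs.

```python
import timeit

import numpy as np

# numpy.strings only exists in numpy >= 2.0; fall back to numpy.char so the
# sketch still runs (and simply compares numpy.char with itself) on older numpy.
new_strings = getattr(np, "strings", np.char)


def make_values(n):
    """Hypothetical helper: build n synthetic numeric strings such as '12.3456'."""
    rng = np.random.default_rng(0)
    return np.array([f"{v:.4f}" for v in rng.uniform(0, 40, n)])


def time_find(module, values, repeat=3):
    """Hypothetical helper: best-of-`repeat` wall time of module.find(values, '.')."""
    return min(timeit.repeat(lambda: module.find(values, "."), number=1, repeat=repeat))


values = make_values(10_000)

# Both modules should agree on the result before their timings are compared.
old_result = np.char.find(values, ".")
new_result = new_strings.find(values, ".")

old_t = time_find(np.char, values)
new_t = time_find(new_strings, values)
print(f"numpy.char: {old_t * 1e3:.2f} ms, numpy.strings: {new_t * 1e3:.2f} ms")
```

Timings will vary with the machine, the numpy build, and the input size, so any ratio from a sketch like this is only indicative.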

DocOtak commented 1 month ago

Initial testing shows a 20 to 40x speedup in the extract_numeric_precisions function when converted to use the new numpy.strings module (80 ms -> 3 ms, or 1.2 s -> 30 ms for an even larger input of 1.5 million strings).
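For readers unfamiliar with this kind of operation, here is a minimal sketch of counting decimal places in an array of numeric strings with vectorized string functions. It is not the library's actual extract_numeric_precisions implementation. The function name `count_decimal_places` is hypothetical, and the sketch assumes plain decimal notation (no exponents or whitespace). It uses `numpy.strings` when available and falls back to `numpy.char` on numpy older than 2.0.

```python
import numpy as np

# Prefer the numpy >= 2.0 module; fall back to numpy.char on older versions.
strings = getattr(np, "strings", np.char)


def count_decimal_places(values):
    """Hypothetical helper: number of characters after the '.' in each string.

    Strings without a decimal point are reported as 0 decimal places.
    """
    values = np.asarray(values, dtype=str)
    lengths = strings.str_len(values)
    dot = strings.find(values, ".")  # -1 where no '.' is present
    return np.where(dot >= 0, lengths - dot - 1, 0)


precisions = count_decimal_places(["1.250", "35", "-0.5", "7."])
print(precisions)  # [3 0 1 0]
```

Because the length and find operations each run once over the whole array instead of once per element in Python, this style of code is where a faster string backend would be expected to show up.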

Overall, parsing a large (s04p) CTD dataset showed a 3x speedup (15 s -> 5 s).

This was on an M1 machine. I'm not sure how well x86 would do, but I suspect the speedups would be similar.

DocOtak commented 1 month ago

The string processing speedups have been implemented; numpy >=2 is now required.