IntelPython / sdc

Numba extension for compiling Pandas data frames, Intel® Scalable Dataframe Compiler
https://intelpython.github.io/sdc-doc/
BSD 2-Clause "Simplified" License

Moving to numba=0.53.1 #971

Closed kozlov-alexey closed 3 years ago

kozlov-alexey commented 3 years ago

Motivation: keep up with the latest Numba release.

Note: commit 2017e6c28d is just a workaround for numba=0.53.1 regressions and should probably be reverted once the referenced issues are fixed.

kozlov-alexey commented 3 years ago

Regarding the possible performance impact on numpy_like.astype() when converting numeric arrays to StringArrayType: there is actually no impact at all. For some reason, iterating over StringArrayType with prange doesn't scale (this should be investigated). In a test converting 5 * 10 ** 6 int64 values to strings, SDC code is about 1.5x faster than pandas but doesn't scale with the number of threads, both with and without this change:

n_threads  1         2         4         8         16
tested     1.769644  1.77148   1.768676  1.768381  1.769647
reference  2.677486  2.624062  2.624959  2.622924  2.625067
ratio      1.513008  1.481282  1.484138  1.483235  1.483385
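
To make the two claims above concrete ("about 1.5x faster" and "doesn't scale"), here is a minimal sketch that recomputes them from the reported timings. It assumes that the ratio row is reference divided by tested, which the numbers are consistent with. The helper names `ratios` and `self_speedup` are illustrative only and are not part of SDC or Numba.

```python
# Timings copied from the table above (units as reported in the comment).
n_threads = [1, 2, 4, 8, 16]
tested = [1.769644, 1.77148, 1.768676, 1.768381, 1.769647]
reference = [2.677486, 2.624062, 2.624959, 2.622924, 2.625067]


def ratios(reference_times, tested_times):
    """Return reference / tested per column; a value > 1 means tested is faster."""
    return [r / t for r, t in zip(reference_times, tested_times)]


def self_speedup(timings):
    """Return the speedup of each run relative to the first (single-threaded) run."""
    base = timings[0]
    return [base / t for t in timings]


# Print one line per thread count: the cross-implementation ratio and
# how much the tested code gained over its own single-threaded run.
for n, ratio, speedup in zip(n_threads, ratios(reference, tested), self_speedup(tested)):
    print(f"threads={n:2d}  ratio={ratio:.6f}  self_speedup={speedup:.4f}")
```

A self-speedup that stays near 1.0 at every thread count is what "doesn't scale" means here: adding threads leaves the run time essentially unchanged.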

kozlov-alexey commented 3 years ago

The numba=0.53.1 regressions mentioned above are https://github.com/numba/numba/issues/6969 and https://github.com/numba/numba/issues/6960