IntelPython / sdc

Numba extension for compiling Pandas data frames, Intel® Scalable Dataframe Compiler
https://intelpython.github.io/sdc-doc/
BSD 2-Clause "Simplified" License

Adding support of pd.RangeIndex for Series and DFs #862

Closed kozlov-alexey closed 4 years ago

kozlov-alexey commented 4 years ago

This PR:

  1. Modifies boxing/unboxing of Series and DFs to handle pd.RangeIndex,
  2. Adds fix_df_index to transform values of the index argument of Series and DF ctor calls, which fixes the RewriteDataFrame ctor handling of index=None as an argument,
  3. Adds iteration, operators (is, eq, ne) support for RangeIndexType,
  4. Renames and refactors sdc_check_indexes_equal to numpy_like.array_equal,
  5. Adds specializations for RangeIndexType in all Series/DF methods, such as operators, getitem, setitem and indexing related functions (sdc_join_series_indexes, sdc_reindex_series, etc).
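
For readers less familiar with pd.RangeIndex, here is a minimal sketch of the plain pandas behaviour the items above refer to (default positional index, iteration, equality, and an operator on Series that share a RangeIndex). It uses only pandas and does not call into SDC; the variable names are illustrative and not taken from the PR.

```python
import pandas as pd

# Plain pandas only; nothing here is compiled by SDC.
# A Series built with index=None gets a default positional RangeIndex.
s = pd.Series([10, 20, 30])
default_is_range = isinstance(s.index, pd.RangeIndex)

# The same index written out explicitly.
explicit = pd.Series([1, 2, 3], index=pd.RangeIndex(start=0, stop=3, step=1))

# Iteration over the index yields the positional labels.
labels = list(s.index)

# Whole-index equality and elementwise eq on the index.
same_index = s.index.equals(explicit.index)
eq_mask = s.index == explicit.index

# A binary operator on two Series with equal indexes aligns row by row.
total = s + explicit
print(total)
```

Under jit compilation the intent of the PR is that the same kind of code keeps working when the index is a RangeIndex; how SDC types and lowers it is not shown here.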
pep8speaks commented 4 years ago

Hello @kozlov-alexey! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2020-06-24 12:21:45 UTC
kozlov-alexey commented 4 years ago

Performance seems to have degraded with these changes, and that needs to be fixed before merge. For example, for operator.add:

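with #862:

| nthreads | type | size | median | min | max | compile | boxing |
|---|---|---|---|---|---|---|---|
| 1 | Python | 10000000 | 0.031289 | 0.03108 | 0.033158 | | |
| 1 | SDC | 10000000 | 0.188526 | 0.188433 | 0.188893 | 2.371142 | 0.000554 |
| 2 | SDC | 10000000 | 0.152458 | 0.152053 | 0.153302 | 2.43293 | 0.000682 |
| 4 | SDC | 10000000 | 0.086953 | 0.086793 | 0.087348 | 2.425093 | 0.000674 |
| 8 | SDC | 10000000 | 0.053097 | 0.053085 | 0.053173 | 2.430201 | 0.000825 |
| 16 | SDC | 10000000 | 0.036783 | 0.036191 | 0.037241 | 2.418686 | 0.000837 |
| 28 | SDC | 10000000 | 0.03135 | 0.031309 | 0.031361 | 2.425914 | 0.000833 |
| 56 | SDC | 10000000 | 0.029505 | 0.02948 | 0.030095 | 2.413604 | 0.000696 |
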
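on master:

| nthreads | type | size | median | min | max | compile | boxing |
|---|---|---|---|---|---|---|---|
| 1 | Python | 10000000 | 0.03126 | 0.031004 | 0.031958 | | |
| 1 | SDC | 10000000 | 0.047348 | 0.047187 | 0.047402 | 0.393299 | 0.00041 |
| 2 | SDC | 10000000 | 0.033103 | 0.033065 | 0.033186 | 0.446369 | 0.000514 |
| 4 | SDC | 10000000 | 0.017455 | 0.017369 | 0.017512 | 0.44836 | 0.000512 |
| 8 | SDC | 10000000 | 0.008907 | 0.008868 | 0.010109 | 0.444994 | 0.000514 |
| 16 | SDC | 10000000 | 0.005434 | 0.005059 | 0.007267 | 0.443173 | 0.000626 |
| 28 | SDC | 10000000 | 0.004042 | 0.003909 | 0.005736 | 0.444903 | 0.000519 |
| 56 | SDC | 10000000 | 0.004228 | 0.004043 | 0.006067 | 0.445996 | 0.000637 |
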
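
To put a number on the regression, the two runs above can be compared directly. The sketch below assumes the fourth column is the median time in seconds, as the column header in the follow-up comment suggests, and simply divides the #862 medians by the master medians for the SDC rows. The dictionary names are illustrative.

```python
# Median timings (seconds) copied from the two runs above, SDC rows only,
# keyed by number of threads. Treating column 4 as "median" is an assumption.
median_pr = {1: 0.188526, 2: 0.152458, 4: 0.086953, 8: 0.053097,
             16: 0.036783, 28: 0.03135, 56: 0.029505}
median_master = {1: 0.047348, 2: 0.033103, 4: 0.017455, 8: 0.008907,
                 16: 0.005434, 28: 0.004042, 56: 0.004228}

# Slowdown factor of the PR relative to master for each thread count.
slowdown = {n: median_pr[n] / median_master[n] for n in median_pr}

for n, ratio in slowdown.items():
    print(f"{n:>2} threads: {ratio:.2f}x slower than master")
```

With these figures the PR is roughly 4x slower than master on one thread, and the gap widens as the thread count grows.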
kozlov-alexey commented 4 years ago

After reverting to a separate implementation for None (positional) indexes, performance with this PR is the same as on master. See below for Series.operator.add.

with this PR:

| nthreads | type | size | median | min | max | compile | boxing |
|---|---|---|---|---|---|---|---|
| 1 | Python | 10000000 | 0.031252 | 0.031215 | 0.031294 | | |
| 1 | SDC | 10000000 | 0.049261 | 0.049189 | 0.049536 | 0.4135 | 0.000481 |
| 2 | SDC | 10000000 | 0.033909 | 0.033767 | 0.033971 | 0.448139 | 0.000529 |
| 4 | SDC | 10000000 | 0.017572 | 0.017527 | 0.017576 | 0.453382 | 0.000608 |
| 8 | SDC | 10000000 | 0.00911 | 0.009097 | 0.009185 | 0.446261 | 0.000554 |
| 16 | SDC | 10000000 | 0.005 | 0.00499 | 0.005736 | 0.465148 | 0.000619 |
| 28 | SDC | 10000000 | 0.003969 | 0.003954 | 0.005549 | 0.454575 | 0.000649 |
| 56 | SDC | 10000000 | 0.004146 | 0.00411 | 0.004553 | 0.451319 | 0.000679 |

on master (e0619659131a86):

| nthreads | type | size | median | min | max | compile | boxing |
|---|---|---|---|---|---|---|---|
| 1 | Python | 10000000 | 0.031025 | 0.030967 | 0.031119 | | |
| 1 | SDC | 10000000 | 0.049254 | 0.049175 | 0.049279 | 0.392598 | 0.000421 |
| 2 | SDC | 10000000 | 0.034642 | 0.03444 | 0.034732 | 0.450111 | 0.000578 |
| 4 | SDC | 10000000 | 0.017848 | 0.017798 | 0.018241 | 0.45714 | 0.000616 |
| 8 | SDC | 10000000 | 0.009329 | 0.009308 | 0.010389 | 0.452543 | 0.000596 |
| 16 | SDC | 10000000 | 0.005169 | 0.005132 | 0.007395 | 0.448971 | 0.000621 |
| 28 | SDC | 10000000 | 0.004017 | 0.003881 | 0.005692 | 0.440998 | 0.000525 |
| 56 | SDC | 10000000 | 0.004092 | 0.004059 | 0.006165 | 0.462535 | 0.000664 |