Closed densmirn closed 4 years ago
In the above example, `_accums_0` is an array of float accumulators and `_accums_2` is an array of integer accumulators.
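
To make the per-dtype accumulator idea concrete, here is a minimal pure-Python sketch of a group-by sum that keeps one flat accumulator array for float columns and another for integer columns. It is not the code SDC generates. The names `groupby_sum`, `float_accums` and `int_accums`, and the `column * ngroups + group` layout, are assumptions made for illustration only.

```python
from array import array


def groupby_sum(labels, float_cols, int_cols):
    """Sum each column per group label, using one accumulator array per dtype."""
    # Map every distinct label to a dense group index, in first-seen order.
    group_index = {}
    for label in labels:
        group_index.setdefault(label, len(group_index))
    ngroups = len(group_index)

    # Hypothetical layout: the slot for (column c, group g) is c * ngroups + g.
    float_accums = array("d", [0.0]) * (len(float_cols) * ngroups)
    int_accums = array("q", [0]) * (len(int_cols) * ngroups)

    # Single pass over the rows, updating both accumulator arrays.
    for row, label in enumerate(labels):
        g = group_index[label]
        for c, col in enumerate(float_cols):
            float_accums[c * ngroups + g] += col[row]
        for c, col in enumerate(int_cols):
            int_accums[c * ngroups + g] += col[row]

    return list(group_index), float_accums, int_accums


# Two groups (labels 1 and 2), one float column and one integer column.
keys, float_sums, int_sums = groupby_sum(
    [1, 2, 1, 2],
    float_cols=[[0.5, 1.5, 2.5, 3.5]],
    int_cols=[[10, 20, 30, 40]],
)
```

Keeping floats and integers in separate arrays means each array stays homogeneous in type, so integer sums are never routed through floating point.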
Laptop numbers:

Old implementation:
name | nthreads | type | size | median |
---|---|---|---|---|
DataFrame.groupby.sum | 1 | Python | 2000000 | 0.216 |
DataFrame.groupby.sum | 1 | SDC | 2000000 | 0.678 |
DataFrame.groupby.sum | 4 | SDC | 2000000 | 0.606 |
New implementation:
name | nthreads | type | size | median |
---|---|---|---|---|
DataFrame.groupby.sum | 1 | Python | 2000000 | 0.2 |
DataFrame.groupby.sum | 1 | SDC | 2000000 | 0.285 |
DataFrame.groupby.sum | 4 | SDC | 2000000 | 0.139 |
Python / NewSDC4 = 1.439
OldSDC4 / NewSDC4 = 4.36
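
The two ratios above can be reproduced from the table medians. The short sketch below only redoes that arithmetic. The dictionary layout and the helper name `ratio` are illustrative choices, and the timings are copied from the tables as reported.

```python
# Median timings copied from the two tables, keyed by (type, nthreads).
old = {("Python", 1): 0.216, ("SDC", 1): 0.678, ("SDC", 4): 0.606}
new = {("Python", 1): 0.2, ("SDC", 1): 0.285, ("SDC", 4): 0.139}


def ratio(numerator, denominator):
    """Return numerator / denominator rounded to three decimal places."""
    return round(numerator / denominator, 3)


# Python baseline versus the new implementation on 4 threads.
python_vs_new_sdc4 = ratio(new[("Python", 1)], new[("SDC", 4)])
# Old versus new implementation, both on 4 threads.
old_vs_new_sdc4 = ratio(old[("SDC", 4)], new[("SDC", 4)])

print(f"Python / NewSDC4 = {python_vs_new_sdc4}")
print(f"OldSDC4 / NewSDC4 = {old_vs_new_sdc4}")
```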
The benchmark data has 100 unique group labels and contains only float columns.
Example of the code generated for data with columns `{'A': int, 'B': float, 'C': float, 'D': int, 'E': float, 'F': int, 'G': int, 'H': float, 'I': float, 'J': float, 'K': int}`, grouped by `'A'`: