Open andreasvc opened 7 years ago
It is actually not really surprising I would say, because the C and Cython implementation are both equally low level. The internals of the Cython implementation avoid using Python data structures (except for the Python interface of course), so the Cython code is mostly compiled to pure C code with manual memory allocation and contiguous arrays etc.
I am not familiar with Cython, I did not think it was low level like this. Is it possible to have vectorized instructions, like AVX ?
Given the results in issue #2 the first explanation actually seems most likely, because the differences disappeared when AVX was disabled for the C implementation.
This issue was about the better performance of pyroaring with low densities. Here, I agree that this looks to come from the SSE and AVX code. For the better performance of cyroaring with high densities, this is not clear. My guess would be the container types.
Cython can be this low level, it depends on the approach. This blog post gives some background on the "writing C in Cython" style: https://explosion.ai/blog/writing-c-in-cython
Yes, it's possible to use vectorized instructions, using intrinsics. However if you need inline assembly or compile-time conditions to support different compilers, you may need to write some bits of C code, but you can call the resulting functions directly from Cython (this is what I did in cyroaring).
You are right that the container type is probably the reason for the higher performance of cyroaring at higher densities. This type of benchmark highlights precisely what the inverted containers would be expected to be good at. A different benchmark (medium density but long runs) could probably uncover an advantage of run-length containers.
Thanks for the link, I should definitely try Cython!
The python_analysis.ipynb notebook says:
It is actually not really surprising I would say, because the C and Cython implementation are both equally low level. The internals of the Cython implementation avoid using Python data structures (except for the Python interface of course), so the Cython code is mostly compiled to pure C code with manual memory allocation and contiguous arrays etc.
The difference seems to be a constant factor both when cyroaring is faster and when it is slower, so that suggests an implementation detail rather than a strictly better algorithm. The differences could be due to:
Given the results in issue #2 the first explanation actually seems most likely, because the differences disappeared when AVX was disabled for the C implementation.