Ezibenroc / roaring_analysis

MIT License
5 stars 1 forks source link

Comments #3

Open andreasvc opened 7 years ago

andreasvc commented 7 years ago

The python_analysis.ipynb notebook says:

For sparse data, pyroaring is significantly faster than cyroaring. For densities of 0.1 and 0.5, the two implementations seem to have similar performances. It is rather surprising, we could expect the C implementation to always be faster.

It is actually not really surprising I would say, because the C and Cython implementation are both equally low level. The internals of the Cython implementation avoid using Python data structures (except for the Python interface of course), so the Cython code is mostly compiled to pure C code with manual memory allocation and contiguous arrays etc.

For the density of 0.999, cyroaring is clearly faster. An hypothesis for this is that cyroaring uses different data structures, as mentionned in comment.

The difference seems to be a constant factor both when cyroaring is faster and when it is slower, so that suggests an implementation detail rather than a strictly better algorithm. The differences could be due to:

Given the results in issue #2 the first explanation actually seems most likely, because the differences disappeared when AVX was disabled for the C implementation.

Ezibenroc commented 7 years ago

It is actually not really surprising I would say, because the C and Cython implementation are both equally low level. The internals of the Cython implementation avoid using Python data structures (except for the Python interface of course), so the Cython code is mostly compiled to pure C code with manual memory allocation and contiguous arrays etc.

I am not familiar with Cython, I did not think it was low level like this. Is it possible to have vectorized instructions, like AVX ?

Given the results in issue #2 the first explanation actually seems most likely, because the differences disappeared when AVX was disabled for the C implementation.

This issue was about the better performance of pyroaring with low densities. Here, I agree that this looks to come from the SSE and AVX code. For the better performance of cyroaring with high densities, this is not clear. My guess would be the container types.

andreasvc commented 7 years ago

Cython can be this low level, it depends on the approach. This blog post gives some background on the "writing C in Cython" style: https://explosion.ai/blog/writing-c-in-cython

Yes, it's possible to use vectorized instructions, using intrinsics. However if you need inline assembly or compile-time conditions to support different compilers, you may need to write some bits of C code, but you can call the resulting functions directly from Cython (this is what I did in cyroaring).

You are right that the container type is probably the reason for the higher performance of cyroaring at higher densities. This type of benchmark highlights precisely what the inverted containers would be expected to be good at. A different benchmark (medium density but long runs) could probably uncover an advantage of run-length containers.

Ezibenroc commented 7 years ago

Thanks for the link, I should definitely try Cython!