Vectorized / Static-Sort

A simple C++ header-only library for fastest sorting of small arrays. Generates sorting networks on compile time via templates.
52 stars 6 forks source link

[suggestion] Simplify use in Plaformio/Arduino #3

Closed puzrin closed 4 years ago

puzrin commented 4 years ago

Current usage expects manual sources copy. That's works, but not enough "simple". I'd suggest rearrange files to allow lib use by URL and so on.

What's needed:

https://github.com/ETLCPP/etl - example of project.

Vectorized commented 4 years ago

Updated the directory structure.

puzrin commented 4 years ago

Great!

I already had time to try this awesome lib and decline it back :). Don' know why, but median filter with length = 6 & your sorter works slower than truncated mean on sm32f072. Probably, i did something wrong, because expected the opposite result.

puzrin commented 4 years ago

Just a mad idea... CLI tool to generate sorter function of desired length + pre-generated ones for 2...32.

Why? Because this must work without -O2. Usefull for debug & small embeddeds (because -O2/O3 may increase size significantly, and this may be not acceptable)

Vectorized commented 4 years ago

For your particular application, the truncated mean makes more sense. It is O(n) in time complexity, while sorting is O(n log n).

You may want to make count a template variable if possible. Then you can do away with the table, and the compiler may do more optimizations.

template <int Count>
uint32_t truncated_mean(uint16_t *src, fix16_t window)
{
    const fix16_t inv_div = F16(1.0 / Count);
    // other code here.
}
puzrin commented 4 years ago

Thank you for explanation & advice! I will update my code.

BTW, there is https://github.com/Morwenn/cpp-sort. It uses more optimal [well-known] pre-generated networks for small numbers. But more difficult to understand how to use than your one.

I did not created new issue about pre-generaed newtworks, because don't know if you are inerested or not.

Vectorized commented 4 years ago

Looks like lots more code and work needed.
From benchmarks, seems like the improvement in speed, if any, is very tiny. :)

puzrin commented 4 years ago

I agree, in context of speed, added value is about zero and not worth efforts. But there may be another advantage - if users need <=32 filter length, result will not depends on compiler optimization settings any more.

Morwenn commented 4 years ago

Chiming in: if you want raw speed for sorting networks your best bet is SIMD networks optimized by hand: if I'm not mistaken bitonic sorting networks have a few interesting implementations using SIMD intrinsics that work quite fast. I think that floki does just that but the link to the paper which describes aligned-accessed sort doesn't work for me.

If you don't want to use SIMD, there is this project that tries to optimize the kind of simple swap that is omnipresent in sorting networks. You can either embed the best kind of comparisons in a library or let users provide their own swapping functions.

I'm still in the middle of regenerating benchmarks for cpp-sort, but if I recall correctly, the sorting networks I have are good with int, but are otherwise a pessimization compared to many simpler solutions. If you need a single fixed-size network then you might have some speed gains (WikiSort does just that, and replacing the sorting networks with an insertion sort made the algorithm slower), but when you need several such networks, they tend to actually represent a lot of machine code, which has a cost.