LilithHafner / InterLanguageSortingComparisons

6 stars 2 forks source link

Stop favouring numeric sort specialized algorithms #11

Open dumblob opened 1 year ago

dumblob commented 1 year ago

It seems the current benchmark sorts numbers (disregarding whether floats or integers) in an array/vector/list/...

Yet, this is fundamentally flawed in my opinion as many (most?) languages and compilers and libraries do specialization (in compile time or in run time or both) and switch to magnitudes faster numeric sorting algorithm ("non-comparison sort") which has totally different mathematical boundaries (almost O(n)) than what this benchmark wants to measure - i.e. the pair-wise comparison sorting algorithms ("comparison sort") with provably best possible O(n log n).

https://en.wikipedia.org/wiki/Sorting_algorithm#Non-comparison_sorts

Should we rather compare some complex structs of e.g. 19 bytes in size?

I would prefer no power of 2, not bigger than half of a typical cache line (i.e. max 32 bytes), no even number, and not "power of two plus 1" (to make struct compression difficult to avoid matching power of 2). Any implicit padding under the hood is fair IMHO.

LilithHafner commented 1 year ago

This repo is for evaluating different language's approaches to performing sorting dominated workflows. There are no requirements about the internals of the sorting algorithms, including whether they are comparison-based or non-comparison-based.

That said, there are lots of sorting tasks and ways to evaluate sorting algorithms and I agree that it would be nice to cover cases other than IEEE floating point. If I get the time and energy to do it, I might extend this repo to a wider scope. However, having to reproduce all the benchmarking code in 8 languages makes this somewhat time-consuming. which is why, as of now, according to the readme, "the benchmark covers only sorting of random floating point numbers into increasing order."

dumblob commented 1 year ago

the benchmark covers only sorting of random floating point numbers into increasing order

Yep. IMHO this necessitates a big disclaimer in the readme about the comparison (O(n log n)) and non-comparison (O(n)) differences as this benchmark seems to be linked from different places AFAIK but all those places deal with comparison sorts and this benchmark is being understood as a comparison benchmark which is 100% false leading to flawed and very misleading assertions like - and I quote - "so the primary comparison of note there is that Julia's default sorting algorithm (which happens to be stable) is about twice as fast as glidesort".

LilithHafner commented 1 year ago

Please quote complete sentences when a partial sentence excerpt omits important limitations.

The purpose of InterLanguageSortingBenchmarks is to give a rough idea of performance across languages, not within them, so the primary comparison of note there is that Julia's default sorting algorithm (which happens to be stable) is about twice as fast as glidesort in that particular benchmark on that particular system.

https://github.com/orlp/glidesort/issues/2#issuecomment-1419557339

dumblob commented 1 year ago

Yes, that is exactly what I meant - "twice as fast" is by far not - and I quote - "a rough idea of performance" (in the context of sorting algorithms where the champions compete with rather small differences in performance).

That is why I find it very misleading and actually rather false.