Closed CodingMadness closed 3 years ago
In short: various types have various intrinsic costs, and the only thing anoprsst can try and do is avoid unnecessary overheads; fundamentally the sorting algorithm isn't much different (well, disregarding parallelism). Reference types have more overheads, so sorting algorithm costs get a little more swamped by plain memory-access and indirection costs. Additionally, .net has specialized implementations for some basic types; and string may well be one of those. If you're sorting small arrays of basic types (so e.g. sorting int[]
as opposed to Wrapper[]
with struct Wrapper { int val }
- then that specialization means you're never going to be winning a lot of perf until parallelization kicks in somewhere between 10 000 and 100 000 elements.
In more detail:
When sorting data, there are various costs involved:
The relationship between these costs varies depending on what you're sorting. For small things like structs (e.g. int
or (int, double)
), quite of bit of the cost (for Array.Sort) can lie simply in the interface-dispatch of IComparer
. There's a big win in specializing (at runtime) the sorting algorithm for such types, and that's what the JIT would do if array.Sort used generically constrained type parameters with value-type argument values - but Array.Sort isn't generic like that for custom comparers; and so just doing that a big win. Notably, .net core (and framework) have specialized implementations for basic types like int or double, so specifically those exact types are sorted quickly by Array.Sort, and IIRC .net 5's implementations have improved. But if you use even trivial wrappers around an int
or something like a tuple - and performance falls off a cliff; anoprsst avoids that by specializing for all types. Notably, that's not entirely free, right? That means more native code generated by the JIT, and a slight startup cost (because the JIT needs to generate that).
When you move along to other configurations, like say a custom class with custom comparer - then simply the indirection of looking up data via a reference starts adding up. Sorting small structs is just a lot more cache friendly than sorting small classes - each object adds 24 bytes of overhead, and you need to store the reference to the object too, so that's a total of 32 bytes of overhead in addition to the actual payload. The point being: your cache will do less well here. And it's not just the sheer size of objects that's expensive, objects are also allocated and laid out in memory by the GC depending on various factors, not least of which is simply the order in which they were allocated. In many situations, objects that are adjacent in the array may not be adjacent on the heap, and that reduces cache hit rates further.
Strings are thus a kind of best-case for .net sorting: they're built-in unwrapped types (hence possibly specialized), and they're reference types, so whatever you do matters little anyhow. I could try tuning it, sure, but you shouldn't expect great gains no matter what you do simply because of the memory indirection costs.
Does that answer your question?
Ok thx alot for this deatiled explanation so far, makes sense to me! But you would say it is atleast not slower than the usual .NET built in Sorts(..) functions, isnt it?
Back when I benchmarked that (https://github.com/EamonNerbonne/anoprsst/blob/master/results-1.1-netcore2.2.txt) there was no relevant difference in performance between the default (quicksort) anoprsst uses. Merge sort was slightly faster. Results didn't meaningfully budge with .net core 3.0 + 3.1; I'll rerun the .net 5 prerelease and see what that results in.
However, if sorting plain strings is actually important to you, there exist specialized string sorting algorithms that are much faster.
The thing is, I need your Sort for a generic function where my only constraint MUST BE that it is only from IComparable
here my current method:
` public static Span<T> Where<T>(this Span<T> span) where T : IEquatable<T>, IComparable<T>
{
span.WithOrder(new ComparableOrdering<T>()).QuickSort();
return null;
}
So:
Span<T>
- a caller might confusingly think the input is left unchanged. Instead, either return void
, or make a copy before sorting.IComparable
s, you can use .WithIComparableOrder().Sort()
. I wouldn't bother wrapping that in a method, and if you don't, then if your IComparable is compile-time known to be one of the special cases (like float
) you'll get a slightly more optimal parameterization.public static void MySortWrapper<T>(this Span<T> span)
where T : IComparable<T>
=> span.WithIComparableOrder().Sort();
The thing is, I am trying to build up some functions to simulate IEnumerable
["a", "b", "c", NULL, "d", NULL, "e"] ====> QUICKSORT =====>[NULL, NULL, "a", "b", "c", "d", "e"] and now i would do:
return span.slice(0, _lastSortedIndexReturnedMaybeByQuickSortAlgorithm(....)_)
and thats it.
So i want in this array case, to get the index of 1, since the sorted array ends defacto at position 1 in the arraay, at 2 the other valid values begin,
So would be super nice if you could return always the last index from ur algorithm, so i can use that index for this.
How would you define the "last sorted index" for an arbitrary T where T:IComparable<T>
?
I mean, within your algorithms can u just not give me back the last Item which was sorted successfully? like in my example?
like in thisarray, the [2] element is the last NULL valuer which was sorted so 2 is the index ur function could return, right?
like as i saw in ur code u play alot of with Usnafe.add and subtract readPtr and writePtr, and i think u can do everytime u would successfully leave the function, you could do: "Unsafe.byteOffset(ref writePtr, ref readPtr) / Unsafe.SizeOf; IMHO
All indexes are always sorted successfully. There's nothing special about null
, and in any case not all T
are nullable.
yea true but i need to filter NULLS out of my array/span so this would begood.
Ok other idea actually, can u make predicate based sort (not only for this nULL stuff now, generall idea) so you can sort based on a precicate and then u would have a span where predicate/Condition is met and the other span doesnt met condistion so you have 2 spans within 1 span
If you want to just filter a span, I'd do that, instead of sorting. Sorting is just more expensive. But sure, you could define an ordering that implements LessThan
based on the outcome of some predicate, e.g. LessThan(T a, T b) => !predicate(a) && predicate(b);
- but that's almost certainly being a little too complicated and clever; you're better off just writing a filter function.
ok yea, so would u say i cant use ur sorting algos for my filter function or shall i write it from scratch?
the idea with the predicate would be to make it generic like the: ENumerable.Where(predicate) only this time its for span and hence needs this span in span filtering
The LessThan
implementation given above would in theory work; you can use any sorting algorithm like that (followed by a binary search or scan). But it would be slower than a (reasonable) direct implementation of filter, and quite complicated in comparison.
yea i kinda wanna write this with predicate, because u have then this IEnumerable.Where simulation but much faster for spans ofc and yea u cant yield but this is not my goal. you can easily slice then the items withc a certgain condition
direct implemntation would not be generic, but hence im not wrrint a public API yet for this, i could do the direct one for faster filtering
Good luck with your experiments; I'm closing this because the question is answered as best as I can.
Title.