Closed rickpgood closed 4 years ago
Hi Rick, It may be possible to do what you're wanting. Could you provide more details about what you need?
HPCsharp has a function SortRadix with the following interface:
///
It's a little bit of an awkward interface, which I've been meaning to improve, but essentially the user provides two arrays: an input array and an array of keys. The sorting algorithm then sorts the array of keys, and also the input array along with it.
Is this what you're wanting, but for Parallel Merge Sort algorithm? Are you sorting based on the double[] array, as this double[] array is the array of keys? HPCsharp also has the ability to sort a double[] using Radix Sort (SortRadixMsd) in linear time. -Victor
Hi Victor,
The current SortRadix function is actually backwards of what I need.
Consider this example:
```csharp
double[] val = { 5.1, 1.9, 4.2, 0.3, 3.1, 3.1, 2.5 };
uint[] idx = { 0, 1, 2, 3, 4, 5, 6 };
Array.Sort(val, idx);
```
This does an in-place sort and returns val = { 0.3, 1.9, 2.5, 3.1, 3.1, 4.2, 5.1 } and idx = { 3, 1, 6, 4, 5, 2, 0 }.
SortRadix will sort the double[] val based on the values of the uint[] idx.
```csharp
uint[] idx_out = new uint[7];
Algorithm.SortRadix(val, idx, ref idx_out);
```
This will then return the original values of val and idx. So yes, I'm looking to sort based on the double[] array keys.
Per your example, I created a user-defined class containing (val, idx) and a comparer:
```csharp
public class UserDefinedClassComparer : IComparer<UserDefinedClass>
{
    public int Compare(UserDefinedClass first, UserDefinedClass second)
    {
        if (first.Val > second.Val)
            return 1;
        if (first.Val < second.Val)
            return -1;
        return 0;
    }
}
```
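A sketch of how that approach gets invoked (the class shape and names here are assumptions, not shown in the thread):

```csharp
// Hypothetical UserDefinedClass pairing each value with its index.
public class UserDefinedClass
{
    public double Val;
    public uint Idx;
}

// Wrap the (val, idx) pairs, then sort with the comparer above.
UserDefinedClass[] pairs = new UserDefinedClass[val.Length];
for (int i = 0; i < val.Length; i++)
    pairs[i] = new UserDefinedClass { Val = val[i], Idx = idx[i] };
Array.Sort(pairs, new UserDefinedClassComparer());
```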
Unfortunately, that approach was about 4x slower than Array.Sort() on my 500k-row case.
Thanks again for your help.
Rick
Hi Rick, Great experiments and creative ideas! Yeah, adding a Compare function call really slows things down. You could change the Compare() implementation to do: return (int)(first.Val - second.Val); which should work for double, but I bet it will still be too slow.
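As a sketch of that suggestion, with a caveat the thread doesn't mention: casting the double difference to int truncates toward zero, so keys that differ by less than 1.0 compare as equal, and double.CompareTo sidesteps that while still being a single call.

```csharp
public class SubtractionComparer : IComparer<UserDefinedClass>
{
    // The suggestion above: one subtraction instead of two branches.
    // Caveat: (int)(0.6) == 0, so keys closer together than 1.0 compare
    // as equal, and very large differences can overflow the int cast.
    public int Compare(UserDefinedClass first, UserDefinedClass second)
    {
        return (int)(first.Val - second.Val);
    }
}

public class CompareToComparer : IComparer<UserDefinedClass>
{
    // A correct single-call alternative: double.CompareTo handles
    // fractional differences and defines a total ordering (including NaN).
    public int Compare(UserDefinedClass first, UserDefinedClass second)
    {
        return first.Val.CompareTo(second.Val);
    }
}
```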
Sadly, the interfaces are not quite right for what you need. And it's a little harder than I thought to add new interfaces to support it, but not terrible. I'll add it to the top of my list for the Parallel Merge Sort: supporting the same dual-array interface as Array.Sort(). The result will be a bit slower since it has to manipulate two arrays, but hopefully still faster than Array.Sort(). I'd also like to add support for SortRadix() to sort based on double[] keys, not just integer keys. I already implement this in SortRadixMsd(), which can sort double[], but it is a bit slower than SortRadix(), which uses an LSD Radix Sort algorithm.
Can't make any time-frame promises though.
Could you try

```csharp
public static void SortRadixMsd(this double[] arrayToBeSorted)
```

to see how fast it sorts double[] on your data set, and whether there's hope of it being fast enough?
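For what it's worth, a minimal timing harness for that check might look like this (a sketch; the 500k size matches Rick's case, everything else is assumed):

```csharp
using System;
using System.Diagnostics;

class Bench
{
    static void Main()
    {
        // Random 500k-element double[] as a stand-in for the real data set.
        var rng = new Random(42);
        double[] val = new double[500_000];
        for (int i = 0; i < val.Length; i++)
            val[i] = rng.NextDouble();

        var sw = Stopwatch.StartNew();
        Array.Sort(val);   // swap in HPCsharp's SortRadixMsd() extension to compare
        sw.Stop();
        Console.WriteLine($"{sw.ElapsedMilliseconds} ms");
    }
}
```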
It also would have been cool if Array.AsParallel().Sort() were possible, like it is for Linq. -Victor
Thanks Victor!
This is what I have so far for timing (mean of 50x).
val.SortMergeInPlacePar()        15 ms
val.SortMergeInPlaceStablePar()  30 ms
Array.Sort(val)                  31 ms
Algorithm.SortRadixMsd(val)      59 ms
Array.Sort(val, idx)             67 ms
For the next two, I smooshed the val and idx arrays together into a 2-column jagged array S = [val, idx] and used your comparer advice from above.
```csharp
public class UserDefinedClassComparer : IComparer<double[]>
{
    public int Compare(double[] first, double[] second)
    {
        return (int)(first[0] - second[0]);
    }
}
```
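The smoosh step itself isn't shown in the thread; a plausible sketch, assuming the idx values are widened to double so both columns fit one array:

```csharp
// Pack each (val, idx) pair into a 2-element row of a jagged array
// so a single IComparer<double[]> can sort keys and items together.
double[][] S = new double[val.Length][];
for (int i = 0; i < val.Length; i++)
    S[i] = new double[] { val[i], idx[i] };
```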
S.SortMergeInPlacePar(comparer)  150 ms (plus ~30 ms for array smoosh'n)
Array.Sort(S, comparer)          490 ms (plus ~30 ms for array smoosh'n)
So you can see SortMergeInPlacePar is 2-3x faster than Array.Sort in a (more or less) apples-to-apples comparison. Cool algo!
Rick
Take a look at the new 3.8.0 release, which now has

```csharp
public static void SortMergeInPlacePar<T1, T2>(this T1[] keys, T2[] items, IComparer
```
Oops, further testing revealed that keys are sorted, but items are not - working on tracking the issue down.
I also improved the interfaces on SortRadix with keys and items; it now returns a tuple of the sorted keys and items arrays, making it more functional-style.
Try the 3.8.1 version I just pushed, as it fixes the problem. However, there are a few miscomparisons between Array.Sort(double[], int[]) and the new SortMergeInPlacePar(double[], int[]). I'm working on tracking these down. My theory is that these are keys which are equal to each other, showing up as "stability differences": some sorting algorithms are not stable, and equal keys can get swapped in position from input to output. I'll let you know what I find.
I found that to be the exact issue when comparing the results of the two algorithms:

Diff at index 1349 (-1.79697576570928E+308:1479440) (-1.79697576570928E+308:653777)
Diff at index 1350 (-1.79697576570928E+308:653777) (-1.79697576570928E+308:1479440)
The keys are of the same value, but the items are swapped. This is exactly what sorting stability is all about: sometimes we care about it, and other times it's ok if the order of items associated with equal keys gets swapped.
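As an illustration of the stability point (a sketch, not from the thread): .NET's Array.Sort is an unstable sort, while LINQ's OrderBy is documented to be stable, so the two can order items under equal keys differently.

```csharp
using System;
using System.Linq;

class StabilityDemo
{
    static void Main()
    {
        // Two equal keys (3.1) carry different items (4 and 5).
        double[] keys  = { 5.1, 1.9, 3.1, 3.1, 2.5 };
        int[]    items = { 0, 1, 4, 5, 6 };

        // OrderBy is stable: items 4 and 5 keep their original
        // relative order under the equal 3.1 keys.
        int[] stable = keys.Zip(items, (k, i) => (k, i))
                           .OrderBy(p => p.k)
                           .Select(p => p.i)
                           .ToArray();

        // Array.Sort(keys, items) uses an unstable sort, so the items
        // under equal keys may or may not keep their original order.
        Array.Sort(keys, items);

        Console.WriteLine(string.Join(",", stable));
        Console.WriteLine(string.Join(",", items));
    }
}
```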
Hi Victor, it works like a charm. For my case I'm seeing a 1.8x ratio. Very nice work.
Array.Sort(val, idx)         --> 67 ms
val.SortMergeInPlacePar(idx) --> 37 ms
Thanks for adding this feature. I think others will find it useful too.
Rick
You're welcome! Glad it works for you. I also think that fixing LSD Radix Sort to support double[] keys would run even faster, while using only a single core. I added it to the list of items to work on.
Hi, is it possible to use parallel merge sort to return the sorted indexes / keys? Or to sort one int[] array according to the order of another double[] array?
For example, Array.Sort(x1,x2) will sort x2 in the order of x1.
SortMergeInPlacePar is 2x faster than Array.Sort for my n=500k double array so I would really like to use it.