Closed certik closed 3 years ago
Is it also an option to link to the algorithms provided by BLAS/LAPACK?
@jvdp1 I think so. We'll need to depend on Lapack for #10 anyway.
Here are inefficient implementations from fortran-utils: https://github.com/certik/fortran-utils/blob/b43bd24cd421509a5bc6d3b9c3eeae8ce856ed88/src/sorting.f90, the API however might be useful to get inspiration from. Here is more efficient implementation of quicksort: https://github.com/certik/hfsolver/blob/b4c50c1979fb7e468b1852b144ba756f5a51788d/src/sorting.f90#L165. I think there will be much better implementations out there, including in Lapack.
Here is a mutli-threaded one: https://balsoftware.net/index.php/open-source/multi-thread-sort/ Not sure what is the license.
The following routines implement the version of quicksort outlined in Hanson's and Hopkins book and on the associated web site (http://www.siam.org/books/ot134). The publishers license is very permissive (at least as I read it). This version supports REAL32, REAL64, INT8, INT16, INT32, INT64, Character strings and arrays, and a user defined type. Unfortunately, the file tool for this page does not know anything about .f90 or .F90 file extensions. Here is the code:
Main quicksort module: type specific routines are overloaded with generic qsort interface. Sorting can be in place or return a integer permutation array leaving original array untouched. Sorting can be in either ascending or descending order.
A base user class that can be extended to define user specific classes. Specifies dummy methods for relational operators needed for sorting etc, a print method and an assignment method.
A test program that tests sorting integers, reals, characters, and a user point class that is sorted on distance (Euclidean Norm)
The point class used in test program. It is extended from the base User class
Here are some of mine:
A few more:
ipsort
, isort
, spsort
, ssort
, dpsort
, dsort
, and hpsort
)qsort
FWIW I needed a sorting procedure to sort the results of of hashing functions to estimate their tendency to produce conflicting hashes. I have implemented four sorting algorithms for sorting rank one arrays of kinds INT8, INT16, INT32, INT64, SP, DP, and QP.
I have implemented eight sets of arrays for testing the codes, all of length 2**20.
The quicksort array sometimes exhibits quadratic behavior on the Identical, Randon-3, and Random-10 arrays so I turned off testing for this sort on these three arrays. The results of the testing are summarized by the following table. The times are averages of eight consecutive runs.
Type | Elements | Order | Method | Time (s) |
---|---|---|---|---|
Integer | 1048576 | Random order | QuickSort | 0.14268 |
Integer | 1048576 | Increasing | QuickSort | 0.04419 |
Integer | 1048576 | Decreasing | QuickSort | 0.13219 |
Integer | 1048576 | Random sparse | QuickSort | 0.14551 |
Integer | 1048576 | Random dense | QuickSort | 0.14235 |
Integer | 1048576 | Random order | Intro_Sort | 0.14291 |
Integer | 1048576 | Identical | Intro_Sort | 0.14412 |
Integer | 1048576 | Increasing | Intro_Sort | 0.04812 |
Integer | 1048576 | Decreasing | Intro_Sort | 0.12334 |
Integer | 1048576 | Random sparse | Intro_Sort | 0.14504 |
Integer | 1048576 | Random dense | Intro_Sort | 0.14413 |
Integer | 1048576 | Random 3 | Intro_Sort | 0.10890 |
Integer | 1048576 | Random 10 | Intro_Sort | 0.34179 |
Integer | 1048576 | Random order | Sim_Tim_Sort | 0.16737 |
Integer | 1048576 | Identical | Sim_Tim_Sort | 0.06190 |
Integer | 1048576 | Increasing | Sim_Tim_Sort | 0.06276 |
Integer | 1048576 | Decreasing | Sim_Tim_Sort | 0.09906 |
Integer | 1048576 | Random sparse | Sim_Tim_Sort | 0.16657 |
Integer | 1048576 | Random dense | Sim_Tim_Sort | 0.16756 |
Integer | 1048576 | Random 3 | Sim_Tim_Sort | 0.06303 |
Integer | 1048576 | Random 10 | Sim_Tim_Sort | 0.06194 |
Integer | 1048576 | Random order | Tim_Sort | 0.22234 |
Integer | 1048576 | Identical | Tim_Sort | 0.00234 |
Integer | 1048576 | Increasing | Tim_Sort | 0.00232 |
Integer | 1048576 | Decreasing | Tim_Sort | 0.00378 |
Integer | 1048576 | Random sparse | Tim_Sort | 0.20528 |
Integer | 1048576 | Random dense | Tim_Sort | 0.20005 |
Integer | 1048576 | Random 3 | Tim_Sort | 0.00353 |
Integer | 1048576 | Random 10 | Tim_Sort | 0.00507 |
In summary,
FWIW timsort puts a lot of effort into reducing the number of comparisons of elements of the array/list at the expense of some convoluted logic. I suspect this effort is wasted on intrinsic elements and some simplification of the code would improve its performance on random data with minimal impact on its performance on largely sorted data.
I have implemented a fifth sorting routine, a translation of the Rust sorting routine to Fortran 2008. The Rust sort is also inspired by Tim Peters' Timsort algorithm, but drops the "galloping" used to rapidly find the minimum range to be sorted. It is released under the Apache License, Version 2.0 or the MIT license so the translation can be easily incorporated into the standard library. Rust sort is comparable to the simplified Tim Sort on random data, and the true Tim Sort on partially sorted data. I suggest that that a standard library sorting routines include at least two routines: one like Introsort for data that is expected to be random; and one like Rust sort for data that will often have presorted sections.
Type | Elements | Order | Method | Time (s) |
---|---|---|---|---|
Integer | 1048576 | Random order | QuickSort | 0.14655 |
Integer | 1048576 | Increasing | QuickSort | 0.04457 |
Integer | 1048576 | Decreasing | QuickSort | 0.12806 |
Integer | 1048576 | Random sparse | QuickSort | 0.14349 |
Integer | 1048576 | Random dense | QuickSort | 0.14277 |
Integer | 1048576 | Random order | Intro_Sort | 0.14568 |
Integer | 1048576 | Identical | Intro_Sort | 0.14389 |
Integer | 1048576 | Increasing | Intro_Sort | 0.05068 |
Integer | 1048576 | Decreasing | Intro_Sort | 0.13336 |
Integer | 1048576 | Random sparse | Intro_Sort | 0.15501 |
Integer | 1048576 | Random dense | Intro_Sort | 0.15246 |
Integer | 1048576 | Random 3 | Intro_Sort | 0.09962 |
Integer | 1048576 | Random 10 | Intro_Sort | 0.30596 |
Integer | 1048576 | Random order | Sim_Tim_Sort | 0.16140 |
Integer | 1048576 | Identical | Sim_Tim_Sort | 0.06535 |
Integer | 1048576 | Increasing | Sim_Tim_Sort | 0.06486 |
Integer | 1048576 | Decreasing | Sim_Tim_Sort | 0.10153 |
Integer | 1048576 | Random sparse | Sim_Tim_Sort | 0.15755 |
Integer | 1048576 | Random dense | Sim_Tim_Sort | 0.16222 |
Integer | 1048576 | Random 3 | Sim_Tim_Sort | 0.06694 |
Integer | 1048576 | Random 10 | Sim_Tim_Sort | 0.06540 |
Integer | 1048576 | Random order | Tim_Sort | 0.20426 |
Integer | 1048576 | Identical | Tim_Sort | 0.00261 |
Integer | 1048576 | Increasing | Tim_Sort | 0.00249 |
Integer | 1048576 | Decreasing | Tim_Sort | 0.00417 |
Integer | 1048576 | Random sparse | Tim_Sort | 0.20701 |
Integer | 1048576 | Random dense | Tim_Sort | 0.20557 |
Integer | 1048576 | Random 3 | Tim_Sort | 0.00495 |
Integer | 1048576 | Random 10 | Tim_Sort | 0.00391 |
Integer | 1048576 | Random order | Rust_Sort | 0.16567 |
Integer | 1048576 | Identical | Rust_Sort | 0.00204 |
Integer | 1048576 | Increasing | Rust_Sort | 0.00203 |
Integer | 1048576 | Decreasing | Rust_Sort | 0.00354 |
Integer | 1048576 | Random sparse | Rust_Sort | 0.16542 |
Integer | 1048576 | Random dense | Rust_Sort | 0.15886 |
Integer | 1048576 | Random 3 | Rust_Sort | 0.00818 |
Integer | 1048576 | Random 10 | Rust_Sort | 0.00412 |
I think having two or even three of these would be great additions to stdlib. The occurence of pre-sorted sections is typical in indexes of sparse matrices for discretized PDE problems using irregular meshes.
I just uploaded a copy of a hybrid sort (quick sort + insertion sort) by Houlsby & Sloan (both are professors of civil engineering) to a public repository: https://github.com/ivan-pi/hs-sort/. Since I could not figure out the journal licensing conditions, I have sent an email to the original author asking for permission to share the routines under a permissive license. For an array of 1048576 random ordered doubles, the timing is roughly 0.2 seconds, so on the same order as your routines.
hey what is it alll about, i have just started in open source and came across this, could you guide me ahead?
@MUzairS15 This page is an issue in a Git repository devoted to providing a Standard Library for Fortran, similar to standard libraries for C, C++, Java etc. This repository may be of interest to you if you are a Fortran programmer interested extending the language's capabilities. Most of the library code is in Fortran extended by the FYPP preprocessor. Most uses of FYPP in the library are rather simple and easy to learn. The Fortran code can be sophisticated and requires a good knowledge of most of Fortran 2008 other than coarrays. Some limited C++/C processing may also be present using C interoperability.
The issue this page addresses is providing sorting subroutines for Fortran. I have developed routines for sorting arrays of intrinsic type elements, other than character, and am exploring their incorporation into the standard library.
I have tentatively renamed the sorting routines: RUST_SORT
=> ORD_SORT
signifying its better performance on sorting ordered arrays, and INFO_SORT
=> UNORD_SORT
signifying its better performance on randomly ordered arrays. I have decided to extend the capabilities of the sorting routines by adding a new routine to generate an array of sorting indices, so that an array of a derived type can be indirectly sorted based on the sorting of an element of that derived type. As stable sorting may be important for derived types, I have baed this routine on ORD_SORT
and have tentatively named this subroutine ORD_SORTING
.
Testing of ORD_SORTING
immediately ran into a problem in that it generated a segmentation fault. I now believe that this fault was due to the small default stack size on Mac OS X. ORD_SORTING
i believe uses over three times the memory of ORD_SORT
, which in turn usually uses more memory than UNORD_SORTING
. Raising the stack size to over 65 Mbytes appears to have fixed the problem in sorting arrays of 2**20 elements. I have not tested it on larger arrays. In attempting to debug the segmentation fault, I have changed my compiler from gfortran 10.2 to ifort 18.03. I was pleased to find that performance with ifort as the compiler was about twice as fast as with gfortran. The ifort runtimes are
Type | Elements | Order | Method | Time (s) |
---|---|---|---|---|
Integer | 1048576 | Random order | Ord_Sort | 0.09484 |
Integer | 1048576 | Identical | Ord_Sort | 0.00066 |
Integer | 1048576 | Increasing | Ord_Sort | 0.00064 |
Integer | 1048576 | Decreasing | Ord_Sort | 0.00112 |
Integer | 1048576 | Random sparse | Ord_Sort | 0.09372 |
Integer | 1048576 | Random dense | Ord_Sort | 0.09464 |
Integer | 1048576 | Random 3 | Ord_Sort | 0.00317 |
Integer | 1048576 | Random 10 | Ord_Sort | 0.00141 |
Integer | 1048576 | Random order | Unord_Sort | 0.06738 |
Integer | 1048576 | Identical | Unord_Sort | 0.03666 |
Integer | 1048576 | Increasing | Unord_Sort | 0.01244 |
Integer | 1048576 | Decreasing | Unord_Sort | 0.03999 |
Integer | 1048576 | Random sparse | Unord_Sort | 0.06779 |
Integer | 1048576 | Random dense | Unord_Sort | 0.06999 |
Integer | 1048576 | Random 3 | Unord_Sort | 0.07607 |
Integer | 1048576 | Random 10 | Unord_Sort | 0.13818 |
Integer | 1048576 | Random order | Ord_Sorting | 0.11640 |
Integer | 1048576 | Identical | Ord_Sorting | 0.00322 |
Integer | 1048576 | Increasing | Ord_Sorting | 0.00309 |
Integer | 1048576 | Decreasing | Ord_Sorting | 0.00437 |
Integer | 1048576 | Random sparse | Ord_Sorting | 0.11340 |
Integer | 1048576 | Random dense | Ord_Sorting | 0.11387 |
Integer | 1048576 | Random 3 | Ord_Sorting | 0.00784 |
Integer | 1048576 | Random 10 | Ord_Sorting | 0.00415 |
A few more user wishes concerning sorting are in https://github.com/j3-fortran/fortran_proposals/issues/101.
@certik @ivan-pi @rweed @jacobwilliams @jvdp1
You have all expressed interest in having the standard library include a module for sorting data. I now have a reasonable draft of such a module and a testing program, but before it is pulled into the standard library the following issues should be addressed:
stdlib_sorting
the ideal name for the module;INT32
or INT64
;INT_SIZE
or something else;UNORD_SORT
, a subroutine optimized for sorting near random data,
B. ORD_SORT
, a subroutine optimized for sorting data that has been partially sorted, and
C. ORD_SORTING
, a subroutine designed to generate an array of indices that would sort the input array (this is intended to be used to sort an array of a derived type based on the values of a component of that type);ORD_SORTING
on at least one test case with an input array of 2**20
elements, but has no problems with the maximum stack limit of 65532 kbytes. This will cause problems in testing on other computers. Partial fixes for this problem include:
A. Setting the stack size larger than the default in running the tests,
B. Test using a test case of significantly fewer than 2**20
elements,
C. Setting the integer type used for indexing to INT32
from INT64
reducing memory usage in ORD_SORTING
by about one third,
D. Set the output INDEX
array for ORD_SORTING
to INTENT(INOUT)
, rather than an allocatable INTENT(OUT)
so that the INDEX
array can be allocated outside of the routine in static memory rather than on the stack, reducing memory on the stack by up to a factor of two, or
E. Set the input ARRAY
for ORD_SORTING
to INTENT(INOUT)
rather than the current INTENT(IN)
which necessitates allocating a temporary array to hold the values of ARRAY
during sorting.ORD_SORT
and ORD_SORTING
allocate memory and call ERROR STOP
if allocation fails instead of returning an error flag. Several things of minor note:
UNORD_SORT
.UNORD_SORT
is about 40% faster than ORD_SORT
on near random data.ORD_SORT
is several orders of magnitude faster than UNORD_SORT
on data with large sections of pre-sorted data.The current draft specification document follows:
stdlib_sorting
module(TOC)
The sorting of collections of data is useful in the analysis of the
collections.
With its absence of generics and limited polymorphism, it is
impractical in current Fortran to provide sorting routines for
arbitrary collections of arbitrary types of data.
However Fortran's arrays are by far its most widely used collection,
and arrays of arbitrary types of data can often be sorted in terms of
a single component of intrinsic type.
The Fortran Standard Library therefore provides a module,
stdlib_sorting
with procedures to sort arrays of single intrinsic
numeric types, i.e. the different kinds of integers and reals.
The module stdlib_sorting
defines several public entities one
default integer parameter, int_size
, and three overloaded
subroutines: ORD_SORT
, UNORD_SORT
, and ORD_SORTING
. The
overloaded subroutines also each have seven specific names for
versions with differend types of array. arguments.
int_size
parameterThe int_size
parameter is used to specify the kind of integer used
in indexing the various arrays. Currently the module sets int_size
to the value of int64
from the intrinsic ISO_FORTRAN_ENV
module.
For many applications a value of INT32
would be sufficient for
addressing and would save some stack space for the subroutines,
The stdlib_sorting
module provides three different overloaded
subroutines intended to sort three different kinds of arrays of
data:
UNORD_SORT
is intended to sort simple arrays of intrinsic data
that is effectively unordered before the sort;ORD_SORT
is intended to sort simple arrays of intrinsic data
that has significant sections that were partially ordered before the
sort; andORD_SORTING
is intended to provide indices for sorting arrays of
derived type data, based on the ordering of an intrinsic component
of the derived type.UNORD_SORT
subroutinesUNORD_SORT
uses the [introsort
sorting algorithm of David Musser]
(http://www.cs.rpi.edu/~musser/gp/introsort.ps). introsort
is a hybrid
unstable comparison algorithm combining quicksort
, insertionsort
, and
heapsort
. While this algorithm is always O(N Ln(N)) it is relatively
fast on randomly ordered data, but inconsistent in performance on partly
sorted data. David Musser has given permission to include a variant of
introsort
in the Fortran Standard Library under the MIT license provided
we cite:
Musser, D.R., “Introspective Sorting and Selection Algorithms,” Software—Practice and Experience, Vol. 27(8), 983–993 (August 1997).
as the official source of the algorithm.
As with introsort
, UNORD_SORT
is an unstable hybrid
algorithm using insertion sort
for the outer branches of the sort tree,
by default using quicksort
for the main body of the sort tree, but if
quicksort
is converging too slowly the algorithm resorts
to heap sort
. The algorithm is of order O(N Ln(N)) for all inputs.
Because it relies on quicksort
, the coefficient of the O(N Ln(N))
behavior is typically small compared to other sorting algorithms.
The algorithm shows relatively little performance difference (factors
of two to four) between arrays of "random" data and arrays with
partially sorted data.
ORD_SORT
subroutineORD_SORT
is a translation of the [rust sort
sorting algorithm]
(https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs)
which in turn is inspired by the [timsort
algorithm of Tim Peters]
(http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
ORD_SORT
is a hybrid stable comparison algorithm combining mergesort
,
and insertion sort
. It is always at worst O(N Ln(N)) in sorting random
data, having a performance about 15-40% slower than UNORD_SORT
on random
data, but has much better performance than UNORD_SORT
on partially
sorted data, having O(N) performance on uniformly increasing or
decreasing data. The
rust sort
implementation
is distributed with the header:
Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
file at the top-level directory of this distribution and at
http://rust-lang.org/COPYRIGHT.
Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
<LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
option. This file may not be copied, modified, or distributed
except according to those terms.
so the license for the original code is compatible with the use of modified versions of the code in the Fortran Standard Library under the MIT license.
As with timsort
, ORD_SORT
is a stable hybrid algorithm.
It begins by traversing the array starting in its tail attempting to
identify runs
in the array, where a run is either a uniformly
decreasing sequence, ARRAY(i-1) > ARRAY(i)
, or non-decreasing,
ARRAy(i-1) <= ARRAY(i)
, sequence. Decreasing sequences are reversed
then, if the sequence has less than MIN_RUN
elements, previous
elements in the array are added to the run using insertion sort
until the run contains MIN_RUN
elements or the array is completely
processed. As each run is identified start and length of the run is
then pushed onto a stack and the stack is then processed using merge
until it obeys the stack invariants:
ensuring that processing the stack is, at worst, of order O(N Ln(N))
. However, because of the identification of decreasing and
non-decreasing runs, processing of structured data can be much faster,
with processing of uniformly decreasing or non-decreasing arrays being
of order O(N). The result in our tests is that ORD_SORT
is about
15-40% slower than UNORD_SORT
on purely random data, depending on
the compiler, but can be more than an order of magnitude faster than
UNORD_SORT
on highly structured data. ORD_SORT
allocates a
temporary array with half the elements of the input array, that can
cause problems with stack limits for large input arrays.
ORD_SORTING
subroutineThe UNORD_SORT
and ORD_SORT
subroutines can sort isolated arrays
of intrinsic types, but do nothing for the sorting of arrays of
derived types. For arrays of derived types what is useful is an array
of indices that maps the original array to an array sorted based on the
value of a component of the derived type. For such a sort a stable
sort is useful, therefore the module provides a subroutine,
ORD_SORTING
, that generates such an array of such indices based on
the ORD_SORT
algorithm. As ORD_SORTING
is also based on the
rust sort
algorithm the rust sort
license must be acknowledged:
Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
file at the top-level directory of this distribution and at
http://rust-lang.org/COPYRIGHT.
Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
<LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
option. This file may not be copied, modified, or distributed
except according to those terms.
noting that the Fortran Standard Library is released under the MIT
license so that incorporation of the rust sort
algorithm is
compatible with its license.
The logic of ORD_SORTING
parallels that of ORD_SORT
, with
additional housekeeping to keep the array of indices consistent with
the sorted positions of the input array. Because of this additional
housekeeping it has slower performance than the other two
routines. Because it uses several temporary arrays, it is also more
sensitive to stack limits than the other two routines.
stdlib_sorting
proceduresunord_sort
- sorts an input arrayExperimental
Returns an input array with the elements sorted in order of increasing value.
call unord_sort ( array )
Pure generic subroutine.
array
: shall be a rank one array of any of the types:
integer(int8)
, integer(int16)
, integer(int32)
, integer(int64)
,
real(real32)
, real(real64)
, or real(real128)
. It
is an intent(inout)
argument. On return its input elements will be
sorted in order of non-decreasing value.
UNORD_SORT
implements a hybrid sorting algorithm combining
quicksort
, merge sort
, and insertion sort
. For most purposes it
behaves like a quicksort
with a median of three partition, providing
good O(N Ln(N))
run time performance for most random arrays, but
defaulting to merge sort
if the structure of the array results in a
prediction of poor runtime performance.
ord_sort
- sorts an input arrayExperimental
Returns an input array with the elements sorted in order of increasing value.
call ord_sort ( array )
Generic subroutine.
array
: shall be a rank one array of any of the types:
integer(int8)
, integer(int16)
, integer(int32)
, integer(int64)
,
real(real32)
, real(real64)
, or real(real128)
. It
is an intent(inout)
argument. On return its input elements will be
sorted in order of non-decreasing value.
ORD_SORT
implements a hybrid sorting algorithm combining
heapsort
, and insertion sort
. For most purposes it behaves like a
merge sort
, providing worst case O(N Ln(N))
run time performance
for most random arrays, that is typically slower than UNORD_SORT
.
However, if the array has significant runs of decreasing or
non-decreasing values performance can be much better than
UNORD_SORT
, with O(N)
behavior on uniformly decreasing, or
non-decreasing arrays.
ord_sorting
- creates an arry of sorting indices for an input array.Experimental
Returns an integer array whose elements would sort the input array in the specified direction retaining order stability.
call unord_sort ( array, index[, reverse, zero_based )
Subroutine.
array
: shall be a rank one array of any of the types:
integer(int8)
, integer(int16)
, integer(int32)
, integer(int64)
,
real(real32)
, real(real64)
, or real(real128)
. It is an
intent(in)
argument. It will be an array whose sorting
indices are to be determined.
index
: shall be a rank one allocatable integer array of kind
int_size
. It is an intent(out)
argument. On return it shall be
of size(array)
with values that shall be the indices needed to sort
array
in the desired direction.
reverse
(optional): shall be a scalar of type default logical. It
is an intent(in)
argument. If present with a value of .true.
then
the indices in index
will sort array
in order of non-increasing
values in stable order. Otherwise the indices will sort array
in
order of non-decreasing values in stable order.
zero_based
(optional): shall be a scalar of type default logical.
It is an intent(in)
argument. If present with the value .true.
the values of index
will be for zero based array indexing,
otherwise the values of index will be for one's based array indexing.
ORD_SORTING
implements the hybrid sorting algorithm of ORD_SORT
.
As a merge sort
based algorithm it is a stable sorting comparison
algorithm. It uses several temporary arrays that can cause problems
with stack limits.
I was wondering if a work array
can be implemented for this (work array will decrease the number of temporary arrays): https://en.wikipedia.org/wiki/Merge_sort#Top-down_implementation
Work arrays would certainly help stack memory usage of ORD_SORT
and ORD_SORTING
(which requires more than one work array). If I resort to minimizing memory usage then their APIs would be:
ord_sort
- sorts an input arrayExperimental
Returns an input array with the elements sorted in order of increasing value.
call ord_sort ( array, work )
Generic subroutine.
array
: shall be a rank one array of any of the types:
integer(int8)
, integer(int16)
, integer(int32)
, integer(int64)
,
real(real32)
, real(real64)
, or real(real128)
. It
is an intent(inout)
argument. On input it is the array to be
sorted. On return its elements will be
sorted in order of non-decreasing value.
work
(optional): shall be a rank one array of any of the same type as array
,
and shall have at least size(array)/2
elements. It
is an intent(out)
argument. Its contents on return are undefined.
ORD_SORT
implements a hybrid sorting algorithm combining
heapsort
, and insertion sort
. For most purposes it behaves like a
merge sort
, providing worst case O(N Ln(N))
run time performance
for most random arrays, that is typically slower than UNORD_SORT
.
However, if the array has significant runs of decreasing or
non-decreasing values performance can be much better than
UNORD_SORT
, with O(N)
behavior on uniformly decreasing, or
non-decreasing arrays. If array
is of any type REAL
the behavior of
the sorting is undefined if any element of array
is a NaN
. The optional
work
array replaces "scratch" memory that would otherwise be
allocated on the stack.
ord_sorting
- creates an arry of sorting indices for an input array.Experimental
Returns an integer array whose elements would sort the input array in the specified direction retaining order stability.
call ord_sorting ( array, index[, work, iwork, reverse, zero_based )
Subroutine.
array
: shall be a rank one array of any of the types:
integer(int8)
, integer(int16)
, integer(int32)
, integer(int64)
,
real(real32)
, real(real64)
, or real(real128)
. It is an
intent(inout)
argument. On input it will be an array whose sorting
indices are to be determined. On output it will be the sorted
array.
index
: shall be a rank one allocatable integer array of kind
int_size
of the same size as array
. It is an intent(inout)
argument.
On return it shall have values that are the indices needed to sort
the original array
in the desired direction.
work
(optional): shall be a rank one array of the same type as array
,
and shall have at least size(array)/2
elements. It
is an intent(out)
argument. Its contents on return are undefined.
iwork
(optional): shall be a rank one integer array of kind int_size
,
and shall have at least size(array)/2
elements. It
is an intent(out)
argument. Its contents on return are undefined.
reverse
(optional): shall be a scalar of type default logical. It
is an intent(in)
argument. If present with a value of .true.
then
the indices in index
will sort array
in order of non-increasing
values in stable order. Otherwise the indices will sort array
in
order of non-decreasing values in stable order.
zero_based
(optional): shall be a scalar of type default logical.
It is an intent(in)
argument. If present with the value .true.
the values of index
will be for zero based array indexing,
otherwise the values of index will be for one's based array indexing.
ORD_SORTING
implements the hybrid sorting algorithm of ORD_SORT
.
As a merge sort
based algorithm it is a stable sorting comparison
algorithm. If array
is of any type REAL
the behavior of
the sorting is undefined if any element of array
is a NaN
. The optional
work
and iwork
arrays replace "scratch" memory that would otherwise be
allocated on the stack.
I have implemented optional work arrays and the current testing code now does not run out of stack space when processing 2**20
element arrays with ORD_SORTING
with the default Mac OS X maximum stack size of 8192 kbytes. However I discovered that my testing code was allocating space on the stack for expressions such as ARRAY(INDEX(:))
, and those expressions had to be assigned to static storage before the stack problems were fully fixed.
@wclodius2 Thank you for this impressive work. Sorting algorithms are highly needed for multiple procedures in stdlib
(e.g., #337, sorting arrays of sparse matrices, ...). Therefore I believe it would be a really nice addition.
To answer to (some of) your questions in a previous comment:
Is stdlib_sorting the ideal name for the module;
OK for me
Should the integer type used for indexing be INT32 or INT64;
If there is no size limit on the size of the array to be sorted, then I am in favor of int64
(I got too often seg faults with int32
in such algorithms).
What is the best name for the following generic subroutines identified by their tentative names: A. UNORD_SORT, a subroutine optimized for sorting near random data, B. ORD_SORT, a subroutine optimized for sorting data that has been partially sorted, and C. ORD_SORTING, a subroutine designed to generate an array of indices that would sort the input array (this is intended to be used to sort an array of a derived type based on the values of a component of that type);
These names are fine for me, and are helpful for users who don't care about the sorting algorithm but want the best algorithm for sorting their array.
However, for other users, would it be also possible to provide the procedrues with names that refer to their algorithm (e.g., merge_sort
, insertion_sort
)?
@wclodius2 Thank you for this impressive work. Sorting algorithms are highly needed for multiple procedures in
stdlib
(e.g., #337, sorting arrays of sparse matrices, ...). Therefore I believe it would be a really nice addition.snip
Should the integer type used for indexing be INT32 or INT64;
If there is no size limit on the size of the array to be sorted, then I am in favor of
int64
(I got too often seg faults withint32
in such algorithms).
There are no obvious limits on the size of the arrays that can be processed. Maybe more thorough testing would find some. For example, UNORD_SORT
with its recursion would exceed maximum stack space in processing a sufficiently large array. There might be limits on the static storage required to allow ORD_SORT
and ORD_SORTING
to process large arrays.
What is the best name for the following generic subroutines identified by their tentative names: A. UNORD_SORT, a subroutine optimized for sorting near random data, B. ORD_SORT, a subroutine optimized for sorting data that has been partially sorted, and C. ORD_SORTING, a subroutine designed to generate an array of indices that would sort the input array (this is intended to be used to sort an array of a derived type based on the values of a component of that type);
These names are fine for me, and are helpful for users who don't care about the sorting algorithm but want the best algorithm for sorting their array. However, for other users, would it be also possible to provide the procedrues with names that refer to their algorithm (e.g.,
merge_sort
,insertion_sort
)?
The current procedures are hybrid sorts, and most clearly identified by names that would be meaningless to most users.
ORD_SORT
and ORD_SORTING
are based on the rust sort
of the Rust Language which is a simplified version of the timsort
of Tim Peters. The core of the algorithm is an iterative (not recursive) merge sort
. The algorithm examines the next "few" elements of the array for a run of uniformly decreasing values, or a run of uniformly non-decreasing values. If the run is uniformly decreasing then the order of the elements are reversed. If the resulting run is too short, it is padded to the desired minimum size and an insertion sort is performed on the run. Then the run is compared with previously identified runs to see if any recent runs should be merged, with an algorithm that ensures at worst O(N Ln(N))
runtime performance.
UNORD_SORT
is based on the introsort
(short for introspective sort
) of David Musser. It combines three sorting algorithms: quicksort
, heap sort
, and insertion sort
. First it examines the array and estimates the depth of recursion a quick sort would require for ideal (random) data, D = Ln(N)/Ln(2)
. It then defines a limit to the number of quicksort
recursions to be allowed in processing, D_limit = factor * D
, where factor is typically taken to be 2, and calls introsort
proper. 'introsort` proper then:
insertion sort
and returns;D_limit
and, if it does, processes the remaining elements with heap sort
and returns;D_limit
, then in effect do a quicksort
step:
introsort
proper on the leftmost partition,introsort
proper on the rightmost partition, and then returns.thanks for the detailed explanations. Based on these, the proposed names make sense to me.
I have started a pull request on the stdlib_sorting
module so far focussed on the markdown document, stdlib_sorting.md
. I would like some feedback on the overall API, before including the module proper and the testing code for approval.
Should stdlib also provide an interface to C's qsort function?
The motivation is flexibility -- it allows sorting arbitrary derived types with arbitrary user-defined comparison functions. The downside is the user needs to provide a comparison function, and the call requires using some iso_c_binding features. To me this makes it somewhat "non-fortranic", and I prefer to use alternatives such as those suggested above, if possible. But for "very general" cases, I don't know a better technique than using C's qsort.
If we were to do this, the sorting module would need to have an interface to C's qsort:
interface
subroutine qsort_C(array, elem_count, elem_size, compare) bind(C,name="qsort")
use iso_c_binding, only: c_ptr, c_size_t, c_funptr
implicit none
type(c_ptr), value :: array ! C-pointer to the first entry of the array
integer(c_size_t), value :: elem_count ! Number of elements in the array
integer(c_size_t), value :: elem_size ! Size of each element, according to c_sizeof()
type(c_funptr), value :: compare ! c_funptr to the user-provided comparison function
end subroutine qsort_C
end interface
To use it, the user would need to provide a function compare
that takes 2 entries of array
and returns -1, 0, or 1 depending on their order. For instance if we are sorting integers (edited on 18th April to make arguments of type c_ptr):
function test_integer_compare(i1ptr, i2ptr) result(sgn) bind(C)
type(c_ptr), value, intent(in) :: i1ptr, i2ptr
integer, pointer :: i1, i2 ! In more general cases these would be of type(my_derived_type)
integer(c_int) :: sgn
call c_f_pointer(i1ptr, i1)
call c_f_pointer(i2ptr, i2)
! The user defines what 'less than', 'equal', 'greater than' means by setting 'sgn'
if(i1 < i2) sgn = -1_c_int
if(i1 == i2) sgn = 0_c_int
if(i1 > i2) sgn = 1_c_int
end function
The call to qsort_C is a bit ugly because one needs to make use of c_loc
and c_funloc
and c_sizeof
.
subroutine test_qsort_C
integer, target :: array(5) ! In more general cases this could be of type(my_derived_type)
! To be sorted
array = [4, 3, 5, 1, 2]
! In-place sort of the array
call qsort_C( &
c_loc(array(1)), &
size(array, kind=c_size_t), &
c_sizeof(array(1)), &
c_funloc(test_integer_compare) )
if(all(array == [1, 2, 3, 4, 5])) then
print*, 'PASS'
else
print*, 'FAIL'
end if
end subroutine
My sense is that stdlib should provide something with this level of generality. Is there any better approach?
I now have three requested changes in the API. Let me know if you have strong preferences.
ORD_SORTING
. Possibilities include:
ORD_SORT_INDEX
ORD_ARGSORT
ORD_INDEX
SORT_INDEX
REVERSE
subroutineqsort
. This would allow direct sorting of arrays of a derived type.Upon looking at the fortran standard in more detail, I've become uncertain about the correct use of C's qsort. I've raised a question here to clarify.
Based on the fortran discourse discussion I have corrected the qsort_C
example above. The key point is that the comparison function should take input arguments of type(c_ptr), value, intent(in)
, and then do type casting. This will work for an arbitrary user-specified type. It will also work for intrinsic types. Interestingly it won't work for sorting character(len=*)
-- for characters we need to specify len
.
@gareth-nx , @jvdp1 and I are having a discussion of my results and its possible implications for the naming conventions of the stdlib_sorting
module's API on #386 .
@ivan-pi on #386 has expressed concerns that the implemented sortings for character(*)
arrays and string_type
arrays (using the comparison operators and not the lexical comparison functions) may not match users needs and might be best removed from the API and my code.
Also note that the results of the testings may be of interest to @certik for his Fortran benchmarks project.
I have committed and pushed my source code to #386 . The testing harness found a couple of errors that didn't show up with my compilers on my computer. I have committed and pushed the fixed codes. The testing harness for the fixed codes does not report any further errors.
@gareth-nx Do you mind emailing me at milancurcic@hey.com? I can't find your contact info. I apologize for the off-topic message in this PR, but I can't find a better way on GitHub. :)
I have started up a new branch at https://github.com/fortran-lang/stdlib/pull/408 to deal with some merge issues.
At the very least there needs to be some quick algorithm for sorting integers and reals. But I think we can implement several algorithms and the user can choose.
This will also be needed to implement #38.