certik commented 4 years ago

At the very least there needs to be some quick algorithm for sorting integers and reals. But I think we can implement several algorithms and the user can choose.

This will also be needed to implement #38.

jvdp1 commented 4 years ago

Is it also an option to link to the algorithms provided by BLAS/LAPACK?

certik commented 4 years ago

@jvdp1 I think so. We'll need to depend on Lapack for #10 anyway.

certik commented 4 years ago

Here are inefficient implementations from fortran-utils: https://github.com/certik/fortran-utils/blob/b43bd24cd421509a5bc6d3b9c3eeae8ce856ed88/src/sorting.f90, the API however might be useful to get inspiration from. Here is more efficient implementation of quicksort: https://github.com/certik/hfsolver/blob/b4c50c1979fb7e468b1852b144ba756f5a51788d/src/sorting.f90#L165. I think there will be much better implementations out there, including in Lapack.

jvdp1 commented 4 years ago

Here is a mutli-threaded one: https://balsoftware.net/index.php/open-source/multi-thread-sort/ Not sure what is the license.

rweed commented 4 years ago

The following routines implement the version of quicksort outlined in Hanson's and Hopkins book and on the associated web site (http://www.siam.org/books/ot134). The publishers license is very permissive (at least as I read it). This version supports REAL32, REAL64, INT8, INT16, INT32, INT64, Character strings and arrays, and a user defined type. Unfortunately, the file tool for this page does not know anything about .f90 or .F90 file extensions. Here is the code:

Main quicksort module: type specific routines are overloaded with generic qsort interface. Sorting can be in place or return a integer permutation array leaving original array untouched. Sorting can be in either ascending or descending order.

quickSort.F90.txt

A base user class that can be extended to define user specific classes. Specifies dummy methods for relational operators needed for sorting etc, a print method and an assignment method.

userClass.F90.txt

A test program that tests sorting integers, reals, characters, and a user point class that is sorted on distance (Euclidean Norm)

testqsort.f90.txt

The point class used in test program. It is extended from the base User class

point.f90.txt

jacobwilliams commented 4 years ago

Here are some of mine:

https://github.com/jacobwilliams/fortran-search-and-sort
https://github.com/jacobwilliams/stringsort (see also here about natural sorting of strings).

ivan-pi commented 3 years ago

A few more:

Sorting routines in various libraries

ACM
- Algorithm 402 (version linked here is part of CLAWPACK)
HSL
- KB07
IBM Engineering and Scientific Subroutine Library (ESSL)
IMSL
- SORT_REAL
- SVRGN
- SVRGP
- SVIGN
- SVIGP
- SVRBN
- SVRBP
- SVIBN
- SVIBP
LAPACK
- slasrt
- dlasrt
NAG
- Searching and sorting
Numerical Recipes
ORDERPACK
SLATEC (search for the routines ipsort, isort, spsort, ssort, dpsort, dsort, and hpsort)

Various user routines

FlashSort (also see article in Dr.Dobbs or the Wikipedia page)
Sorting in Fortran by Michael Rutter
sorting programs in Fortran by Jean-Pierre Moreau
Sorting by John Mahaffy
FORTRAN Bubble Sort Speghetti Code by Thomas W Bennet
MULTI-THREADED SORT by Bal Software
quicksort by Tim Hoar
lcpsort - Module for sorting and merging ASCII text strings by N. Shamsundar, University of Houston (a C version by the same author is provided here)
qsort_inline by Joseph M. Krahn at FortranWiki
Coding Fortran: a sorting module by Michael Wirth
Parsing and Sorting with Fortran by Mike Rourke
quicksort by @t-nissie and a modified version by @1AdAstra1
Selection sort by C.-K. Shene
LexicalSort by unknown author from comp.lang.fortran newsgroup
Ancora sul sort di ‘Compilati vs. Interpretati’ – 2; blog post comparing various Fortran sort implementations (in Italian!)
M_sort by @urbanjost
simple_fortran_argsort by @Astrokiwi
Bubble Sort (WikiBooks)
Gnome sort (Rosetta Code)
Insertion sort (Rosetta Code)

Internet forums and discussion boards

ordering function in Fortran (comp.lang.fortran)
Reverse quick sort in Fortran (StackExchange)
Sorting an array from lowest to greatest FORTRAN (StackExchange)
Fortran Bubble Sort Algorithm (StackExchange)
Open-source version of Intel Fortran sort routines (StackExchange) provided by Rafik Zurob

Interfaces to C `qsort`

Published works

Books
- Algorithms and Data Structures in F and Fortran by Robin Vowels (see chapter 1)
- Introduction to Programming with Fortran by Ian Chivers and Jane Sleightholme (see pp. 703-732)
- Fortran 90 & 95 Array and Pointer Techniques (eBook) by Loren P. Meissner (see chapters 2 and 3)
Articles
- Transferable FORTRAN Subroutine for Rapid Extended Sorting
- Technical Note: Efficient sorting routines in FORTRAN 77

wclodius2 commented 3 years ago

FWIW I needed a sorting procedure to sort the results of of hashing functions to estimate their tendency to produce conflicting hashes. I have implemented four sorting algorithms for sorting rank one arrays of kinds INT8, INT16, INT32, INT64, SP, DP, and QP.

A hybrid of quicksort using median of three for the pivot and insertion sort for the leaves. There should be no problem with distributing this algorithm with stdlib, but I have my issues with its performance on some arrays
A version of the introsort of David Musser used as the standard sorting for some C++ standard libraries. This is a hybrid algorithm using insertion sort for the leaves of the sort tree, quicksort by default for the main body, and heapsort if quicksort is not converging quickly. I have permission from David Musser to incorporate my version of introsort into the standard library.
A algorithm inspired by timsort of Tim Peters used in Python, Java, and other languages. While it claims to be timsort it is not, but has very good performance for a mergesort base sort, particularly on partly sorted data. It's sorting is not in place and stable. It requires temporary space of size N. In particular, it doesn't do the search for runs or the galloping of the original. I call this algorithm simplified timsort. I have asked the author of this algorithm, Aditya Kumar, for permission to use it.
A version of timsort largely a translation of the original C code to Fortran 2008. This code has at least one error discussed by N. Auger, C. Nicaud, and C. Pivoteau. Merge Strategies: from Merge Sort to TimSort. 2015. hal-01212839v2. that is fixed in my translation. I have written to two old email addresses of Tim Peters asking for permission to use my translation in stdlib, but have not heard back.

I have implemented eight sets of arrays for testing the codes, all of length 2**20.

Random - a set of integers from 0 to 2**20 - 1 in random order.
Identical - all integers have the same value of 10.
Increasing - values increase uniformly from 0 to 2**20-1.
Decreasing - values decrease uniformly from 2**20-1 to 0.
Random-sparse - the integers are generated randomly from a set of values from 0 to 2**22-1 so duplicates are sparse.
Random-dense - the integers are generated randomly from a set of values from 0 to 2**19-1 so duplicates are dense.
Random-3 - the Increasing array has 3 random exchanges of individual elements.
Random-10 - the final ten elements of the increasing array are replaced by random values.

The quicksort array sometimes exhibits quadratic behavior on the Identical, Randon-3, and Random-10 arrays so I turned off testing for this sort on these three arrays. The results of the testing are summarized by the following table. The times are averages of eight consecutive runs.

Type	Elements	Order	Method	Time (s)
Integer	1048576	Random order	QuickSort	0.14268
Integer	1048576	Increasing	QuickSort	0.04419
Integer	1048576	Decreasing	QuickSort	0.13219
Integer	1048576	Random sparse	QuickSort	0.14551
Integer	1048576	Random dense	QuickSort	0.14235
Integer	1048576	Random order	Intro_Sort	0.14291
Integer	1048576	Identical	Intro_Sort	0.14412
Integer	1048576	Increasing	Intro_Sort	0.04812
Integer	1048576	Decreasing	Intro_Sort	0.12334
Integer	1048576	Random sparse	Intro_Sort	0.14504
Integer	1048576	Random dense	Intro_Sort	0.14413
Integer	1048576	Random 3	Intro_Sort	0.10890
Integer	1048576	Random 10	Intro_Sort	0.34179
Integer	1048576	Random order	Sim_Tim_Sort	0.16737
Integer	1048576	Identical	Sim_Tim_Sort	0.06190
Integer	1048576	Increasing	Sim_Tim_Sort	0.06276
Integer	1048576	Decreasing	Sim_Tim_Sort	0.09906
Integer	1048576	Random sparse	Sim_Tim_Sort	0.16657
Integer	1048576	Random dense	Sim_Tim_Sort	0.16756
Integer	1048576	Random 3	Sim_Tim_Sort	0.06303
Integer	1048576	Random 10	Sim_Tim_Sort	0.06194
Integer	1048576	Random order	Tim_Sort	0.22234
Integer	1048576	Identical	Tim_Sort	0.00234
Integer	1048576	Increasing	Tim_Sort	0.00232
Integer	1048576	Decreasing	Tim_Sort	0.00378
Integer	1048576	Random sparse	Tim_Sort	0.20528
Integer	1048576	Random dense	Tim_Sort	0.20005
Integer	1048576	Random 3	Tim_Sort	0.00353
Integer	1048576	Random 10	Tim_Sort	0.00507

In summary,

quicksort has good performance where it does not have quadratic behavior, and has particularly good performance on the uniformly increasing array and slightly better performance on the uniformly decreasing array.
introsort roughly matches quicksort performance, except that it is stable where quicksort has quadratic behavior.
Simplified timsort has about 20% slower performance on random data, but consistently very good at data with significant stretches of ordered data.
Timsort has about 80% slower performance than introsort on random data, but extraordinary performance, essentially O(N), on data with significant stretches of ordered data.

FWIW timsort puts a lot of effort into reducing the number of comparisons of elements of the array/list at the expense of some convoluted logic. I suspect this effort is wasted on intrinsic elements and some simplification of the code would improve its performance on random data with minimal impact on its performance on largely sorted data.

wclodius2 commented 3 years ago

I have implemented a fifth sorting routine, a translation of the Rust sorting routine to Fortran 2008. The Rust sort is also inspired by Tim Peters' Timsort algorithm, but drops the "galloping" used to rapidly find the minimum range to be sorted. It is released under the Apache License, Version 2.0 or the MIT license so the translation can be easily incorporated into the standard library. Rust sort is comparable to the simplified Tim Sort on random data, and the true Tim Sort on partially sorted data. I suggest that that a standard library sorting routines include at least two routines: one like Introsort for data that is expected to be random; and one like Rust sort for data that will often have presorted sections.

Type	Elements	Order	Method	Time (s)
Integer	1048576	Random order	QuickSort	0.14655
Integer	1048576	Increasing	QuickSort	0.04457
Integer	1048576	Decreasing	QuickSort	0.12806
Integer	1048576	Random sparse	QuickSort	0.14349
Integer	1048576	Random dense	QuickSort	0.14277
Integer	1048576	Random order	Intro_Sort	0.14568
Integer	1048576	Identical	Intro_Sort	0.14389
Integer	1048576	Increasing	Intro_Sort	0.05068
Integer	1048576	Decreasing	Intro_Sort	0.13336
Integer	1048576	Random sparse	Intro_Sort	0.15501
Integer	1048576	Random dense	Intro_Sort	0.15246
Integer	1048576	Random 3	Intro_Sort	0.09962
Integer	1048576	Random 10	Intro_Sort	0.30596
Integer	1048576	Random order	Sim_Tim_Sort	0.16140
Integer	1048576	Identical	Sim_Tim_Sort	0.06535
Integer	1048576	Increasing	Sim_Tim_Sort	0.06486
Integer	1048576	Decreasing	Sim_Tim_Sort	0.10153
Integer	1048576	Random sparse	Sim_Tim_Sort	0.15755
Integer	1048576	Random dense	Sim_Tim_Sort	0.16222
Integer	1048576	Random 3	Sim_Tim_Sort	0.06694
Integer	1048576	Random 10	Sim_Tim_Sort	0.06540
Integer	1048576	Random order	Tim_Sort	0.20426
Integer	1048576	Identical	Tim_Sort	0.00261
Integer	1048576	Increasing	Tim_Sort	0.00249
Integer	1048576	Decreasing	Tim_Sort	0.00417
Integer	1048576	Random sparse	Tim_Sort	0.20701
Integer	1048576	Random dense	Tim_Sort	0.20557
Integer	1048576	Random 3	Tim_Sort	0.00495
Integer	1048576	Random 10	Tim_Sort	0.00391
Integer	1048576	Random order	Rust_Sort	0.16567
Integer	1048576	Identical	Rust_Sort	0.00204
Integer	1048576	Increasing	Rust_Sort	0.00203
Integer	1048576	Decreasing	Rust_Sort	0.00354
Integer	1048576	Random sparse	Rust_Sort	0.16542
Integer	1048576	Random dense	Rust_Sort	0.15886
Integer	1048576	Random 3	Rust_Sort	0.00818
Integer	1048576	Random 10	Rust_Sort	0.00412

ivan-pi commented 3 years ago

I think having two or even three of these would be great additions to stdlib. The occurence of pre-sorted sections is typical in indexes of sparse matrices for discretized PDE problems using irregular meshes.

I just uploaded a copy of a hybrid sort (quick sort + insertion sort) by Houlsby & Sloan (both are professors of civil engineering) to a public repository: https://github.com/ivan-pi/hs-sort/. Since I could not figure out the journal licensing conditions, I have sent an email to the original author asking for permission to share the routines under a permissive license. For an array of 1048576 random ordered doubles, the timing is roughly 0.2 seconds, so on the same order as your routines.

MUzairS15 commented 3 years ago

hey what is it alll about, i have just started in open source and came across this, could you guide me ahead?

wclodius2 commented 3 years ago

@MUzairS15 This page is an issue in a Git repository devoted to providing a Standard Library for Fortran, similar to standard libraries for C, C++, Java etc. This repository may be of interest to you if you are a Fortran programmer interested extending the language's capabilities. Most of the library code is in Fortran extended by the FYPP preprocessor. Most uses of FYPP in the library are rather simple and easy to learn. The Fortran code can be sophisticated and requires a good knowledge of most of Fortran 2008 other than coarrays. Some limited C++/C processing may also be present using C interoperability.

The issue this page addresses is providing sorting subroutines for Fortran. I have developed routines for sorting arrays of intrinsic type elements, other than character, and am exploring their incorporation into the standard library.

wclodius2 commented 3 years ago

I have tentatively renamed the sorting routines: RUST_SORT => ORD_SORT signifying its better performance on sorting ordered arrays, and INFO_SORT => UNORD_SORT signifying its better performance on randomly ordered arrays. I have decided to extend the capabilities of the sorting routines by adding a new routine to generate an array of sorting indices, so that an array of a derived type can be indirectly sorted based on the sorting of an element of that derived type. As stable sorting may be important for derived types, I have baed this routine on ORD_SORT and have tentatively named this subroutine ORD_SORTING.

Testing of ORD_SORTING immediately ran into a problem in that it generated a segmentation fault. I now believe that this fault was due to the small default stack size on Mac OS X. ORD_SORTING i believe uses over three times the memory of ORD_SORT, which in turn usually uses more memory than UNORD_SORTING. Raising the stack size to over 65 Mbytes appears to have fixed the problem in sorting arrays of 2**20 elements. I have not tested it on larger arrays. In attempting to debug the segmentation fault, I have changed my compiler from gfortran 10.2 to ifort 18.03. I was pleased to find that performance with ifort as the compiler was about twice as fast as with gfortran. The ifort runtimes are

Type	Elements	Order	Method	Time (s)
Integer	1048576	Random order	Ord_Sort	0.09484
Integer	1048576	Identical	Ord_Sort	0.00066
Integer	1048576	Increasing	Ord_Sort	0.00064
Integer	1048576	Decreasing	Ord_Sort	0.00112
Integer	1048576	Random sparse	Ord_Sort	0.09372
Integer	1048576	Random dense	Ord_Sort	0.09464
Integer	1048576	Random 3	Ord_Sort	0.00317
Integer	1048576	Random 10	Ord_Sort	0.00141
Integer	1048576	Random order	Unord_Sort	0.06738
Integer	1048576	Identical	Unord_Sort	0.03666
Integer	1048576	Increasing	Unord_Sort	0.01244
Integer	1048576	Decreasing	Unord_Sort	0.03999
Integer	1048576	Random sparse	Unord_Sort	0.06779
Integer	1048576	Random dense	Unord_Sort	0.06999
Integer	1048576	Random 3	Unord_Sort	0.07607
Integer	1048576	Random 10	Unord_Sort	0.13818
Integer	1048576	Random order	Ord_Sorting	0.11640
Integer	1048576	Identical	Ord_Sorting	0.00322
Integer	1048576	Increasing	Ord_Sorting	0.00309
Integer	1048576	Decreasing	Ord_Sorting	0.00437
Integer	1048576	Random sparse	Ord_Sorting	0.11340
Integer	1048576	Random dense	Ord_Sorting	0.11387
Integer	1048576	Random 3	Ord_Sorting	0.00784
Integer	1048576	Random 10	Ord_Sorting	0.00415

ivan-pi commented 3 years ago

A few more user wishes concerning sorting are in https://github.com/j3-fortran/fortran_proposals/issues/101.

wclodius2 commented 3 years ago

@certik @ivan-pi @rweed @jacobwilliams @jvdp1

You have all expressed interest in having the standard library include a module for sorting data. I now have a reasonable draft of such a module and a testing program, but before it is pulled into the standard library the following issues should be addressed:

Is stdlib_sorting the ideal name for the module;
Should the integer type used for indexing be INT32 or INT64;
Should the integer parameter used for indexing be named INT_SIZE or something else;
What is the best name for the following generic subroutines identified by their tentative names: A. UNORD_SORT, a subroutine optimized for sorting near random data, B. ORD_SORT, a subroutine optimized for sorting data that has been partially sorted, and C. ORD_SORTING, a subroutine designed to generate an array of indices that would sort the input array (this is intended to be used to sort an array of a derived type based on the values of a component of that type);
The current testing program exceeds the default Mac OS X stack limit of 8192 kbytes when running ORD_SORTING on at least one test case with an input array of 2**20 elements, but has no problems with the maximum stack limit of 65532 kbytes. This will cause problems in testing on other computers. Partial fixes for this problem include: A. Setting the stack size larger than the default in running the tests, B. Test using a test case of significantly fewer than 2**20 elements, C. Setting the integer type used for indexing to INT32 from INT64 reducing memory usage in ORD_SORTING by about one third, D. Set the output INDEX array for ORD_SORTING to INTENT(INOUT), rather than an allocatable INTENT(OUT) so that the INDEX array can be allocated outside of the routine in static memory rather than on the stack, reducing memory on the stack by up to a factor of two, or E. Set the input ARRAY for ORD_SORTING to INTENT(INOUT) rather than the current INTENT(IN) which necessitates allocating a temporary array to hold the values of ARRAY during sorting.
ORD_SORT and ORD_SORTING allocate memory and call ERROR STOP if allocation fails instead of returning an error flag.

Several things of minor note:

The codes are about twice as fast when compiled with ifort 18.0.3 than when compiled with gfortran 10.2.0 with the largest relative differences for UNORD_SORT.
When compiled with ifort UNORD_SORT is about 40% faster than ORD_SORT on near random data.
ORD_SORT is several orders of magnitude faster than UNORD_SORT on data with large sections of pre-sorted data.

The current draft specification document follows:

title: Sorting Procedures

The `stdlib_sorting` module

(TOC)

Overview of sorting

The sorting of collections of data is useful in the analysis of the collections. With its absence of generics and limited polymorphism, it is impractical in current Fortran to provide sorting routines for arbitrary collections of arbitrary types of data. However Fortran's arrays are by far its most widely used collection, and arrays of arbitrary types of data can often be sorted in terms of a single component of intrinsic type. The Fortran Standard Library therefore provides a module, stdlib_sorting with procedures to sort arrays of single intrinsic numeric types, i.e. the different kinds of integers and reals.

Overview of the module

The module stdlib_sorting defines several public entities one default integer parameter, int_size, and three overloaded subroutines: ORD_SORT, UNORD_SORT, and ORD_SORTING. The overloaded subroutines also each have seven specific names for versions with differend types of array. arguments.

The `int_size` parameter

The int_size parameter is used to specify the kind of integer used in indexing the various arrays. Currently the module sets int_size to the value of int64 from the intrinsic ISO_FORTRAN_ENV module. For many applications a value of INT32 would be sufficient for addressing and would save some stack space for the subroutines,

The module subroutines

The stdlib_sorting module provides three different overloaded subroutines intended to sort three different kinds of arrays of data:

UNORD_SORT is intended to sort simple arrays of intrinsic data that is effectively unordered before the sort;
ORD_SORT is intended to sort simple arrays of intrinsic data that has significant sections that were partially ordered before the sort; and
ORD_SORTING is intended to provide indices for sorting arrays of derived type data, based on the ordering of an intrinsic component of the derived type.

The `UNORD_SORT` subroutines

UNORD_SORT uses the [introsort sorting algorithm of David Musser] (http://www.cs.rpi.edu/~musser/gp/introsort.ps). introsort is a hybrid unstable comparison algorithm combining quicksort, insertionsort, and heapsort. While this algorithm is always O(N Ln(N)) it is relatively fast on randomly ordered data, but inconsistent in performance on partly sorted data. David Musser has given permission to include a variant of introsort in the Fortran Standard Library under the MIT license provided we cite:

Musser, D.R., “Introspective Sorting and Selection Algorithms,” Software—Practice and Experience, Vol. 27(8), 983–993 (August 1997).

as the official source of the algorithm.

As with introsort, UNORD_SORT is an unstable hybrid algorithm using insertion sort for the outer branches of the sort tree, by default using quicksort for the main body of the sort tree, but if quicksort is converging too slowly the algorithm resorts to heap sort. The algorithm is of order O(N Ln(N)) for all inputs. Because it relies on quicksort, the coefficient of the O(N Ln(N)) behavior is typically small compared to other sorting algorithms. The algorithm shows relatively little performance difference (factors of two to four) between arrays of "random" data and arrays with partially sorted data.

The `ORD_SORT` subroutine

ORD_SORT is a translation of the [rust sort sorting algorithm] (https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs) which in turn is inspired by the [timsort algorithm of Tim Peters] (http://svn.python.org/projects/python/trunk/Objects/listsort.txt). ORD_SORT is a hybrid stable comparison algorithm combining mergesort, and insertion sort. It is always at worst O(N Ln(N)) in sorting random data, having a performance about 15-40% slower than UNORD_SORT on random data, but has much better performance than UNORD_SORT on partially sorted data, having O(N) performance on uniformly increasing or decreasing data. The rust sort implementation is distributed with the header:

Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
file at the top-level directory of this distribution and at
http://rust-lang.org/COPYRIGHT.

Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
<LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
option. This file may not be copied, modified, or distributed
except according to those terms.

so the license for the original code is compatible with the use of modified versions of the code in the Fortran Standard Library under the MIT license.

As with timsort, ORD_SORT is a stable hybrid algorithm. It begins by traversing the array starting in its tail attempting to identify runs in the array, where a run is either a uniformly decreasing sequence, ARRAY(i-1) > ARRAY(i), or non-decreasing, ARRAy(i-1) <= ARRAY(i), sequence. Decreasing sequences are reversed then, if the sequence has less than MIN_RUN elements, previous elements in the array are added to the run using insertion sort until the run contains MIN_RUN elements or the array is completely processed. As each run is identified start and length of the run is then pushed onto a stack and the stack is then processed using merge until it obeys the stack invariants:

len(i-2) > len(i-1) + len(i)
len(i-1) > len(i)

ensuring that processing the stack is, at worst, of order O(N Ln(N)). However, because of the identification of decreasing and non-decreasing runs, processing of structured data can be much faster, with processing of uniformly decreasing or non-decreasing arrays being of order O(N). The result in our tests is that ORD_SORT is about 15-40% slower than UNORD_SORT on purely random data, depending on the compiler, but can be more than an order of magnitude faster than UNORD_SORT on highly structured data. ORD_SORT allocates a temporary array with half the elements of the input array, that can cause problems with stack limits for large input arrays.

The `ORD_SORTING` subroutine

The UNORD_SORT and ORD_SORT subroutines can sort isolated arrays of intrinsic types, but do nothing for the sorting of arrays of derived types. For arrays of derived types what is useful is an array of indices that maps the original array to an array sorted based on the value of a component of the derived type. For such a sort a stable sort is useful, therefore the module provides a subroutine, ORD_SORTING, that generates such an array of such indices based on the ORD_SORT algorithm. As ORD_SORTING is also based on the rust sort algorithm the rust sort license must be acknowledged:

Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
file at the top-level directory of this distribution and at
http://rust-lang.org/COPYRIGHT.

Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
<LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
option. This file may not be copied, modified, or distributed
except according to those terms.

noting that the Fortran Standard Library is released under the MIT license so that incorporation of the rust sort algorithm is compatible with its license.

The logic of ORD_SORTING parallels that of ORD_SORT, with additional housekeeping to keep the array of indices consistent with the sorted positions of the input array. Because of this additional housekeeping it has slower performance than the other two routines. Because it uses several temporary arrays, it is also more sensitive to stack limits than the other two routines.

Tentative specifications of the `stdlib_sorting` procedures

`unord_sort` - sorts an input array

Status

Experimental

Description

Returns an input array with the elements sorted in order of increasing value.

Syntax

call unord_sort ( array )

Class

Pure generic subroutine.

Arguments

array : shall be a rank one array of any of the types: integer(int8), integer(int16), integer(int32), integer(int64), real(real32), real(real64), or real(real128). It is an intent(inout) argument. On return its input elements will be sorted in order of non-decreasing value.

Notes

UNORD_SORT implements a hybrid sorting algorithm combining quicksort, merge sort, and insertion sort. For most purposes it behaves like a quicksort with a median of three partition, providing good O(N Ln(N)) run time performance for most random arrays, but defaulting to merge sort if the structure of the array results in a prediction of poor runtime performance.

`ord_sort` - sorts an input array

Status

Experimental

Description

Returns an input array with the elements sorted in order of increasing value.

Syntax

call ord_sort ( array )

Class

Generic subroutine.

Arguments

array : shall be a rank one array of any of the types: integer(int8), integer(int16), integer(int32), integer(int64), real(real32), real(real64), or real(real128). It is an intent(inout) argument. On return its input elements will be sorted in order of non-decreasing value.

Notes

ORD_SORT implements a hybrid sorting algorithm combining heapsort, and insertion sort. For most purposes it behaves like a merge sort, providing worst case O(N Ln(N)) run time performance for most random arrays, that is typically slower than UNORD_SORT. However, if the array has significant runs of decreasing or non-decreasing values performance can be much better than UNORD_SORT, with O(N) behavior on uniformly decreasing, or non-decreasing arrays.

`ord_sorting` - creates an arry of sorting indices for an input array.

Status

Experimental

Description

Returns an integer array whose elements would sort the input array in the specified direction retaining order stability.

Syntax

call unord_sort ( array, index[, reverse, zero_based )

Class

Subroutine.

Arguments

array: shall be a rank one array of any of the types: integer(int8), integer(int16), integer(int32), integer(int64), real(real32), real(real64), or real(real128). It is an intent(in) argument. It will be an array whose sorting indices are to be determined.

index: shall be a rank one allocatable integer array of kind int_size. It is an intent(out) argument. On return it shall be of size(array) with values that shall be the indices needed to sort array in the desired direction.

reverse (optional): shall be a scalar of type default logical. It is an intent(in) argument. If present with a value of .true. then the indices in index will sort array in order of non-increasing values in stable order. Otherwise the indices will sort array in order of non-decreasing values in stable order.

zero_based (optional): shall be a scalar of type default logical. It is an intent(in) argument. If present with the value .true. the values of index will be for zero based array indexing, otherwise the values of index will be for one's based array indexing.

Notes

ORD_SORTING implements the hybrid sorting algorithm of ORD_SORT. As a merge sort based algorithm it is a stable sorting comparison algorithm. It uses several temporary arrays that can cause problems with stack limits.

aman-godara commented 3 years ago

I was wondering if a work array can be implemented for this (work array will decrease the number of temporary arrays): https://en.wikipedia.org/wiki/Merge_sort#Top-down_implementation

wclodius2 commented 3 years ago

Work arrays would certainly help stack memory usage of ORD_SORT and ORD_SORTING (which requires more than one work array). If I resort to minimizing memory usage then their APIs would be:

`ord_sort` - sorts an input array

Status

Experimental

Description

Returns an input array with the elements sorted in order of increasing value.

Syntax

call ord_sort ( array, work )

Class

Generic subroutine.

Arguments

array : shall be a rank one array of any of the types: integer(int8), integer(int16), integer(int32), integer(int64), real(real32), real(real64), or real(real128). It is an intent(inout) argument. On input it is the array to be sorted. On return its elements will be sorted in order of non-decreasing value.

work (optional): shall be a rank one array of any of the same type as array, and shall have at least size(array)/2 elements. It is an intent(out) argument. Its contents on return are undefined.

Notes

ORD_SORT implements a hybrid sorting algorithm combining heapsort, and insertion sort. For most purposes it behaves like a merge sort, providing worst case O(N Ln(N)) run time performance for most random arrays, that is typically slower than UNORD_SORT. However, if the array has significant runs of decreasing or non-decreasing values performance can be much better than UNORD_SORT, with O(N) behavior on uniformly decreasing, or non-decreasing arrays. If array is of any type REAL the behavior of the sorting is undefined if any element of array is a NaN. The optional work array replaces "scratch" memory that would otherwise be allocated on the stack.

`ord_sorting` - creates an arry of sorting indices for an input array.

Status

Experimental

Description

Returns an integer array whose elements would sort the input array in the specified direction retaining order stability.

Syntax

call ord_sorting ( array, index[, work, iwork, reverse, zero_based )

Class

Subroutine.

Arguments

array: shall be a rank one array of any of the types: integer(int8), integer(int16), integer(int32), integer(int64), real(real32), real(real64), or real(real128). It is an intent(inout) argument. On input it will be an array whose sorting indices are to be determined. On output it will be the sorted array.

index: shall be a rank one allocatable integer array of kind int_size of the same size as array. It is an intent(inout) argument. On return it shall have values that are the indices needed to sort the original array in the desired direction.

work (optional): shall be a rank one array of the same type as array, and shall have at least size(array)/2 elements. It is an intent(out) argument. Its contents on return are undefined.

iwork (optional): shall be a rank one integer array of kind int_size, and shall have at least size(array)/2 elements. It is an intent(out) argument. Its contents on return are undefined.

reverse (optional): shall be a scalar of type default logical. It is an intent(in) argument. If present with a value of .true. then the indices in index will sort array in order of non-increasing values in stable order. Otherwise the indices will sort array in order of non-decreasing values in stable order.

zero_based (optional): shall be a scalar of type default logical. It is an intent(in) argument. If present with the value .true. the values of index will be for zero based array indexing, otherwise the values of index will be for one's based array indexing.

Notes

ORD_SORTING implements the hybrid sorting algorithm of ORD_SORT. As a merge sort based algorithm it is a stable sorting comparison algorithm. If array is of any type REAL the behavior of the sorting is undefined if any element of array is a NaN. The optional work and iwork arrays replace "scratch" memory that would otherwise be allocated on the stack.

wclodius2 commented 3 years ago

I have implemented optional work arrays and the current testing code now does not run out of stack space when processing 2**20 element arrays with ORD_SORTING with the default Mac OS X maximum stack size of 8192 kbytes. However I discovered that my testing code was allocating space on the stack for expressions such as ARRAY(INDEX(:)), and those expressions had to be assigned to static storage before the stack problems were fully fixed.

jvdp1 commented 3 years ago

@wclodius2 Thank you for this impressive work. Sorting algorithms are highly needed for multiple procedures in stdlib (e.g., #337, sorting arrays of sparse matrices, ...). Therefore I believe it would be a really nice addition.

To answer to (some of) your questions in a previous comment:

Is stdlib_sorting the ideal name for the module;

OK for me

Should the integer type used for indexing be INT32 or INT64;

If there is no size limit on the size of the array to be sorted, then I am in favor of int64 (I got too often seg faults with int32 in such algorithms).

What is the best name for the following generic subroutines identified by their tentative names: A. UNORD_SORT, a subroutine optimized for sorting near random data, B. ORD_SORT, a subroutine optimized for sorting data that has been partially sorted, and C. ORD_SORTING, a subroutine designed to generate an array of indices that would sort the input array (this is intended to be used to sort an array of a derived type based on the values of a component of that type);

These names are fine for me, and are helpful for users who don't care about the sorting algorithm but want the best algorithm for sorting their array. However, for other users, would it be also possible to provide the procedrues with names that refer to their algorithm (e.g., merge_sort, insertion_sort)?

wclodius2 commented 3 years ago

@wclodius2 Thank you for this impressive work. Sorting algorithms are highly needed for multiple procedures in stdlib (e.g., #337, sorting arrays of sparse matrices, ...). Therefore I believe it would be a really nice addition.

snip

Should the integer type used for indexing be INT32 or INT64;

If there is no size limit on the size of the array to be sorted, then I am in favor of int64 (I got too often seg faults with int32 in such algorithms).

There are no obvious limits on the size of the arrays that can be processed. Maybe more thorough testing would find some. For example, UNORD_SORT with its recursion would exceed maximum stack space in processing a sufficiently large array. There might be limits on the static storage required to allow ORD_SORT and ORD_SORTING to process large arrays.

What is the best name for the following generic subroutines identified by their tentative names: A. UNORD_SORT, a subroutine optimized for sorting near random data, B. ORD_SORT, a subroutine optimized for sorting data that has been partially sorted, and C. ORD_SORTING, a subroutine designed to generate an array of indices that would sort the input array (this is intended to be used to sort an array of a derived type based on the values of a component of that type);

These names are fine for me, and are helpful for users who don't care about the sorting algorithm but want the best algorithm for sorting their array. However, for other users, would it be also possible to provide the procedrues with names that refer to their algorithm (e.g., merge_sort, insertion_sort)?

The current procedures are hybrid sorts, and most clearly identified by names that would be meaningless to most users.

ORD_SORT and ORD_SORTING are based on the rust sort of the Rust Language which is a simplified version of the timsort of Tim Peters. The core of the algorithm is an iterative (not recursive) merge sort. The algorithm examines the next "few" elements of the array for a run of uniformly decreasing values, or a run of uniformly non-decreasing values. If the run is uniformly decreasing then the order of the elements are reversed. If the resulting run is too short, it is padded to the desired minimum size and an insertion sort is performed on the run. Then the run is compared with previously identified runs to see if any recent runs should be merged, with an algorithm that ensures at worst O(N Ln(N)) runtime performance.

UNORD_SORT is based on the introsort (short for introspective sort) of David Musser. It combines three sorting algorithms: quicksort, heap sort, and insertion sort. First it examines the array and estimates the depth of recursion a quick sort would require for ideal (random) data, D = Ln(N)/Ln(2). It then defines a limit to the number of quicksort recursions to be allowed in processing, D_limit = factor * D, where factor is typically taken to be 2, and calls introsort proper. 'introsort` proper then:

Examines the number of elements remaining to be sorted, and, if they are less than 16, sorts them using insertion sort and returns;
If they are not less than 16, checks whether the current depth of recursion exceeds D_limit and, if it does, processes the remaining elements with heap sort and returns;
If the current depth of recursion does not exceed D_limit, then in effect do a quicksort step:
- Partition the remaining array using a median of three,
- Call introsort proper on the leftmost partition,
- Call introsort proper on the rightmost partition, and then returns.

jvdp1 commented 3 years ago

thanks for the detailed explanations. Based on these, the proposed names make sense to me.

wclodius2 commented 3 years ago

I have started a pull request on the stdlib_sorting module so far focussed on the markdown document, stdlib_sorting.md. I would like some feedback on the overall API, before including the module proper and the testing code for approval.

gareth-nx commented 3 years ago

Should stdlib also provide an interface to C's qsort function?

The motivation is flexibility -- it allows sorting arbitrary derived types with arbitrary user-defined comparison functions. The downside is the user needs to provide a comparison function, and the call requires using some iso_c_binding features. To me this makes it somewhat "non-fortranic", and I prefer to use alternatives such as those suggested above, if possible. But for "very general" cases, I don't know a better technique than using C's qsort.

If we were to do this, the sorting module would need to have an interface to C's qsort:

    interface

        subroutine qsort_C(array, elem_count, elem_size, compare) bind(C,name="qsort")     
          use iso_c_binding, only: c_ptr, c_size_t, c_funptr
          implicit none
          type(c_ptr), value       :: array ! C-pointer to the first entry of the array
          integer(c_size_t), value :: elem_count ! Number of elements in the array
          integer(c_size_t), value :: elem_size ! Size of each element, according to c_sizeof()
          type(c_funptr), value    :: compare ! c_funptr to the user-provided comparison function
        end subroutine qsort_C 

    end interface

To use it, the user would need to provide a function compare that takes 2 entries of array and returns -1, 0, or 1 depending on their order. For instance if we are sorting integers (edited on 18th April to make arguments of type c_ptr):

    function test_integer_compare(i1ptr, i2ptr) result(sgn) bind(C)
        type(c_ptr), value, intent(in) :: i1ptr, i2ptr
        integer, pointer :: i1, i2 ! In more general cases these would be of type(my_derived_type) 
        integer(c_int) :: sgn

        call c_f_pointer(i1ptr, i1)
        call c_f_pointer(i2ptr, i2)

        ! The user defines what 'less than', 'equal', 'greater than' means by setting 'sgn'
        if(i1 < i2) sgn = -1_c_int
        if(i1 == i2) sgn = 0_c_int
        if(i1 > i2) sgn = 1_c_int
    end function

The call to qsort_C is a bit ugly because one needs to make use of c_loc and c_funloc and c_sizeof.

    subroutine test_qsort_C
        integer, target :: array(5) ! In more general cases this could be of type(my_derived_type)

        ! To be sorted
        array = [4, 3, 5, 1, 2]

        ! In-place sort of the array
        call qsort_C( &
            c_loc(array(1)), & 
            size(array, kind=c_size_t), &
            c_sizeof(array(1)), &
            c_funloc(test_integer_compare) )

        if(all(array == [1, 2, 3, 4, 5])) then
            print*, 'PASS'
        else
            print*, 'FAIL'
        end if

    end subroutine

My sense is that stdlib should provide something with this level of generality. Is there any better approach?

wclodius2 commented 3 years ago

I now have three requested changes in the API. Let me know if you have strong preferences.

Change the name of ORD_SORTING. Possibilities include:
- ORD_SORT_INDEX
- ORD_ARGSORT
- ORD_INDEX
- SORT_INDEX
- Maybe some other name?
Add a REVERSE subroutine
Add a wrapper for the C/C++ library qsort. This would allow direct sorting of arrays of a derived type.

gareth-nx commented 3 years ago

Upon looking at the fortran standard in more detail, I've become uncertain about the correct use of C's qsort. I've raised a question here to clarify.

gareth-nx commented 3 years ago

Based on the fortran discourse discussion I have corrected the qsort_C example above. The key point is that the comparison function should take input arguments of type(c_ptr), value, intent(in), and then do type casting. This will work for an arbitrary user-specified type. It will also work for intrinsic types. Interestingly it won't work for sorting character(len=*) -- for characters we need to specify len.

wclodius2 commented 3 years ago

@gareth-nx , @jvdp1 and I are having a discussion of my results and its possible implications for the naming conventions of the stdlib_sorting module's API on #386 .

wclodius2 commented 3 years ago

@ivan-pi on #386 has expressed concerns that the implemented sortings for character(*) arrays and string_type arrays (using the comparison operators and not the lexical comparison functions) may not match users needs and might be best removed from the API and my code.

Also note that the results of the testings may be of interest to @certik for his Fortran benchmarks project.

wclodius2 commented 3 years ago

I have committed and pushed my source code to #386 . The testing harness found a couple of errors that didn't show up with my compilers on my computer. I have committed and pushed the fixed codes. The testing harness for the fixed codes does not report any further errors.

milancurcic commented 3 years ago

@gareth-nx Do you mind emailing me at milancurcic@hey.com? I can't find your contact info. I apologize for the off-topic message in this PR, but I can't find a better way on GitHub. :)

wclodius2 commented 3 years ago

I have started up a new branch at https://github.com/fortran-lang/stdlib/pull/408 to deal with some merge issues.

fortran-lang / stdlib

Implement sorting algorithms #98

Sorting routines in various libraries

Various user routines

Internet forums and discussion boards

Interfaces to C qsort

Published works

title: Sorting Procedures

The stdlib_sorting module

Overview of sorting

Overview of the module

The int_size parameter

The module subroutines

The UNORD_SORT subroutines

The ORD_SORT subroutine

The ORD_SORTING subroutine

Tentative specifications of the stdlib_sorting procedures

unord_sort - sorts an input array

Status

Description

Syntax

Class

Arguments

Notes

ord_sort - sorts an input array

Status

Description

Syntax

Class

Arguments

Notes

ord_sorting - creates an arry of sorting indices for an input array.

Status

Description

Syntax

Class

Arguments

Notes

ord_sort - sorts an input array

Status

Description

Syntax

Class

Arguments

Notes

ord_sorting - creates an arry of sorting indices for an input array.

Status

Description

Syntax

Class

Arguments

Notes

Interfaces to C `qsort`

The `stdlib_sorting` module

The `int_size` parameter

The `UNORD_SORT` subroutines

The `ORD_SORT` subroutine

The `ORD_SORTING` subroutine

Tentative specifications of the `stdlib_sorting` procedures

`unord_sort` - sorts an input array

`ord_sort` - sorts an input array

`ord_sorting` - creates an arry of sorting indices for an input array.

`ord_sort` - sorts an input array

`ord_sorting` - creates an arry of sorting indices for an input array.