fortran-lang / stdlib

Fortran Standard Library
https://stdlib.fortran-lang.org
MIT License

What should be part of stdlib? #1

Open milancurcic opened 4 years ago

milancurcic commented 4 years ago

Existing libraries, for inspiration or adoption


First issue in this repo, which evolved from this thread. This is a broad, open-ended, high-level issue, so feel free to go wide and crazy here.

To propose a specific module, procedure, or derived type, please open a new issue. You can follow the same format as in Fortran Proposals.

Wishlist from upthread

From @apthorpe:


From @fortranfan:


From @zbeekman:

FortranFan commented 4 years ago

@milancurcic wrote:

..

  • Many others -- what did I miss?

Great start, perhaps @tclune et al at NASA with https://github.com/nasa/gFTL can contribute or be an inspiration?

ivan-pi commented 4 years ago

The D standard library can also serve as reference/inspiration: https://dlang.org/library/. For many of the D modules there are already some existing open-source modules in Fortran (like dealing with CSV and JSON files, datetime objects, low-level string operations).

Inspired by the D library, I prepared a bunch of functions for checking ASCII characters: https://github.com/ivan-pi/fortran-ascii

ivan-pi commented 4 years ago

Several interesting modules are available in the General-Purpose Fortran package (command line arguments, strings, expression parsers, messages, io, hot keys, fortran/C calls, graphics, sorting, unit conversions).

George Benthien has also made some string utilities and expression parsers.

Alan Miller's Fortran software also contains many routines that are suitable for a stdlib.

The Rosetta Code Fortran pages contain simple implementations of several algorithms (greatest common divisors, sorting, searching, etc.) and data types (priority queues, decks, linked lists, etc.).

marshallward commented 4 years ago

I would like to see greater support for bit-reproducible numerical operations. This is a very high priority for us since our models are used in weather and climate forecasting, and much of our time is devoted to bit reproducibility of our numerical calculations.

A common problem is intrinsic Fortran reduction operations like sum(), where the order is ambiguous (deliberately, one might say), and therefore not reproducible. A more serious problem for us is transcendental functions, like exp() or cos(), which will give different results for different optimizations, and we typically cannot say where it was invoked (libm? Vendor library? etc.).

A standard library may be a place to provide bit-reproducible implementations.
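To make the reduction-order point concrete, here is a minimal sketch of a summation whose evaluation order is fixed by the source. The module name `reproducible_sum_sketch` and the function `ordered_sum` are hypothetical, not part of any existing library. The sketch assumes the compiler is not asked to apply value-changing floating-point optimizations (for example gfortran's `-ffast-math`), and it does not address transcendental functions at all.

```fortran
module reproducible_sum_sketch
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: ordered_sum   ! hypothetical name, for illustration only

contains

  !> Sum the elements of x strictly from first to last.
  pure function ordered_sum(x) result(s)
    real(real64), intent(in) :: x(:)
    real(real64) :: s
    integer :: i

    s = 0.0_real64
    do i = 1, size(x)
      ! Each addition uses the previous partial sum, so the order of
      ! operations is spelled out in the source instead of being left
      ! to the implementation, as it is for the intrinsic sum().
      s = s + x(i)
    end do
  end function ordered_sum

end module reproducible_sum_sketch
```

A fixed order trades away some freedom to vectorize or parallelize the reduction; whether that cost is acceptable depends on the application.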

rweed commented 4 years ago

Fortunately, there is a wealth of libraries we can draw from. Some of the older ones, like SLATEC, are still in F77 but can be converted to F90 free format for consistency. One issue that needs to be resolved, though, is possible license conflicts. Here are a few more suggestions (there are probably a hundred more if we do a deep dive into what's available).

For general mathematical functions etc.

SLATEC/nistCML https://www.nist.gov/itl/math/software https://www.netlib.org/slatec https://people.sc.fsu.edu/~jburkardt/f_src/slatec/slatec.html (F90 translation)

John Burkardt's collection of software at https://people.sc.fsu.edu/~jburkardt/f_src/f_src.html

For containers/ADTs, I would suggest Robert Rueger's Fortran Template Library at https://github.com/SCM-NV/ftl. It is similar to @tclune's gFTL, but Rueger's implementations of the various containers, and the way he does the preprocessing step, were easier for me to follow.

Two books with available code that I would suggest are Robin Vowels, "Algorithms and Data Structures in F and Fortran" and Dick Hanson and Tim Hopkins, "Numerical Computing in Modern Fortran". I've implemented some of the sorting routines from both books. In particular, I have a "semi"-generic implementation of Hanson and Hopkins' quickSort routines that supports all the integer types, 32 and 64 bit reals, character strings, and a user type/class. I have my own implementations of several commonly used ADTs based on unlimited polymorphic variables that I can contribute for review, but I need to go back and look at licensing issues since I borrowed ideas from Arjen Markus's FLIBS and Rueger's FTL. Also, I would personally avoid anything related to Numerical Recipes like the plague due to their restrictive license (and poor implementations of some of the algorithms).

certik commented 4 years ago

@marshallward I created #12 for bit-reproducibility, let's discuss the details there.

jacobwilliams commented 4 years ago

One big question is, do we want this library to contain numerical/scientific type codes? For example, ODE solvers, optimizers, interpolation, etc... The sorts of things that were in SLATEC and are in SciPy. A library like that is desperately needed for modern Fortran. Is that this library, or does that belong in another library built upon this one?

certik commented 4 years ago

@jacobwilliams excellent question. I don't know the answer, we need to discuss it. I am a bit worried that the scope would become too large if we include everything that could potentially be in SciPy.

milancurcic commented 4 years ago

I am not opposed to numerical and scientific codes being part of stdlib. The scope of Fortran's stdlib doesn't necessarily need to be similar to that of Python, C, or any other language. Fortran is more ubiquitous in science and engineering, and to me it makes sense that the stdlib would have modules similar to numpy and scipy.

milancurcic commented 4 years ago

One personal challenge I have with stock Fortran is its somewhat awkward and low-level I/O facilities -- open, read, write, inquire, rewind, and close. I often wished for a higher-level interface, like what you get with Python's open() -- you open a file with a function, get a file-like instance with methods that let you do stuff with it.

This would do away with unit numbers, which I don't think application developers should have to deal with. It could also be a solution to the problem that allocatable character strings must be pre-allocated before use in a read statement.

Is there anything similar out there for Fortran? Would this be of interest to people here? I'd use it.
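As a rough illustration of the idea, here is a minimal sketch of a file-like derived type that keeps the unit number private and reads a line of any length into an allocatable string. The module `text_file_sketch`, the type `text_file`, and its methods are all hypothetical names invented for this sketch, not an existing or proposed stdlib API. Error handling is deliberately thin.

```fortran
module text_file_sketch
  implicit none
  private
  public :: text_file

  !> Hypothetical file handle; the unit number never leaves the type.
  type :: text_file
    private
    integer :: unit = -1
  contains
    procedure :: open => text_file_open
    procedure :: read_line => text_file_read_line
    procedure :: write_line => text_file_write_line
    procedure :: close => text_file_close
  end type text_file

contains

  subroutine text_file_open(self, filename, mode)
    class(text_file), intent(inout) :: self
    character(len=*), intent(in) :: filename
    character(len=*), intent(in) :: mode   ! 'w' to write, anything else reads

    ! newunit= lets the runtime pick a free unit number for us.
    if (mode == 'w') then
      open(newunit=self%unit, file=filename, status='replace', action='write')
    else
      open(newunit=self%unit, file=filename, status='old', action='read')
    end if
  end subroutine text_file_open

  subroutine text_file_read_line(self, line, iostat)
    class(text_file), intent(inout) :: self
    character(len=:), allocatable, intent(out) :: line
    integer, intent(out) :: iostat
    character(len=256) :: buffer
    integer :: nread

    line = ''
    do
      ! Non-advancing read: consume the record one buffer at a time.
      read(self%unit, '(a)', advance='no', iostat=iostat, size=nread) buffer
      if (is_iostat_eor(iostat)) then
        ! End of record: keep the final partial chunk and report success.
        line = line // buffer(:nread)
        iostat = 0
        exit
      else if (iostat /= 0) then
        exit   ! end of file or a genuine error
      end if
      line = line // buffer(:nread)
    end do
  end subroutine text_file_read_line

  subroutine text_file_write_line(self, line)
    class(text_file), intent(inout) :: self
    character(len=*), intent(in) :: line
    write(self%unit, '(a)') line
  end subroutine text_file_write_line

  subroutine text_file_close(self)
    class(text_file), intent(inout) :: self
    close(self%unit)
    self%unit = -1
  end subroutine text_file_close

end module text_file_sketch
```

With this sketch, a caller would write `call f%open('data.txt', 'r')` followed by `call f%read_line(line, ios)` and never see a unit number or size a buffer by hand.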

certik commented 4 years ago

If we want to go with this broader scope, then one reasonable proposal can be to limit the scope roughly to what is in here:

https://www.mathworks.com/help/matlab/mathematics.html

Which seems to cover roughly what is in NumPy and SciPy.

If we use the Python analogy, the bare-bones Python language does not have much for numerical computing. And if you do any kind of numerical computing in Python (I do), NumPy and SciPy are pretty much the "standard library". Not surprisingly, the default "Matlab standard library" roughly covers the same range.

The Julia standard library (https://github.com/JuliaLang/julia/tree/5da74be66993fb19edce52e4877d8ae2edbe27b0/stdlib, documented at https://docs.julialang.org, in the left column scroll down to "Standard Library") does not cover as wide a range, but still includes linear algebra (Lapack), sparse arrays, statistics. It used to contain fft, but they moved it out apparently (https://discourse.julialang.org/t/where-is-the-fft/16512) -- it would be interesting to know the reasoning, as Matlab as well as NumPy have fft by default.

Ok, it's not a bad idea.

jvdp1 commented 4 years ago

Is there anything similar out there for Fortran? Would this be of interest to people here? I'd use it.

I would use it too.

Something that might be interesting to include in a standard library is sparse arrays (creation, management).

certik commented 4 years ago

I think we can learn from Julia a lot. Here is the discussion related to moving FFT out of Julia's standard library and into a separate package:

https://github.com/JuliaLang/julia/issues/18389

and apparently they want to also move much of the linear algebra out. See also:

https://groups.google.com/forum/#!topic/julia-users/ug5Jh6y5biA.

https://github.com/JuliaLang/julia/issues/5155

If I understand their arguments, if it's part of the julia compiler itself, it's hard for them to make a release, test things properly on Travis, etc. Applied to Fortran, that would be like moving things from Fortran compilers (gfortran, ifort, ...) into a separate library like this stdlib.

certik commented 4 years ago

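So here are other things that could be part of stdlib:

* sparse matrices

* fft

* special functions (like in SciPy) such as spherical harmonics, hypergeometric functions, ...

* random numbers

* statistics

* ODE solvers and numerical integration (Gauss-Legendre points and weights and other algorithms)

* optimization (root finding, etc.)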

cmacmackin commented 4 years ago

One personal challenge I have with stock Fortran is its somewhat awkward and low-level I/O facilities -- open, read, write, inquire, rewind, and close. I often wished for a higher-level interface, like what you get with Python's open() -- you open a file with a function, get a file-like instance with methods that let you do stuff with it.

This would do away with unit numbers, which I don't think application developers should have to deal with. It could also be a solution to the problem that allocatable character strings must be pre-allocated before use in a read statement.

Is there anything similar out there for Fortran? Would this be of interest to people here? I'd use it.

I'd personally like something along these lines. However, the problem is in defining methods on the file-object; these would need to know the number and type of arguments at compile-time. It would be impractical to produce methods with every conceivable permutation of object types. It would also require variadic functions, which are not available. As such, this cannot be implemented well in Fortran, although perhaps something would be possible if we were to wrap some C-routines and pass in deferred-type objects.
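One partial workaround for the argument-type half of this problem is an unlimited polymorphic dummy argument, sketched below. The module `any_value_sketch` and function `to_text` are hypothetical names used only for illustration. The sketch accepts a single value of several intrinsic types through one procedure, but the number of arguments is still fixed, so it does not solve the variadic half.

```fortran
module any_value_sketch
  implicit none
  private
  public :: to_text   ! hypothetical helper, for illustration only

contains

  !> Convert one value of a supported intrinsic type to text.
  function to_text(value) result(text)
    class(*), intent(in) :: value
    character(len=:), allocatable :: text
    character(len=64) :: buffer

    ! The dynamic type is resolved at run time, so a single procedure
    ! covers several types without one specific routine per type.
    select type (value)
    type is (integer)
      write(buffer, '(i0)') value
      text = trim(buffer)
    type is (real)
      write(buffer, '(g0)') value
      text = trim(buffer)
    type is (logical)
      text = merge('T', 'F', value)
    type is (character(len=*))
      text = value
    class default
      text = '<unsupported type>'
    end select
  end function to_text

end module any_value_sketch
```

A file-object method built this way would still need one call per value, for example `call f%write_line(to_text(42))` with a hypothetical `write_line` method, instead of a Python-style call with an arbitrary argument list.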

cmacmackin commented 4 years ago

So here are other things that could be part of stdlib:

* sparse matrices

* fft

* special functions (like in SciPy) such as spherical harmonics, hypergeometric functions, ...

* random numbers

* statistics

* ODE solvers and numerical integration (Gauss-Legendre points and weights and other algorithms)

* optimization (root finding, etc.)

Some sort of interface for working with solvers for dense matrices would also be useful. LAPACK is horribly tedious to use, so an object-oriented wrapper could be handy. This could hold the factored version of the matrix, handle allocation of work arrays, etc. I've written code along these lines in the past.

certik commented 4 years ago

Some sort of interface for working with solvers for dense matrices would also be useful. LAPACK is horribly tedious to use, so an object-oriented wrapper could be handy. This could hold the factored version of the matrix, handle allocation of work arrays, etc. I've written code along these lines in the past.

Yes, that's already planned, see #10.

certik commented 4 years ago

@milancurcic why don't you start a separate issue for the IO stuff, so that we can discuss it there.

zbeekman commented 4 years ago

One personal challenge I have with stock Fortran is its somewhat awkward and low-level I/O facilities -- open, read, write, inquire, rewind, and close. I often wished for a higher-level interface, like what you get with Python's open() -- you open a file with a function, get a file-like instance with methods that let you do stuff with it.

This would do away with unit numbers, which I don't think application developers should have to deal with. It could also be a solution to the problem that allocatable character strings must be pre-allocated before use in a read statement.

Is there anything similar out there for Fortran? Would this be of interest to people here? I'd use it.

This is one of my primary motivations too. As @cmacmackin pointed out, we may not be able to get a one-to-one mapping of our favorite implementation X for file I/O stuff, but we can certainly make something better than what we have and idiomatically Fortran-like. And where there are very obvious solutions that need to be implemented in the language standard, we can lobby for those.

milancurcic commented 4 years ago

@zbeekman can you post this message to #14?

certik commented 4 years ago

Do we all agree that the scope is broader (e.g., Python standard libraries + NumPy/SciPy), rather than narrower (e.g., C++ standard library)?

If so, let's write down in general terms, what the scope is and put it into README. I started at #43. Can you help me polish it up?

zbeekman commented 4 years ago

PR looks good to me. I think this is an area that will evolve over time. As such I don't think we need to hash out every detail so long as we ensure things only grow in a good way organically... balance immediate needs with the threat of incurring technical debt and bad design choices.

If we hash things out in too much detail documents won't reflect reality. The PR is looking good last I checked and I'm generally happy with the vast majority of ideas and desires that others have expressed so far.

zbeekman commented 4 years ago

A more useful step might be to provide more clarity on governance and workflow since right now the process of deciding when PRs are merged is murky, much less how to decide and agree upon what the grand objectives of the project are.

ivan-pi commented 4 years ago

I've noticed the Julia base library wraps the C standard library routines for memory allocation and also some from : https://docs.julialang.org/en/v1/base/libc/

Could this also be a target for the Fortran stdlib? Maybe this could be a possible way to provide support for Unicode characters, time and date functions, etc.?

certik commented 4 years ago

@ivan-pi good point.

One question to consider is whether we want at least the core of stdlib to be pure Fortran. I can see a lot of advantages of that (no dependencies on other languages and runtimes, just Fortran is enough). Obviously the disadvantage is that it's nice to just call other libraries instead of reimplement things in Fortran.

Maybe these "wrappers" can be an optional component of stdlib?

ivan-pi commented 4 years ago

I support the idea of having a pure Fortran core with zero external dependencies, as long as it is possible. I can't imagine that many people really need Unicode support in their Fortran codes (it is needed in some, for example in json-fortran).

For the calendar and datetime support (see https://github.com/fortran-lang/stdlib/issues/106) one suggested solution was to use the C standard library functionality from <time.h>. The datetime-fortran package by @milancurcic also relies on the C routines strptime and strftime to format and parse datetime strings. While I do not doubt a pure Fortran implementation is possible, it could be tedious.

I've gone ahead and prepared interfaces for a few elements of the C standard library in https://github.com/ivan-pi/fortran-libc (malloc, calloc, realloc, free, qsort, and the entire ). I think it would be nice to have this as an optional component of stdlib. That way not everyone has to go through the trouble of writing these interfaces next time they need to allocate some C memory from Fortran. (Alternatively, an automatic interface generator like swig-fortran could be used.)
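For readers who have not written such interfaces, here is a minimal sketch of what a few of them might look like using `iso_c_binding`. The module name `libc_sketch` and the Fortran-side names `c_malloc`, `c_free`, and `c_strlen` are assumptions made for this example and are not taken from the fortran-libc repository. Only the bound C names `malloc`, `free`, and `strlen` are the real libc functions.

```fortran
module libc_sketch
  use, intrinsic :: iso_c_binding, only: c_ptr, c_size_t, c_char
  implicit none
  private
  public :: c_malloc, c_free, c_strlen   ! hypothetical Fortran-side names

  interface
    ! void *malloc(size_t size);
    function c_malloc(nbytes) bind(C, name='malloc') result(ptr)
      import :: c_ptr, c_size_t
      integer(c_size_t), value :: nbytes
      type(c_ptr) :: ptr
    end function c_malloc

    ! void free(void *ptr);
    subroutine c_free(ptr) bind(C, name='free')
      import :: c_ptr
      type(c_ptr), value :: ptr
    end subroutine c_free

    ! size_t strlen(const char *s);
    function c_strlen(str) bind(C, name='strlen') result(length)
      import :: c_char, c_size_t
      character(kind=c_char), intent(in) :: str(*)
      integer(c_size_t) :: length
    end function c_strlen
  end interface

end module libc_sketch
```

Memory obtained through `c_malloc` can be viewed as a Fortran array with `c_f_pointer` and must be released with `c_free`, not `deallocate`. Strings passed to `c_strlen` need a trailing `c_null_char`.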

milancurcic commented 4 years ago

@ivan-pi In my opinion, yes.

I think every Fortran compiler comes with a companion C compiler, no? Which ships with its libc. So is there really any burden to having C-interfaces in stdlib? If there is, then the interfaces should be optional.

Of course, if we want to support some of libc, we don't have to do so for all of it. Only parts that we decide are needed, as a community.

jvdp1 commented 4 years ago

Isn't libc already linked into Fortran executables? I just compiled a simple Fortran program and printed the shared object dependencies:

$ more test.f90 
program tmp
 implicit none
 integer::i

 i=1
 print*,i
end program
$ 
$ gfortran -O0 test.f90 
$ ldd a.out 
    linux-vdso.so.1 (0x00007ffc2a987000)
    libgfortran.so.5 => /lib64/libgfortran.so.5 (0x00007f7131db2000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f7131c6c000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f7131c52000)
    libquadmath.so.0 => /lib64/libquadmath.so.0 (0x00007f7131c08000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f7131a3d000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f713208b000)
$ ifort -O0 test.f90 
$ ldd a.out 
    linux-vdso.so.1 (0x00007ffd12d20000)
    libm.so.6 => /lib64/libm.so.6 (0x00007fbe10b2d000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fbe10b0b000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fbe10942000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fbe10928000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007fbe1091f000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fbe10caa000)

milancurcic commented 4 years ago

@jvdp1 GFortran does this because it uses libc to implement parts of Fortran, but I don't know if that's true for most Fortran compilers. If it is, I think that would be a good argument for not disallowing Fortran interfaces to libc in stdlib.

certik commented 4 years ago

I think we can have C wrappers for any C library, such as in #45. Including libc.

That being said, I think the Fortran language should be able to stand on its own, including compiling without libc if needed.

Having "standard" C interfaces for most common tasks would be very helpful so that people don't have to reimplement those over and over. Perhaps later those could go into their own package, perhaps even called libc, as part of fpm.

dev-zero commented 4 years ago

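While working on a large code base (CP2K), a repeating and annoying topic is strings in various forms:

  • string lists: still needs a string type AFAIK because you can't put a variable length string inside a variable length list and is therefore very cumbersome without a wrapper for the string, the list or both
  • reading into strings: reading into an allocatable string is not supported, hence we need to buffer manually
  • reading large files as strings: mmap would be nice to avoid copying data unnecessarily, but this results again in an array of characters instead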

jvdp1 commented 4 years ago

While working on a large code base (CP2K), a repeating and annoying topic is strings in various forms:

  • string lists: still needs a string type AFAIK because you can't put a variable length string inside a variable length list and is therefore very cumbersome without a wrapper for the string, the list or both
  • reading into strings: reading into an allocatable string is not supported, hence we need to buffer manually
  • reading large files as strings: mmap would be nice to avoid copying data unnecessarily, but this results again in an array of characters instead

Thank you. Re: strings I would point to the discussions in #31, #32, and #69. I think that the not yet covered topics you mention could be discussed there too (or in another issue if too specific?).
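To illustrate the string-list point, here is a minimal sketch of the kind of wrapper type it refers to: a derived type holding a deferred-length string, so that an ordinary allocatable array of that type can store strings of different lengths. The names `string_list_sketch`, `string_t`, and `append` are hypothetical and are not the design discussed in the linked issues.

```fortran
module string_list_sketch
  implicit none
  private
  public :: string_t, append   ! hypothetical names, for illustration only

  !> Wrapper so that each array element can have its own length.
  type :: string_t
    character(len=:), allocatable :: chars
  end type string_t

contains

  !> Append text to list, growing the list by one element.
  subroutine append(list, text)
    type(string_t), allocatable, intent(inout) :: list(:)
    character(len=*), intent(in) :: text
    type(string_t), allocatable :: tmp(:)
    integer :: n

    n = 0
    if (allocated(list)) n = size(list)

    ! Copy into a larger temporary, then move it back without a
    ! second copy. Growing by one element per call is simple but
    ! not efficient for long lists.
    allocate(tmp(n + 1))
    if (n > 0) tmp(1:n) = list
    tmp(n + 1)%chars = text
    call move_alloc(tmp, list)
  end subroutine append

end module string_list_sketch
```

After `call append(names, 'a')` and `call append(names, 'a longer name')`, the elements `names(1)%chars` and `names(2)%chars` keep their own lengths, which a plain `character(len=n)` array cannot do.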

certik commented 4 years ago

@dev-zero thanks for getting in touch. Besides what @jvdp1 posted, see also https://github.com/j3-fortran/fortran_proposals/issues/24, https://github.com/j3-fortran/fortran_proposals/issues/96 and https://github.com/j3-fortran/fortran_proposals/issues/9.

Regarding Unicode, I think we should support Unicode in stdlib, we should use utf8 and I also posted at https://github.com/fortran-lang/stdlib/issues/11#issuecomment-619108015 with links to utf8 handling code that is simple enough to port to Fortran / stdlib.

If you want to help us implement any of these things, we would really appreciate it!

dev-zero commented 4 years ago

If you want to help us implement any of these things, we would really appreciate it!

Will try, but I'm not sure whether I can spare the time (yet).

Another thing that came to mind that should be part of a stdlib: compatibility functions for compilers that do not fully implement the standards.

Two examples we encountered in CP2K or DBCSR:

While missing intrinsics could be provided in a compat module, redefining an intrinsic is probably not doable transparently without using the CPP.

ivan-pi commented 4 years ago

I found another FORTRAN 90 Numerical Library (https://sourceforge.net/projects/afnl/) developed by Alberto Ramos. I will edit the first post to include it.

The contents are the following:

David-Duffy commented 3 years ago

I think a good implementation of a data frame would be valuable. That is, a rectangular array where each column has one homogeneous type (integer, character, etc.), but different columns can have different types. These exist in R, Pandas, Julia, etc., and are the workhorse for statistical analysis. One will encounter arguments that this should all live in a "real database", so that you only need to provide appropriate Fortran interfaces, but the continued success of R, Pandas, etc. is a potent counterargument. For speed, there have to be indices and hashes under the bonnet, as well as optimized sorts, joins, Fortran array-style slices, and so on.
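As a very rough sketch of the data layout described here, the module below stores each column as a named, homogeneous array and a frame as an array of such columns. All names (`data_frame_sketch`, `column_t`, `data_frame_t`, `int_column`, `real_column`, `nrows`) are hypothetical, only integer and real columns are covered, and none of the indexing, hashing, sorting, or join machinery mentioned above is attempted.

```fortran
module data_frame_sketch
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: column_t, data_frame_t, int_column, real_column, nrows

  !> One named column. Exactly one of the value arrays is allocated,
  !> which is what makes the column homogeneous.
  type :: column_t
    character(len=:), allocatable :: name
    integer, allocatable :: ints(:)
    real(real64), allocatable :: reals(:)
  end type column_t

  !> A frame is a collection of columns of possibly different types.
  type :: data_frame_t
    type(column_t), allocatable :: columns(:)
  end type data_frame_t

contains

  function int_column(name, values) result(col)
    character(len=*), intent(in) :: name
    integer, intent(in) :: values(:)
    type(column_t) :: col
    col%name = name
    col%ints = values
  end function int_column

  function real_column(name, values) result(col)
    character(len=*), intent(in) :: name
    real(real64), intent(in) :: values(:)
    type(column_t) :: col
    col%name = name
    col%reals = values
  end function real_column

  !> Number of rows stored in a column (0 if it holds no data).
  pure function nrows(col) result(n)
    type(column_t), intent(in) :: col
    integer :: n
    n = 0
    if (allocated(col%ints)) n = size(col%ints)
    if (allocated(col%reals)) n = size(col%reals)
  end function nrows

end module data_frame_sketch
```

A real implementation would also need character columns, a check that all columns have the same number of rows, and the fast lookup structures the comment asks for.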