Open MarDiehl opened 4 years ago
This is related to #14. I think we agreed that file system operations are in the scope. The naming convention should probably be `is_directory` or `is_dir`, and we should also look at how Python, Julia and Matlab name such functions. We are always trying to be consistent with other languages where it makes sense.
The main goal of stdlib is to figure out and document (in a spec) the API. The underlying implementation is secondary --- stdlib will provide a reference implementation and then compiler vendors are free to provide their own, different or more optimized implementation if they want. The only requirement is that it must run on all platforms, but probably calling into C if we have to would do it.
I have thought a little bit about a possible contribution from my side, and I propose to add general path related functionality. Since the other language I am frequently using is python, I got my inspiration from
In python, the object oriented approach of pathlib is often favored over os.path. However, unless I overlooked something, I believe that the os.path approach is better suited for Fortran because one can not chain function calls.

    p = path.absolute().is_file()          # Python, pathlib style (chained)
    p = path_is_file(path_absolute(Str))   ! Fortran, os.path style (nested)

With Fortran variable length strings (allocatable), most operations are easily performed. However, there is one exception: `os.path.commonpath(paths)` takes a list of strings, in Fortran this would be an array of strings/characters and all of them would need to have the same length. Alternatively, one can use an interface for a series of `fpp` generated functions with signatures of type `path_commonpath(path1,path2)`, `path_commonpath(path1,path2,path3)`, ...
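
For comparison, here is a rough C sketch of what a `commonpath`-like helper could look like when the paths may have different lengths. It is only an illustration and not part of any proposal in this thread: the names `common_path` and `common_path_len` are hypothetical, only POSIX `/` separators are handled, and no normalisation (trailing slashes, `..`) is attempted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the longest leading sub-path shared by a and b, compared
   component by component. POSIX '/' separators only, no normalisation. */
static size_t common_path_len(const char *a, const char *b)
{
    size_t i = 0, last = 0;

    assert(a != NULL && b != NULL);
    while (a[i] != '\0' && a[i] == b[i]) {
        if (a[i] == '/')
            last = (i == 0) ? 1 : i;   /* keep a leading "/" as the root */
        i++;
    }
    /* The whole matched prefix counts only if both paths end a component here. */
    if ((a[i] == '\0' || a[i] == '/') && (b[i] == '\0' || b[i] == '/'))
        last = i;
    return last;
}

/* Hypothetical helper: writes the common sub-path of n paths into out
   (truncated to outsize-1 characters) and returns the number of characters written. */
size_t common_path(const char *const paths[], size_t n, char *out, size_t outsize)
{
    size_t len, k;

    assert(paths != NULL && out != NULL && outsize > 0);
    if (n == 0) {
        out[0] = '\0';
        return 0;
    }
    len = strlen(paths[0]);
    for (k = 1; k < n; k++) {
        size_t c = common_path_len(paths[0], paths[k]);
        if (c < len)
            len = c;
    }
    if (len >= outsize)
        len = outsize - 1;             /* truncate rather than overflow */
    memcpy(out, paths[0], len);
    out[len] = '\0';
    return len;
}
```

Because a C array of `char *` can point at strings of any length, the equal-length restriction of a Fortran character array does not arise here. Whether such a helper belongs on the C side at all is an open design question.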
While most functionality will be implemented in pure Fortran, certain operations require file system functions from C.
There is one more thing I need to mention: If Windows support is needed, someone else has to provide it. I've never compiled a Fortran program on Windows and the path operations differ significantly. `pathlib` has essentially two implementations and I assume we are in the same situation. If someone volunteers, I would prefer that we work in parallel on both implementations. My time budget is about 4h/week and I would hope to finish the whole implementation in 3 months.
> With Fortran variable length strings (allocatable), most operations are easily performed. However, there is one exception: `os.path.commonpath(paths)` takes a list of strings, in Fortran this would be an array of strings/characters and all of them would need to have the same length. Alternatively, one can use an interface for a series of `fpp` generated functions with signatures of type `path_commonpath(path1,path2)`, `path_commonpath(path1,path2,path3)`, ...

Or it can be solved by an implementation of the `iso_varying_string` module or advanced libraries of string handling routines as summarised in this other issue.
> However, there is one exception: `os.path.commonpath(paths)` takes a list of strings, in Fortran this would be an array of strings/characters and all of them would need to have the same length. Alternatively, one can use an interface for a series of `fpp` generated functions with signatures of type `path_commonpath(path1,path2)`, `path_commonpath(path1,path2,path3)`, ...

`fypp` could be used for that too.
However, implementing and using the `iso_varying_string` module would be the best option IMO.
FYI:
I have some related modules (M_path, M_io, and M_system) in the GPF (General Purpose Fortran) site on github if you are looking for ideas on how the API might look from a Fortran perspective.
Instead of the entire GPF, self-contained subsets of two of the modules are available:
https://github.com/urbanjost/M_io https://github.com/urbanjost/M_system
The M_path module description (actually all the GPF routines) can be found in the manpage index:
I do not have a stand-alone M_path.f90. It is currently only in the GPF collection, as it uses a number of string routines from M_strings.f90.
@urbanjost Nice code, thanks for the hint.
Thanks. No problem. Some is better than others. Seeded it with a lot of code I had around with the hopes of getting a development community started to expand it and clean it up but it did not catch on as I had hoped. stdlib(3f) seems to have far more momentum behind it. If you find anything useful in it feel free to use it for stdlib(3f).
The Oracle Fortran Library seems to have covered some file system operations: https://docs.oracle.com/cd/E19957-01/805-4942/index.html
Based upon the function names, it looks like it is actually implemented in C. Perhaps it can serve as reference.
Edit: The Absoft Compiler also has similar compatibility libraries - https://www.absoft.com/wp-content/uploads/2015/08/Support-Libraries.pdf
Edit2: Compaq Fortran also had its own library of C-like functions - http://h30266.www3.hpe.com/odl/unix/progtool/cf95au56/dfumroutines.htm#overview_lib_rout
Perhaps the Fortyxima project (https://bitbucket.org/aradi/fortyxima/src/develop/fortyxima/filesys/) from @aradi could serve as a starting point?
If there is interest, I am happy to clean it up a bit, so that it meets the stdlib coding standards. :wink:
@urbanjost don't feel bad, a lot of us tried to do the same. Thanks for the pointer, I added it to https://github.com/fortran-lang/stdlib/issues/1, thanks for sharing the link. As you can see there, we list 10 such libraries that people did (mine is there too), and we all tried to get a community started around it. It's extremely hard. But I think we finally succeeded this time with stdlib and with fortran-lang.org. I should write a blog post about this --- Fortran is far from being saved, but just the fact that we managed to get the community together is the first necessary step, and it looked impossible to me just a year ago. And yet I think we succeeded at this first step at this point.
@ivan-pi thanks for the pointers, @aradi I think there will be interest, let's discuss the API.
Sure, I opened a separate issue for this (#220)
I have a first prototype of a library that has functions from Python's `os` and `os.path`: https://github.com/MarDiehl/stdlib_os
Pure Fortran where possible, but most file system related operations rely on C routines.
Works on Linux with GNU and Intel compilers.
Exceptions in Python translate into `error stop`. Currently without a message, but that can be changed.
@MarDiehl I looked at the code. There are already quite a bunch of procedures for Linux OS. Nicely done. Would it be an idea to submit a PR to discuss the API further?
Thanks @MarDiehl for this!
Thank you, I reviewed the repo and recommend moving forward with the PR. We just need to decide how to not build it on Windows.
The goal is to have the file system operations working on Windows also eventually, correct?
> Thank you, I reviewed the repo and recommend moving forward with the PR. We just need to decide how to not build it on Windows.

Or we should try to find a volunteer fluent with Windows ;) If the API is already discussed and decided, this should facilitate the development, right?
Thanks @MarDiehl for the prototype. Is it possible to identify the common functionality with respect to the API proposal by @aradi in #220? @arjenmarkus left a comment there on how to deal with Windows.
@certik Sorry I had assumed that Martin's implementation depended on POSIX, but on second look I don't see it in the code. If this relies only on C stdlib, I think it should work fine on Windows. But perhaps some OS-specific extensions are used. @MarDiehl did you try it on Windows?
Good point, @ivan-pi. I suggest @MarDiehl and @aradi join forces and present a coherent API. I like Martin's as is, it's clear and familiar to me from Python's API.
If it relies on Linux specific things, then we can extend it using ifdefs to also use Windows API to work on Windows.
Please find the answers to the questions below:
> The goal is to have the file system operations working on Windows also eventually, correct?

Yes, it would be nice if someone contributes this. The actual implementation should not be too difficult, but it needs to be done by a windows native.
> Thank you, I reviewed the repo and recommend moving forward with the PR. We just need to decide how to not build it on Windows.

In Python, the actual implementation of `os.path` is called `posixpath` on POSIX and `ntpath` on Windows and is then mapped to `os.path`. I would recommend doing the same thing here, but the implementation depends on whether a standard C preprocessor is a prerequisite for compiling stdlib or whether `fypp` should be used exclusively. Also, the integration into `fpm` is still a very open topic to me.
> Sorry I had assumed that Martin's implementation depended on POSIX, but on second look I don't see it in the code. If this relies only on C stdlib, I think it should work fine on Windows. But perhaps some OS-specific extensions are used. @MarDiehl did you try it on Windows?

I have not tried on Windows but the implementation is certainly POSIX specific (`/` for path separation, `~` for the home directory). Some C headers (e.g. `unistd.h`) are also not available on Windows. But I have not tried to build it on Windows.
> If it relies on Linux specific things, then we can extend it using ifdefs to also use Windows API to work on Windows.

I think there is less common code between POSIX and Windows than it seems at first glance. Most of the C code is probably OS specific and all the path routines subtly depend on details of path names. Therefore, I opt to have two fully independent implementations.
Thanks for all the feedback. Actually, before opening a PR I will probably add a few more functions and certainly do some testing. I also have three direct questions:

1. Is the following naming convention ok:

       use stdlib_os
       use stdlib_os_path
       call chdir('/home')       ! function from stdlib_os, no stdlib_os prefix
       print*, islink('/home')   ! function from stdlib_os_path, no stdlib_os_path prefix

2. I am using allocatable strings from the Fortran standard. As discussed in the last monthly call, I don't see a reason to use specific string implementations (e.g. `iso_varying_string`). But the decision whether stdlib should have a special string (or even path) type goes beyond the scope of implementing `os` and `os_path`. I hope my implementation shows that allocatable strings are all we need. The consequence of this decision is that we can't have object oriented string libraries. The only inconvenience of the current implementation is the behavior of `split`/`splitext`/`splitdrive`: They return a size 2 array with the length of its strings being the maximum length of `head` and `tail`. I would attribute this flaw to the lack of a tuple or list type, not to a limitation of allocatable strings.
3. Could a MacOS user test the code?
Hi everyone,
I volunteer to contribute the Windows side of things - while I usually avoid Windows-specific stuff, I do use that platform all the time, as well as the ingressions of Linux (or Unix?) on that platform - Cygwin and MinGW. So I should be able to test that it all works on these platforms. I have no access to MacOS unfortunately, so that will have to be someone else.
Regards,
Arjen
Great! I think in such system-dependent operations, Windows specific things are unavoidable and I would not consider them to be a bad thing as long as they are compiler independent. Probably the actual modifications to the code (changing separator, using different C headers and functions) are not very difficult. Getting a system-dependent build configuration to work is probably the more time-consuming task.
We also need to agree on the handling of line endings in the repository. I assume currently it is UNIX style. Would it make sense to enforce UNIX style line endings via git configuration in general? And do you want an exception for windows-only code then?
I gave you access to the repository, please create a new branch for windows changes or simply fork the whole repository.
As pointed out by @aradi, the python naming conventions are not very 'fortranig'. I therefore suggest the following names:
| python | Fortran |
|---|---|
| chdir | change_directory |
| getcwd | get_current_working_directory |
| mkdir | make_directory |
| rename | |
| rmdir | remove_directory |
| symlink | create_symlink? |
| unlink | remove_file |

| python | Fortran |
|---|---|
| abspath | abs_path |
| basename | base_name |
| commonpath | common_path |
| commonprefix | common_prefix |
| dirname | dir_name |
| exists | |
| expanduser | expand_user |
| expandvars | expand_vars |
| getatime | get_atime |
| getctime | get_ctime |
| getmtime | get_mtime |
| getsize | get_size |
| isabs | is_abs |
| isdir | is_dir |
| isfile | is_file |
| islink | is_link |
| ismount | is_mount |
| join | |
| normcase | norm_case |
| normpath | norm_path |
| samefile | same_file |
| relpath | rel_path/relative_path? |
| split | |
| splitdrive | split_drive |
| splitext | split_ext |
@MarDiehl Your suggestions all look good to me. Suggestion for consistency:
`create_symlink` -> `create_symbolic_link`
`abs_path` -> `absolute_path`
`dir_name` -> `directory_name`
`expand_vars` -> `expand_variables`
`is_dir` -> `is_directory`
I think two-syllable words should just be joined, I think that is actually very fortranic. So `abspath` over `abs_path` and `basename` over `base_name`. Not everybody agrees with this recommendation, but I know a lot of Fortran programmers do agree, so that is what we recommend here:
https://www.fortran90.org/src/best-practices.html#naming-convention
The underscores should be used if you want to join several syllables or words such as in `get_command_argument`. But if you can make it just two syllables like `getarg`, then that works too and I think it looks much better without underscores.
So both of these work: `dirname`, `directory_name`. But `dir_name` I think is suboptimal.
I have most of it working now, with some quirks and limitations (symbolic links on Windows are different than on Linux - in a similar way I guess that pointers in Fortran and C are different ;) - so that is not supported in this version), but one severe problem: a coredump on renaming files.
Well, let me summarise:
Cygwin: no problem, it all worked out of the box, as they say. (gfortran compiler: 9.3.0)
MinGW: the gfortran compiler I have available is too old: 7.3.0, so it does not have -std=f2018 and to compile the Fortran code I should probably eliminate that flag. (I tried with -std=f2008)
On plain/bare Windows: some changes to the C code, though most of the functions work in quite the same way, with just a twist (_getcwd() instead of getcwd() for instance and some macros and include files). Curiously enough the CMake build says to build the test program, but it does not. I have to do that manually.
All tests seem fine, except renaming - that leads to a crash. Not sure yet where this happens or which compiler is responsible.
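
The `_getcwd()` versus `getcwd()` twist mentioned above can be hidden behind one name. The following is a minimal sketch of that idea, not the code from the prototype: the macro `os_getcwd` and the wrapper `get_cwd_c` are hypothetical names, and only the two calls themselves (`getcwd` from `<unistd.h>`, `_getcwd` from `<direct.h>`) are real library functions.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <direct.h>              /* _getcwd(char *buffer, int maxlen) */
#define os_getcwd(buf, size) _getcwd((buf), (int)(size))
#else
#include <unistd.h>              /* getcwd(char *buf, size_t size) */
#define os_getcwd(buf, size) getcwd((buf), (size))
#endif

/* Hypothetical wrapper: stores the current working directory in buf.
   Returns 0 on success and -1 on failure (e.g. buffer too small). */
int get_cwd_c(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return os_getcwd(buf, size) != NULL ? 0 : -1;
}
```

A Fortran routine could then bind to `get_cwd_c` on every platform, so the platform difference stays on the C side.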
The problem lies with the f_c_string function. Very odd. I have reduced the program to little more than a call to that function and I get a crash.
Of course it was NOT in f_c_string. It was not even in the rename routine or its brethren. It was in ismount_c() which is called slightly later in the program. The line `parent = (char *) malloc(strlen(path)+3);` should be: `parent = (char *) malloc(strlen(path)+4);`
With that found, the crash makes sense - strlen() does not count the required NUL character. So the construction of the parent will cause an ever so slight memory overflow. Next step: consolidate all my changes and only the necessary ones ;).
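
The arithmetic behind the `+4` can be spelled out in a small sketch. It assumes that the parent path is built by appending `/..` to `path`, which is a guess based on the numbers above and not taken from the actual `ismount_c()` source; `parent_of` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated "<path>/..".
   The caller must free() the result. Returns NULL if allocation fails. */
char *parent_of(const char *path)
{
    static const char suffix[] = "/..";   /* sizeof suffix == 4: 3 characters + NUL */
    size_t n;
    char *parent;

    assert(path != NULL);
    n = strlen(path);                     /* does not count the terminating NUL */
    parent = malloc(n + sizeof suffix);   /* n + 3 characters + 1 NUL byte */
    if (parent == NULL)
        return NULL;
    memcpy(parent, path, n);
    memcpy(parent + n, suffix, sizeof suffix);   /* copies the NUL as well */
    return parent;
}
```

Using `sizeof` on the suffix array instead of a hand-counted constant is one way to avoid this kind of off-by-one.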
Good catch. My C knowledge is quite bad, it's almost 15 years since I learned it at university.
It is one of the easiest mistakes to make in C - I have had my share of them. Which is one reason I do like Fortran :).
@MarDiehl Looks good to me. I'd propose to use `dir` instead of `directory` though (consistently everywhere) as "directory" is a very long word...
@certik I really like your idea as it almost 100% matches the rules on using dashes in composed nouns in my native language (Hungarian) :smile: . However, I think, it would be still confusing for newcomers. Also, it does not match current Fortran naming practice (e.g. `type(c_ptr)`, `move_alloc`, `compiler_version`, `character_kinds`, `num_images`, etc).
I think an `impure elemental` function to check if one or more files exist, "on the fly", is more convenient than using the `inquire` statement directly, so I suggest adding a `file_exists` function:

    module m
      implicit none
    contains
      impure elemental function file_exists(xfile) result(exists)
        character(len=*), intent(in) :: xfile
        logical :: exists
        inquire(file=xfile, exist=exists)
      end function file_exists
    end module m

    program main
      ! driver for file_exists
      use m, only: file_exists
      implicit none
      print *, file_exists(["1","2","3"] // ".txt")
    end program main

But this approach only works if all character strings have the same length, otherwise it won't compile. Isn't that use case too limited?
I think `exists` is sufficient. It can be put in a loop if multiple files are checked.
While some functionality for file system related operations exists in Fortran, some rather relevant operations are not standardized. For example, figuring out whether a path is a directory: https://stackoverflow.com/questions/9522933
A function like "isDirectory" can be based on the corresponding C functionality. Would that be an appropriate solution? It would certainly require a bunch of #ifdefs on the C side of the implementation.
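
As a rough illustration of what the C side of such an "isDirectory" could look like, here is a sketch built on `stat()`. It is an assumption about one possible implementation, not code from stdlib: `is_directory_c`, `os_stat`, `os_stat_t` and `OS_ISDIR` are hypothetical names, while `stat`/`S_ISDIR` (POSIX) and `_stat`/`_S_IFDIR` (Microsoft C runtime) are the real library facilities being wrapped.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

#ifdef _WIN32
#define os_stat _stat
typedef struct _stat os_stat_t;
#define OS_ISDIR(m) (((m) & _S_IFMT) == _S_IFDIR)
#else
#define os_stat stat
typedef struct stat os_stat_t;
#define OS_ISDIR(m) S_ISDIR(m)
#endif

/* Hypothetical helper: returns 1 if path names an existing directory,
   0 otherwise (including when the path does not exist or cannot be read). */
int is_directory_c(const char *path)
{
    os_stat_t st;

    assert(path != NULL);
    if (os_stat(path, &st) != 0)
        return 0;
    return OS_ISDIR(st.st_mode) ? 1 : 0;
}
```

A Fortran `is_directory` function could call this through a `bind(C)` interface after appending a NUL character to the path; the `#ifdef` block is the only platform-specific part in this sketch.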