fortran-lang / stdlib

Fortran Standard Library
https://stdlib.fortran-lang.org
MIT License

Read CSV files quickly #884

Open Beliavsky opened 1 week ago

Beliavsky commented 1 week ago

Since R is used for analyzing large data sets, there are several libraries for reading CSV files quickly, one of them being csvread (https://cran.r-project.org/web/packages/csvread/index.html), written in C++. It would be nice to have a fast CSV file reader in Fortran, even if it's just an interface to one in C or C++.

arjenmarkus commented 1 week ago

Tangential remark: a rather fundamental problem with CSV files is that the format leaves a lot of freedom. Just think of the use of commas, semicolons, etc. as separators. You would need some detection mechanism or options. I do not know that library, but it should not be too hard to get something reasonably fast; I'd say these degrees of freedom are the main bottleneck.
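To illustrate the kind of detection mechanism mentioned above, here is a minimal sketch in C. It is not taken from csvread or any other library: the function name guess_delimiter, the candidate set (comma, semicolon, tab, pipe) and the "most frequent on the first line" heuristic are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: guess the field delimiter by counting candidate
   characters on the first line, ignoring anything inside double quotes.
   Falls back to ',' when no candidate occurs. */
char guess_delimiter(const char *text)
{
    static const char candidates[] = {',', ';', '\t', '|'};
    size_t counts[4] = {0, 0, 0, 0};
    int in_quotes = 0;

    assert(text != NULL);

    for (const char *p = text; *p != '\0'; ++p) {
        if (*p == '"') {            /* toggle quoted state; "" toggles twice */
            in_quotes = !in_quotes;
            continue;
        }
        if (in_quotes)
            continue;               /* separators inside quotes do not count */
        if (*p == '\n' || *p == '\r')
            break;                  /* only the first record is inspected */
        for (size_t i = 0; i < 4; ++i)
            if (*p == candidates[i])
                ++counts[i];
    }

    size_t best = 0;
    for (size_t i = 1; i < 4; ++i)
        if (counts[i] > counts[best])
            best = i;
    return counts[best] > 0 ? candidates[best] : ',';
}
```

A real reader would probably still expose the delimiter as an option, since a single line can be ambiguous (for example, a semicolon-separated file whose numbers use decimal commas).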

sakamoti commented 1 week ago

I have never loaded large data, so I am not sure how fast it is for your use case, but as a package for reading CSV files in Fortran, there is csv-fortran. This package is also introduced on fortran-lang.org.

jalvesz commented 1 week ago

This could be developed using to_num_from_stream. The fastest approach I have found for loading large numeric ASCII files into memory is to read the whole file into a single large string and then stream through it with a string pointer.
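The load-then-stream idea can be sketched in C, which the thread mentions as a possible backend. This is only an illustration under stated assumptions, not the stdlib implementation: read_whole_file and parse_numbers are hypothetical names, the standard strtod stands in for to_num_from_stream, and the input is assumed to be purely numeric (no header row, no quoted fields). Empty fields are silently skipped.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read an entire file into one NUL-terminated heap
   buffer. Returns NULL on failure; the caller frees the result. Sizing via
   fseek/ftell is assumed to work, as it does for regular files on common
   platforms. */
char *read_whole_file(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    char *buf = NULL;
    if (fseek(f, 0, SEEK_END) == 0) {
        long size = ftell(f);
        if (size >= 0 && fseek(f, 0, SEEK_SET) == 0) {
            buf = malloc((size_t)size + 1);
            if (buf != NULL) {
                size_t got = fread(buf, 1, (size_t)size, f);
                buf[got] = '\0';
            }
        }
    }
    fclose(f);
    return buf;
}

/* Hypothetical helper: walk the buffer once, converting each number with
   strtod and moving the pointer past it. Stores at most `cap` values in
   `out` and returns how many were stored. Characters that do not start a
   number (separators, stray text) are skipped one at a time. */
size_t parse_numbers(const char *buf, double *out, size_t cap)
{
    assert(buf != NULL && out != NULL);

    size_t n = 0;
    const char *p = buf;
    while (*p != '\0' && n < cap) {
        char *end;
        double v = strtod(p, &end);
        if (end == p) {     /* nothing parsed here: step over one character */
            ++p;
            continue;
        }
        out[n++] = v;
        p = end;            /* resume right after the parsed number */
    }
    return n;
}
```

A caller would pass the buffer returned by read_whole_file to parse_numbers together with a preallocated array, then free the buffer. The single pass over one contiguous buffer avoids per-line I/O and per-field allocation, which is where much of the cost of naive readers tends to go.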