Cantera / enhancements

Repository for proposed and ongoing enhancements to Cantera

Implement CSV file import for C++ SolutionArray #163

Open ischoegl opened 1 year ago

ischoegl commented 1 year ago

Abstract

Recent work added HDF support to Sim1D::save/restore (Cantera/cantera#1385) and implemented SolutionArray::save/restore for HDF and YAML (Cantera/cantera#1426). On the back end, SolutionArray handles file I/O in both cases. Because those methods are implemented in the C++ layer, they are portable across all APIs.

Adding CSV support to SolutionArray::save/restore in C++, replacing Python's SolutionArray.write_csv/read_csv, is a logical extension. It can build on the existing infrastructure and would handle CSV support consistently, replacing the patchwork of dissimilar approaches that has accumulated over time. An additional benefit would be resolving Cantera/cantera#1372.

Motivation

Describe the need for the proposed change:

Possible Solutions

Create versions of SolutionArray::readEntry/writeEntry that handle CSV. While writing is straightforward, reading CSV will require a suitable parser. Per https://github.com/Cantera/cantera/issues/1372#issuecomment-1370177622 by @speth:

[...] the C++ standard library now includes <regex> (which we use elsewhere) and has I believe essentially the same API as the Boost version. See https://en.cppreference.com/w/cpp/regex. Anything available in C++14 or older is fair game as far as Cantera is concerned.
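On the writing side, which the proposal describes as straightforward, a minimal sketch could look like the following. The helper names csvEscape and csvJoin are hypothetical and not part of Cantera; the quoting rule assumed here is the common RFC 4180 convention (quote a field only if it contains a comma, a double quote, or a line break, and double any embedded quotes).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Cantera): quote a field only when needed.
std::string csvEscape(const std::string& field)
{
    // Fields with a comma, a double quote, or a line break must be quoted.
    if (field.find_first_of(",\"\r\n") == std::string::npos) {
        return field;
    }
    std::string out = "\"";
    for (char c : field) {
        if (c == '"') {
            out += '"'; // a literal quote is written as two quotes
        }
        out += c;
    }
    out += '"';
    return out;
}

// Hypothetical helper: join escaped fields into one CSV record
// (the caller appends the line ending).
std::string csvJoin(const std::vector<std::string>& fields)
{
    std::string line;
    for (std::size_t i = 0; i < fields.size(); i++) {
        if (i > 0) {
            line += ',';
        }
        line += csvEscape(fields[i]);
    }
    return line;
}
```

A header row for three columns could then be produced with csvJoin({"T", "P", "Y_H2"}); numeric data would first have to be converted to strings at whatever precision the writer chooses.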

References

ischoegl commented 1 year ago

Regarding parsing of CSV files in C++, here are some preliminary findings

bryanwweber commented 1 year ago

I feel like there has to be something in Boost to do this... Implementing a csv parser from scratch seems overkill 🤔

ischoegl commented 1 year ago

> I feel like there has to be something in Boost to do this... Implementing a csv parser from scratch seems overkill 🤔

Wouldn't be too hard if this regex were supported by C++'s <regex>. It may, however, be supported by <boost/regex.hpp> ... I lost most of my appetite after spending more time than seemed necessary trying to figure out how to translate the conditional to ECMAScript.

speth commented 1 year ago

Not sure it would even resolve the problem, but I wanted to add a word of caution. boost/regex.hpp is a compiled part of Boost, which is something we've been avoiding a dependency on, due to some of the complications involved in linking to those.

ischoegl commented 1 year ago

> Not sure it would even resolve the problem, but I wanted to add a word of caution. boost/regex.hpp is a compiled part of Boost, which is something we've been avoiding a dependency on, due to some of the complications involved in linking to those.

Too bad. I just confirmed that <boost/regex.hpp> would indeed resolve the problem 😢

PS: this is how to get the header line after opening the file ...

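// Read the header line (assumes `file` is an already opened input stream)
string line;
std::getline(file, line);

// Match one CSV field per iteration; capture group 2 holds the field
// without its surrounding quotes. The conditional (?(1)...) is Perl
// syntax and needs Boost.Regex.
boost::regex rgx(
    "(?:^|,)(?=[^\"]|(\")?)\"?((?(1)[^\"]*|[^,\"]*))\"?(?=,|$)");
vector<string> labels;
auto line_begin = boost::sregex_iterator(line.begin(), line.end(), rgx);
auto line_end = boost::sregex_iterator();
for (boost::sregex_iterator item = line_begin; item != line_end; ++item) {
    boost::smatch match = *item;
    labels.push_back(match.str(2)); // collect the unquoted column label
}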

The calling syntax would be the same for <regex>, but the pattern itself doesn't work there: the conditional (?(1)...) is not part of the ECMAScript grammar that std::regex uses.
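For comparison, the same header split can be done without any regex by walking the line once and tracking whether the current position is inside quotes. This is only a sketch under stated assumptions, not a proposed Cantera implementation: the function name splitCsvLine is hypothetical, a record is assumed to fit on a single line (no line breaks inside quoted fields), and a doubled quote inside a quoted field is taken to mean one literal quote.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Cantera): split one CSV record into fields.
// Assumes the whole record is on one line.
std::vector<std::string> splitCsvLine(const std::string& line)
{
    std::vector<std::string> fields;
    std::string field;
    bool quoted = false; // true while inside a double-quoted field
    for (std::size_t i = 0; i < line.size(); i++) {
        char c = line[i];
        if (quoted) {
            if (c == '"' && i + 1 < line.size() && line[i + 1] == '"') {
                field += '"'; // doubled quote stands for a literal quote
                i++;          // skip the second quote of the pair
            } else if (c == '"') {
                quoted = false; // closing quote
            } else {
                field += c; // commas inside quotes belong to the field
            }
        } else if (c == '"') {
            quoted = true; // opening quote
        } else if (c == ',') {
            fields.push_back(field); // unquoted comma ends the field
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field); // the last field has no trailing comma
    return fields;
}
```

Calling splitCsvLine(line) on the header line would fill the role of the labels vector above. Unlike the regex, this sketch does not validate the input, so malformed records (for example an unterminated quote) are accepted silently.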

speth commented 1 year ago

We could vendor this single-file, header-only CSV reader: https://github.com/ben-strasser/fast-cpp-csv-parser, or something similar.