ben-strasser / fast-cpp-csv-parser

fast-cpp-csv-parser
BSD 3-Clause "New" or "Revised" License

How to check the columns for a fixed order? #143

Closed VoSubat closed 1 month ago

VoSubat commented 1 month ago

Usually the order of columns shouldn't matter in CSV, I know. However, I am obliged to check that my n columns are the very first columns and that they appear in a specific order. The only way I found was to expose col_order through a public const member function.

const std::vector<int>& get_column_order() const {
  return col_order;
}

Is there a better way? Is it worth putting such a function into your official source code?
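For illustration, here is a minimal sketch of the check itself, applied to a vector obtained through an accessor like the one above. It assumes that col_order holds, for each column in file order, the index of the matching requested column, with -1 for an ignored extra column. That is my reading of the parser's header handling, not documented API. The helper name first_columns_in_order is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true if the first n entries of col_order
// are exactly 0, 1, ..., n-1, i.e. the n requested columns are the first
// n columns of the file and appear in the requested order.
// Assumption: col_order[k] is the index of the requested column found at
// file position k, or -1 for an ignored extra column.
bool first_columns_in_order(const std::vector<int>& col_order, std::size_t n) {
    if (col_order.size() < n) {
        return false;  // the header has fewer columns than requested
    }
    for (std::size_t i = 0; i < n; ++i) {
        if (col_order[i] != static_cast<int>(i)) {
            return false;  // column i is missing or out of place
        }
    }
    return true;
}
```

Under that assumption, a header whose extra columns come after the requested ones would pass, while any swap or leading extra column would fail.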

ben-strasser commented 1 month ago

You can use set_header instead of read_header to give the columns names based on their position in the file. If your file contains a broken header line that you want to ignore, the simplest way is to call next_line once after opening the file to completely skip the line.

VoSubat commented 1 month ago

Thanks a lot for your reply. It's actually the other way round - instead of ignoring defective headers, I have to detect and report them (including unexpected column order).

One way would be to read the header line, split it, trim it, un-quote it and then check the sequence. Afterwards I could use set_header and go on. But your library already does all of this perfectly; I just have to check whether col_order contains consecutive increments.

Does it make sense to add functions to analyse the column order in your CSV parser? I'd totally understand if you said: no. :-)

ben-strasser commented 1 month ago

This seems too niche of an issue to add an extra function for everyone, so no. I do not want to add the function.

But why do you have to split? Can you not do this:

const char*actual_header = in.next_line();
const char*desired_header = "foo,bar,a,b,c";
// strcmp returns 0 when the strings are equal, so a non-zero result is the mismatch case
if(strcmp(actual_header, desired_header) != 0){ /* log error */ }
in.set_header("foo", "bar", "a", "b", "c");
...

VoSubat commented 1 month ago

Using the LineReader, I think I'd lose all the services I get from the CSVReader. And headers may be quoted and may have spaces. ID,Name,"Height,Width",Price is just as valid as "ID" , "Name" , "Height,Width" , "Price" but not Name,ID,"Height,Width",Price
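As a rough illustration of the manual route mentioned earlier (split, trim, un-quote, then compare), the following standard-library-only sketch accepts the first two example headers and rejects the third. It is a simplified stand-in, not the parser's own trim and quote policies: it only treats spaces and tabs as whitespace, uses `,` and `"` as fixed separator and quote characters, and does not strip a trailing carriage return. The names split_header and header_starts_with are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split one header line into column names.
// Fields are separated by commas; spaces and tabs around a field are
// dropped; one pair of enclosing double quotes is removed, and "" inside
// a quoted field becomes a single ".
std::vector<std::string> split_header(const std::string& line) {
    std::vector<std::string> names;
    std::size_t i = 0;
    while (true) {
        // Skip whitespace before the field.
        while (i < line.size() && (line[i] == ' ' || line[i] == '\t')) ++i;
        std::string field;
        if (i < line.size() && line[i] == '"') {
            ++i;  // opening quote
            while (i < line.size()) {
                if (line[i] == '"') {
                    if (i + 1 < line.size() && line[i + 1] == '"') {
                        field += '"';  // escaped quote
                        i += 2;
                    } else {
                        ++i;  // closing quote
                        break;
                    }
                } else {
                    field += line[i++];
                }
            }
            // Ignore whatever follows the closing quote up to the next
            // comma (normally only whitespace).
            while (i < line.size() && line[i] != ',') ++i;
        } else {
            while (i < line.size() && line[i] != ',') field += line[i++];
            // Drop whitespace after the field.
            while (!field.empty() && (field.back() == ' ' || field.back() == '\t'))
                field.pop_back();
        }
        names.push_back(field);
        if (i >= line.size()) break;
        ++i;  // skip the comma
    }
    return names;
}

// Hypothetical helper: true if the first expected.size() names in the
// header line equal `expected`, in the same order.
bool header_starts_with(const std::string& line,
                        const std::vector<std::string>& expected) {
    const std::vector<std::string> actual = split_header(line);
    if (actual.size() < expected.size()) return false;
    for (std::size_t k = 0; k < expected.size(); ++k) {
        if (actual[k] != expected[k]) return false;
    }
    return true;
}
```

Calling header_starts_with with the expected names ID, Name, "Height,Width" (as one name) and Price treats the quoted and spaced variants alike and flags the swapped Name,ID header. Once the check passes, the position-based set_header call suggested above could take over.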

Please close this "issue" - I'll keep my little change local - it really is not needed by anyone, but me. :-)

Thank you for your time - and thank you a lot for this perfect little parser.

ben-strasser commented 1 month ago

Fully agree, this is too special to have a generic function in the code that everyone gets.

> Using the LineReader, I think I'd lose all the services I get from the CSVReader.

I'm uncertain whether you are aware, but next_line is available in CSVReader precisely so that ill-formed lines can be skipped.

VoSubat commented 1 month ago

I just found next_line, which returns a char* to the beginning of the entire line. But I'd still have to unpack the individual column headers myself, because "Name" and Name, with or without surrounding spaces, are all the same and must be accepted by my code. And read_header does all of this automatically.

No really, it's fine. :-)