ben-strasser / fast-cpp-csv-parser

fast-cpp-csv-parser
BSD 3-Clause "New" or "Revised" License

default value with set_header #82

Closed: ncoder-1 closed this issue 5 years ago

ncoder-1 commented 5 years ago

Hi,

I have a three-column, header-less CSV file that looks like this:

# comment
a,b,c
c,a,b

1,2,3
red,orange,blue
tall,medium
apple,orange,tomatoes

I am attempting to read it with:

io::CSVReader<3, io::trim_chars<' '>, io::double_quote_escape<',', '\"'>, io::single_and_empty_line_comment<'#'>> csv_reader(filename);

csv_reader.set_header("Col A", "Col B", "Col C");

string col_a, col_b, col_c;

while (csv_reader.read_row(col_a, col_b, col_c)) {
  cout << col_a << " " << col_b << " " << col_c << "\n";
}

I get an exception (io::error::too_few_columns) at the "tall,medium" line, because it is missing a column. Is there a way to supply a default value (such as a simple null or "") for any missing column, as is possible with read_header? I saw this reported issue (#18), but there wasn't really a solution other than adding a header to the file, and I have no control over the file.

An example would be appreciated!

ben-strasser commented 5 years ago

Hi,

there is no way to properly solve this other than fixing the broken input file.

Consider, for example, your input file. Which column is missing in the "tall,medium" line: the first, the second, or the third? The problem gets even harder once you take into account the automatic column reordering and selection that csv_reader.read_header provides.

Best Regards Ben Strasser
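As an illustration of this ambiguity, the standalone C++ sketch below lists every way a row that is one field short could be completed by inserting a single empty column. It is not part of fast-cpp-csv-parser; `split_fields` and `candidate_rows` are hypothetical helpers, and quoted fields are ignored.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a line on a single-character separator.
// Quoting and escaping are deliberately ignored in this sketch.
std::vector<std::string> split_fields(const std::string& line, char sep = ',') {
    std::vector<std::string> fields;
    std::string current;
    for (char c : line) {
        if (c == sep) {
            fields.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);  // the last field is not followed by a separator
    return fields;
}

// Hypothetical helper: for a row that is exactly one field short, build every
// row obtained by inserting a single empty field at each possible position.
std::vector<std::vector<std::string>> candidate_rows(const std::string& line,
                                                     std::size_t expected_columns) {
    std::vector<std::vector<std::string>> candidates;
    const std::vector<std::string> fields = split_fields(line);
    if (fields.size() + 1 != expected_columns) {
        return candidates;  // only the "exactly one column missing" case is handled
    }
    for (std::size_t pos = 0; pos <= fields.size(); ++pos) {
        std::vector<std::string> row = fields;
        row.insert(row.begin() + static_cast<std::ptrdiff_t>(pos), std::string());
        candidates.push_back(row);
    }
    return candidates;
}
```

For example, `candidate_rows("tall,medium", 3)` yields `{"", "tall", "medium"}`, `{"tall", "", "medium"}` and `{"tall", "medium", ""}`. Picking one of them is a decision about the input data, not something a parser can infer.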


ncoder-1 commented 5 years ago

Hi,

The way I see it, in the case of "tall,medium", it is the 3rd column that is missing, because a missing first or second column would still leave its separator in place. This is how I picture it:

tall,,medium   #missing second column
,tall,medium   #missing first column
tall,medium    #missing third column

Which would translate to:

tall,NULL,medium
NULL,tall,medium
tall,medium,NULL

Even if you add a header, it won't make the missing data suddenly appear. To me, having the header in the file or hardcoded in the source code seems the same.

EDIT: I understand your thought process, but I just wish I could mark a column as "optional": for example, there are always at least 2 columns, and the 3rd one may or may not be filled.

Cheers!
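The behaviour requested here (a minimum number of mandatory columns, with missing trailing columns filled by a default) could be sketched outside the library as below. This is an assumption-laden illustration, not a fast-cpp-csv-parser feature: `split_on` and `pad_trailing_fields` are hypothetical names, the `"NULL"` default and the 2-of-3 column split come from this comment, and quoted fields are not supported.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: split a line on a single-character separator
// (no support for quoted fields).
std::vector<std::string> split_on(const std::string& line, char sep) {
    std::vector<std::string> fields(1);  // every line has at least one field
    for (char c : line) {
        if (c == sep) {
            fields.emplace_back();  // start a new, empty field
        } else {
            fields.back() += c;
        }
    }
    return fields;
}

// Hypothetical policy: the first `min_columns` columns are mandatory and the
// remaining ones, up to `total_columns`, are optional trailing columns that
// are filled with `fill` when absent.
std::vector<std::string> pad_trailing_fields(const std::string& line,
                                             std::size_t min_columns,
                                             std::size_t total_columns,
                                             const std::string& fill = "NULL",
                                             char sep = ',') {
    std::vector<std::string> fields = split_on(line, sep);
    if (fields.size() < min_columns) {
        throw std::runtime_error("too few columns: " + line);
    }
    if (fields.size() > total_columns) {
        throw std::runtime_error("too many columns: " + line);
    }
    fields.resize(total_columns, fill);  // append `fill` for each missing trailing column
    return fields;
}
```

Under these assumptions, `pad_trailing_fields("tall,medium", 2, 3)` returns `{"tall", "medium", "NULL"}`, while a row with only one field still throws. Treating only trailing columns as optional is what makes the padding unambiguous.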

ncoder-1 commented 5 years ago

An example of such a file is this Wireshark manuf file: if you search (Ctrl-F) for "00:00:13", you will see a row where the first 2 columns are present and the 3rd is missing.

ben-strasser commented 5 years ago

Hi,

if you have separators, then you should not get the error that you are seeing.

The following are clear:

tall,,medium
,tall,medium
tall,medium,

The following, which is your original example, is not:

tall,medium

I do not understand your use case.

Best Regards Ben Strasser

ncoder-1 commented 5 years ago

If you open the file linked above and search for "00:00:13", you will see a row where only 2 columns are present and the 3rd is missing. Throughout the file, every row has at least 2 columns, and about 95% of rows also have data in the 3rd column. I am trying to fill in the remaining 5% with "NULL" instead of having an exception thrown.

ben-strasser commented 5 years ago

I still do not understand your point.

"a,b," is correctly parsed. That is not an issue. "a,b" cannot be correctly parsed as there is no way to tell which column is missing.

helmingstay commented 5 years ago

To the OP: It sounds like you want to scan each line and add a trailing comma to lines that contain only one comma, i.e. only 2 fields (or whatever your field separator is). As Ben has noted, this is outside the purview of the CSV parser, but it should be easy to accomplish with standard regex tools...

Best, Christian
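A minimal C++ sketch of this preprocessing step, assuming unquoted comma-separated data with 3 columns, lines starting with '#' treated as comments (as in the original example), and the hypothetical helper name `pad_two_field_lines`: every line with exactly one comma gets a trailing comma, so the missing third column becomes an explicit empty field.

```cpp
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: append a trailing comma to every line that contains
// exactly one comma (2 of 3 fields). Comment lines starting with '#' and
// empty lines are copied unchanged. Quoted fields are not handled.
std::string pad_two_field_lines(const std::string& text) {
    static const std::regex two_fields("^[^,]*,[^,]*$");
    std::istringstream in(text);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // normalize Windows line endings before matching
        }
        if (!line.empty() && line[0] != '#' && std::regex_match(line, two_fields)) {
            line += ',';  // make the missing third column an explicit empty field
        }
        out << line << '\n';
    }
    return out.str();
}
```

The padded text could then be parsed from memory, for example by wrapping it in a `std::istringstream` and passing a file name plus that stream to the `io::CSVReader` constructor; check that your version of csv.h provides such a stream overload before relying on it. The third column of padded rows then reads as an empty string, which the caller can map to "NULL" if needed.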


ncoder-1 commented 5 years ago

Regex it is, thanks.