ben-strasser / fast-cpp-csv-parser

fast-cpp-csv-parser
BSD 3-Clause "New" or "Revised" License
2.11k stars 440 forks source link

Improvement: handle a string where a number is expected #65

Closed ccocheci closed 6 years ago

ccocheci commented 6 years ago

Right now the parser cannot handle something like "nan" where an integer or double is expected. It would be nice to be able to pass a flag telling the parser to do something other than throw on exception in this case.

ben-strasser commented 6 years ago

Hi,

thanks for the bug report. What error payload do you expect "nan" to decode to?

Best Regards Ben Strasser

On 04/24/2018 05:41 PM, Cristian Cocheci wrote:

Right now the parser cannot handle something like "nan" where an integer or double is expected. It would be nice to be able to pass a flag telling the parser to do something other than throw on exception in this case.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ben-strasser/fast-cpp-csv-parser/issues/65, or mute the thread https://github.com/notifications/unsubscribe-auth/ALaAj-bmoHK-YbqFbIxOO4uMXIcXc9KTks5tr0edgaJpZM4Th7mU.

ccocheci commented 6 years ago

Hi Ben, It could be the default value for the type (0), or the max value for the type, depending on the flag value. Thank you, Cristian Cocheci

ben-strasser commented 6 years ago

Hi,

just so that I understand you correctly: You suggest that, if someone writes the value of sqrt(-1.0) to a file and then reads it back in as int the obtained value should be silently 0? That sounds like a recipe for disaster. If this really is the behavior that you want, I highly recommend making it explicit in your code and read the field as char* and use atof combined with an if-then-else.

What I actually was referring to, was that there is not just one nan value. A 32-bit IEEE floating point has 16777215 nan states. Which one should the string "nan" decode to? IEEE floating points nans contain an error that encodes the reason why something is nan. For example, 1/0.0 has a different bit pattern than sqrt(-1.0).

Best Regards Ben Strasser

On 04/25/2018 06:27 PM, Cristian Cocheci wrote:

Hi Ben, It could be the default value for the type (0), or the max value for the type, depending on the flag value. Thank you, Cristian Cocheci

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ben-strasser/fast-cpp-csv-parser/issues/65#issuecomment-384349018, or mute the thread https://github.com/notifications/unsubscribe-auth/ALaAj6EFgztdP0rnzY2S9JX3NKmRQx6Sks5tsKQCgaJpZM4Th7mU.

ccocheci commented 6 years ago

I was thinking that whatever nan decodes to would be the user's choice (depending on the value of the passed in flag). For example you could have a "nan_policy" with the values throw_on_nan (which would be the default policy), default_type_value_on_nan (which would translate nan to 0.0 or 0), set_to_max_on_nan, and maybe some others.

Thanks, Cristian

On Wed, Apr 25, 2018 at 3:37 PM, ben-strasser notifications@github.com wrote:

Hi,

just so that I understand you correctly: You suggest that, if someone writes the value of sqrt(-1.0) to a file and then reads it back in as int the obtained value should be silently 0? That sounds like a recipe for disaster. If this really is the behavior that you want, I highly recommend making it explicit in your code and read the field as char* and use atof combined with an if-then-else.

What I actually was referring to, was that there is not just one nan value. A 32-bit IEEE floating point has 16777215 nan states. Which one should the string "nan" decode to? IEEE floating points nans contain an error that encodes the reason why something is nan. For example, 1/0.0 has a different bit pattern than sqrt(-1.0).

Best Regards Ben Strasser

On 04/25/2018 06:27 PM, Cristian Cocheci wrote:

Hi Ben, It could be the default value for the type (0), or the max value for the type, depending on the flag value. Thank you, Cristian Cocheci

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ben- strasser/fast-cpp-csv-parser/issues/65#issuecomment-384349018, or mute the thread https://github.com/notifications/unsubscribe-auth/ ALaAj6EFgztdP0rnzY2S9JX3NKmRQx6Sks5tsKQCgaJpZM4Th7mU.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ben-strasser/fast-cpp-csv-parser/issues/65#issuecomment-384408628, or mute the thread https://github.com/notifications/unsubscribe-auth/AB4ZwReOXbxQd1iEliUJCsiMIhScVlxEks5tsNBlgaJpZM4Th7mU .

ben-strasser commented 6 years ago

Why not use

char*str_val; while(read_row(str_val)){   double val;   if(strcmp(str_val, "nan"))     val = atof(str_val);   else     val = 42; }

I do not think that you can make the code any clearer than that. The buildin decoder is also not really faster than that because the char* goes directly into the file buffer.

Best Regards Ben Strasser

On 04/26/2018 04:47 PM, Cristian Cocheci wrote:

I was thinking that whatever nan decodes to would be the user's choice (depending on the value of the passed in flag). For example you could have a "nan_policy" with the values throw_on_nan (which would be the default policy), default_type_value_on_nan (which would translate nan to 0.0 or 0), set_to_max_on_nan, and maybe some others.

Thanks, Cristian

On Wed, Apr 25, 2018 at 3:37 PM, ben-strasser notifications@github.com wrote:

Hi,

just so that I understand you correctly: You suggest that, if someone writes the value of sqrt(-1.0) to a file and then reads it back in as int the obtained value should be silently 0? That sounds like a recipe for disaster. If this really is the behavior that you want, I highly recommend making it explicit in your code and read the field as char* and use atof combined with an if-then-else.

What I actually was referring to, was that there is not just one nan value. A 32-bit IEEE floating point has 16777215 nan states. Which one should the string "nan" decode to? IEEE floating points nans contain an error that encodes the reason why something is nan. For example, 1/0.0 has a different bit pattern than sqrt(-1.0).

Best Regards Ben Strasser

On 04/25/2018 06:27 PM, Cristian Cocheci wrote:

Hi Ben, It could be the default value for the type (0), or the max value for the type, depending on the flag value. Thank you, Cristian Cocheci

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ben- strasser/fast-cpp-csv-parser/issues/65#issuecomment-384349018, or mute the thread https://github.com/notifications/unsubscribe-auth/ ALaAj6EFgztdP0rnzY2S9JX3NKmRQx6Sks5tsKQCgaJpZM4Th7mU.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub

https://github.com/ben-strasser/fast-cpp-csv-parser/issues/65#issuecomment-384408628, or mute the thread

https://github.com/notifications/unsubscribe-auth/AB4ZwReOXbxQd1iEliUJCsiMIhScVlxEks5tsNBlgaJpZM4Th7mU .

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ben-strasser/fast-cpp-csv-parser/issues/65#issuecomment-384667822, or mute the thread https://github.com/notifications/unsubscribe-auth/ALaAj4S968B5nO1PHrShvByjNifpwVf0ks5tsd3vgaJpZM4Th7mU.

ccocheci commented 6 years ago

Sure, I can do this, of course. Indeed nan is too particular of a case, no reason to include it in the general options. Thank you. Cristian