ben-strasser / fast-cpp-csv-parser

fast-cpp-csv-parser
BSD 3-Clause "New" or "Revised" License
2.15k stars 440 forks source link

do not overrun when quotes are unescaped #32

Closed yhager closed 7 years ago

yhager commented 7 years ago

When using double_quotes_escape, an unescaped quote can lead to an overrun if it appears at the end of the string.

For example, with the following code(test.cpp):

#include <iostream>
#include "csv.h"

int main(){
  io::CSVReader<3, io::trim_chars<' ', '\t'>, io::double_quote_escape<',', '\"'>> in("test.csv");
  in.read_header(io::ignore_extra_column, "vendor", "size", "speed");
  std::string vendor; int size; double speed;
  while(in.read_row(vendor, size, speed)){
    std::cout << "Vendor: " << vendor
              << ", size: " << size
              <<", speed: " << speed
              << std::endl;
    // do stuff with the data
  }
}

And the following csv file(test.csv):

"vendor","size","speed"
"escaped ""quotes""","1","2"
"unescaped "quotes"","1","2"

I get:

$ clang++ -std=c++11 test.cpp -lpthread -o test && ./test
Vendor: escaped "quotes", size: 1, speed: 2
Segmentation fault (core dumped)

With the proposed fix:

$ clang++ -std=c++11 test.cpp -lpthread -o test && ./test
Vendor: escaped "quotes", size: 1, speed: 2
Vendor: unescaped "quotes", size: 1, speed: 2
ben-strasser commented 7 years ago

Hi,

thanks for the bug report.

I agree that the library should SegFault. However, is

"unescaped "quotes"","1","2"

not illformed and an exception should be thrown?

Best Regards Ben Strasser

On 11/16/2016 04:19 PM, Yuval Hager wrote:

When using double_quotes_escape, an unescaped quote can lead to an overrun if it appears at the end of the string.

For example, with the following code(test.cpp):

|#include #include "csv.h" int main(){ io::CSVReader<3, io::trim_chars<' ', '\t'>, io::double_quote_escape<',', '\"'>> in("test.csv"); in.read_header(io::ignore_extra_column, "vendor", "size", "speed"); std::string vendor; int size; double speed; while(in.read_row(vendor, size, speed)){ std::cout << "Vendor: " << vendor << ", size: " << size <<", speed: " << speed << std::endl; // do stuff with the data } } |

And the following csv file(test.csv):

|"vendor","size","speed" "escaped ""quotes""","1","2" "unescaped "quotes"","1","2" |

I get:

|$ clang++ -std=c++11 test.cpp -lpthread -o test && ./test Vendor: escaped "quotes", size: 1, speed: 2 Segmentation fault (core dumped) |

With the proposed fix:

|$ clang++ -std=c++11 test.cpp -lpthread -o test && ./test Vendor: escaped "quotes", size: 1, speed: 2 Vendor: unescaped "quotes", size: 1, speed: 2 |


    You can view, comment on, or merge this pull request online at:

https://github.com/ben-strasser/fast-cpp-csv-parser/pull/32

    Commit Summary

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ben-strasser/fast-cpp-csv-parser/pull/32, or mute the thread https://github.com/notifications/unsubscribe-auth/ALaAj6Cnfv6tMG8LycGLLmg4NFWWIuzqks5q-x8KgaJpZM4Kz_NK.

yhager commented 7 years ago

Possibly. But the fix was so easy, so might as well let it through (unless you see a problem with the code)

ben-strasser commented 7 years ago

Hi,

the problem is that if I accept the patch as is now and change the behavior later to throwing an exception, then the later change will be a breaking change. :/

However, as I already said, a seg fault is not acceptable. Sometime has to change. I'm just not sure what yet.

Best Regards Ben

On 11/17/2016 01:45 PM, Yuval Hager wrote:

Possibly. But the fix was so easy, so might as well let it through (unless you see a problem with the code)

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ben-strasser/fast-cpp-csv-parser/pull/32#issuecomment-261237874, or mute the thread https://github.com/notifications/unsubscribe-auth/ALaAjzMUiuq4C_eaDPQqxXtVJdUIDPXSks5q_Ex3gaJpZM4Kz_NK.

yhager commented 7 years ago

That's correct. Let me know if you want me to try and modify the patch so that it throws an exception instead. For my case, I need to keep parsing other lines, even if the current line is ill formed.

ben-strasser commented 7 years ago

Hi,

I merged the patch.

The reason is that currently strings such as "aa"aa" is read as aa"aa even though the quotation mark is not escaped. One can argue that this should throw an exception. However, it currently does not and throwing an exception would therefore potentially break existing code. Throwing an exception is therefore of the table and your patch is the only remaining solution that I can see.

Best Regards & Thanks for the patch Ben Strasser