BurntSushi / rust-csv

A CSV parser for Rust, with Serde support.
The Unlicense
1.72k stars, 219 forks

performance tweaks #312

Closed: jqnatividad closed this 3 months ago

jqnatividad commented 1 year ago

Hi @BurntSushi , thanks once again for this essential crate!

The proposed changes yield a non-trivial performance improvement. Benchmark results before the changes:

running 44 tests
test count_game_deserialize_borrowed_bytes ... bench:   7,758,777 ns/iter (+/- 473,294) = 335 MB/s
test count_game_deserialize_borrowed_str   ... bench:   6,624,991 ns/iter (+/- 163,778) = 392 MB/s
test count_game_deserialize_owned_bytes    ... bench:  26,458,133 ns/iter (+/- 1,525,222) = 98 MB/s
test count_game_deserialize_owned_str      ... bench:  25,104,708 ns/iter (+/- 1,295,520) = 103 MB/s
test count_game_iter_bytes                 ... bench:  15,410,962 ns/iter (+/- 726,601) = 168 MB/s
test count_game_iter_str                   ... bench:  15,756,670 ns/iter (+/- 680,027) = 165 MB/s
test count_game_read_bytes                 ... bench:   4,219,750 ns/iter (+/- 165,703) = 616 MB/s
test count_game_read_str                   ... bench:   4,624,835 ns/iter (+/- 134,219) = 562 MB/s
test count_game_serialize_owned_bytes      ... bench:   5,778,877 ns/iter (+/- 172,116) = 380 MB/s
test count_game_serialize_owned_str        ... bench:   5,775,797 ns/iter (+/- 148,971) = 380 MB/s
test count_mbta_deserialize_borrowed_bytes ... bench:   1,576,512 ns/iter (+/- 110,897) = 458 MB/s
test count_mbta_deserialize_borrowed_str   ... bench:   1,230,144 ns/iter (+/- 54,333) = 588 MB/s
test count_mbta_deserialize_owned_bytes    ... bench:   3,246,426 ns/iter (+/- 127,478) = 222 MB/s
test count_mbta_deserialize_owned_str      ... bench:   3,235,632 ns/iter (+/- 219,046) = 223 MB/s
test count_mbta_iter_bytes                 ... bench:   1,872,583 ns/iter (+/- 69,757) = 386 MB/s
test count_mbta_iter_str                   ... bench:   1,916,454 ns/iter (+/- 79,710) = 377 MB/s
test count_mbta_read_bytes                 ... bench:     786,291 ns/iter (+/- 49,603) = 920 MB/s
test count_mbta_read_str                   ... bench:     854,391 ns/iter (+/- 24,702) = 846 MB/s
test count_mbta_serialize_owned_bytes      ... bench:     988,958 ns/iter (+/- 47,658) = 630 MB/s
test count_mbta_serialize_owned_str        ... bench:     988,226 ns/iter (+/- 35,463) = 630 MB/s
test count_nfl_deserialize_borrowed_bytes  ... bench:   2,701,055 ns/iter (+/- 82,730) = 505 MB/s
test count_nfl_deserialize_borrowed_str    ... bench:   2,245,416 ns/iter (+/- 89,691) = 607 MB/s
test count_nfl_deserialize_owned_bytes     ... bench:   4,762,779 ns/iter (+/- 173,791) = 286 MB/s
test count_nfl_deserialize_owned_str       ... bench:   4,741,035 ns/iter (+/- 409,210) = 287 MB/s
test count_nfl_iter_bytes                  ... bench:   2,272,528 ns/iter (+/- 150,358) = 600 MB/s
test count_nfl_iter_bytes_trimmed          ... bench:   5,391,668 ns/iter (+/- 538,262) = 253 MB/s
test count_nfl_iter_str                    ... bench:   2,429,517 ns/iter (+/- 63,474) = 561 MB/s
test count_nfl_iter_str_trimmed            ... bench:   7,920,283 ns/iter (+/- 144,182) = 172 MB/s
test count_nfl_read_bytes                  ... bench:   1,210,540 ns/iter (+/- 37,008) = 1127 MB/s
test count_nfl_read_str                    ... bench:   1,404,208 ns/iter (+/- 57,798) = 971 MB/s
test count_nfl_serialize_owned_bytes       ... bench:   1,751,947 ns/iter (+/- 57,874) = 778 MB/s
test count_nfl_serialize_owned_str         ... bench:   1,764,627 ns/iter (+/- 80,663) = 773 MB/s
test count_pop_deserialize_borrowed_bytes  ... bench:   3,122,547 ns/iter (+/- 107,516) = 306 MB/s
test count_pop_deserialize_borrowed_str    ... bench:   2,606,416 ns/iter (+/- 110,402) = 366 MB/s
test count_pop_deserialize_owned_bytes     ... bench:   6,050,581 ns/iter (+/- 223,288) = 157 MB/s
test count_pop_deserialize_owned_str       ... bench:   6,244,414 ns/iter (+/- 399,159) = 153 MB/s
test count_pop_iter_bytes                  ... bench:   3,576,439 ns/iter (+/- 292,036) = 267 MB/s
test count_pop_iter_str                    ... bench:   3,851,447 ns/iter (+/- 190,451) = 248 MB/s
test count_pop_read_bytes                  ... bench:   1,239,895 ns/iter (+/- 70,274) = 770 MB/s
test count_pop_read_str                    ... bench:   1,557,288 ns/iter (+/- 50,688) = 613 MB/s
test count_pop_serialize_owned_bytes       ... bench:   3,005,319 ns/iter (+/- 86,631) = 317 MB/s
test count_pop_serialize_owned_str         ... bench:   3,009,323 ns/iter (+/- 80,891) = 317 MB/s
test write_nfl_bytes                       ... bench:   1,322,566 ns/iter (+/- 107,061) = 1031 MB/s
test write_nfl_record                      ... bench:   1,705,117 ns/iter (+/- 134,554) = 800 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 44 measured; 0 filtered out; finished in 219.68s    

after changes:

running 44 tests
test count_game_deserialize_borrowed_bytes ... bench:   7,732,331 ns/iter (+/- 281,910) = 336 MB/s
test count_game_deserialize_borrowed_str   ... bench:   7,123,733 ns/iter (+/- 408,687) = 364 MB/s
test count_game_deserialize_owned_bytes    ... bench:  26,228,658 ns/iter (+/- 869,602) = 99 MB/s
test count_game_deserialize_owned_str      ... bench:  25,129,595 ns/iter (+/- 1,083,185) = 103 MB/s
test count_game_iter_bytes                 ... bench:  15,284,604 ns/iter (+/- 717,947) = 170 MB/s
test count_game_iter_str                   ... bench:  15,996,183 ns/iter (+/- 376,243) = 162 MB/s
test count_game_read_bytes                 ... bench:   4,169,133 ns/iter (+/- 56,233) = 623 MB/s
test count_game_read_str                   ... bench:   4,759,654 ns/iter (+/- 122,947) = 546 MB/s
test count_game_serialize_owned_bytes      ... bench:   5,703,414 ns/iter (+/- 62,806) = 385 MB/s
test count_game_serialize_owned_str        ... bench:   5,703,216 ns/iter (+/- 168,987) = 385 MB/s
test count_mbta_deserialize_borrowed_bytes ... bench:   1,569,433 ns/iter (+/- 29,807) = 460 MB/s
test count_mbta_deserialize_borrowed_str   ... bench:   1,258,864 ns/iter (+/- 24,781) = 574 MB/s
test count_mbta_deserialize_owned_bytes    ... bench:   3,239,921 ns/iter (+/- 131,520) = 223 MB/s
test count_mbta_deserialize_owned_str      ... bench:   3,240,303 ns/iter (+/- 209,404) = 223 MB/s
test count_mbta_iter_bytes                 ... bench:   1,862,000 ns/iter (+/- 62,607) = 388 MB/s
test count_mbta_iter_str                   ... bench:   1,920,648 ns/iter (+/- 147,367) = 376 MB/s
test count_mbta_read_bytes                 ... bench:     775,212 ns/iter (+/- 16,746) = 933 MB/s
test count_mbta_read_str                   ... bench:     859,529 ns/iter (+/- 17,496) = 841 MB/s
test count_mbta_serialize_owned_bytes      ... bench:     979,427 ns/iter (+/- 15,377) = 636 MB/s
test count_mbta_serialize_owned_str        ... bench:     980,864 ns/iter (+/- 32,903) = 635 MB/s
test count_nfl_deserialize_borrowed_bytes  ... bench:   2,708,353 ns/iter (+/- 78,623) = 503 MB/s
test count_nfl_deserialize_borrowed_str    ... bench:   2,226,037 ns/iter (+/- 18,871) = 613 MB/s
test count_nfl_deserialize_owned_bytes     ... bench:   4,712,045 ns/iter (+/- 108,790) = 289 MB/s
test count_nfl_deserialize_owned_str       ... bench:   4,667,893 ns/iter (+/- 238,920) = 292 MB/s
test count_nfl_iter_bytes                  ... bench:   2,230,472 ns/iter (+/- 88,307) = 611 MB/s
test count_nfl_iter_bytes_trimmed          ... bench:   5,251,656 ns/iter (+/- 289,425) = 259 MB/s
test count_nfl_iter_str                    ... bench:   2,324,214 ns/iter (+/- 93,508) = 587 MB/s
test count_nfl_iter_str_trimmed            ... bench:   7,626,133 ns/iter (+/- 222,011) = 178 MB/s
test count_nfl_read_bytes                  ... bench:   1,169,322 ns/iter (+/- 19,910) = 1167 MB/s
test count_nfl_read_str                    ... bench:   1,356,081 ns/iter (+/- 63,067) = 1006 MB/s
test count_nfl_serialize_owned_bytes       ... bench:   1,695,666 ns/iter (+/- 20,773) = 804 MB/s
test count_nfl_serialize_owned_str         ... bench:   1,703,076 ns/iter (+/- 35,033) = 801 MB/s
test count_pop_deserialize_borrowed_bytes  ... bench:   3,002,895 ns/iter (+/- 45,517) = 318 MB/s
test count_pop_deserialize_borrowed_str    ... bench:   2,528,691 ns/iter (+/- 70,270) = 377 MB/s
test count_pop_deserialize_owned_bytes     ... bench:   5,958,310 ns/iter (+/- 170,577) = 160 MB/s
test count_pop_deserialize_owned_str       ... bench:   6,159,256 ns/iter (+/- 219,707) = 155 MB/s
test count_pop_iter_bytes                  ... bench:   3,435,508 ns/iter (+/- 213,831) = 278 MB/s
test count_pop_iter_str                    ... bench:   3,744,129 ns/iter (+/- 142,677) = 255 MB/s
test count_pop_read_bytes                  ... bench:   1,197,293 ns/iter (+/- 22,227) = 798 MB/s
test count_pop_read_str                    ... bench:   1,558,262 ns/iter (+/- 34,463) = 613 MB/s
test count_pop_serialize_owned_bytes       ... bench:   2,908,832 ns/iter (+/- 48,941) = 328 MB/s
test count_pop_serialize_owned_str         ... bench:   2,907,679 ns/iter (+/- 48,869) = 328 MB/s
test write_nfl_bytes                       ... bench:   1,224,201 ns/iter (+/- 25,594) = 1114 MB/s
test write_nfl_record                      ... bench:   1,568,268 ns/iter (+/- 28,730) = 870 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 44 measured; 0 filtered out; finished in 180.05s

BurntSushi commented 1 year ago

I don't think any of these are performance tweaks? The one that might be is the use of copy_from_slice, but that isn't in any kind of performance-sensitive code. It's just the Clone impl for Writer. The rest all just look like code improvements.
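
For readers unfamiliar with copy_from_slice, here is a minimal sketch of the kind of change being discussed. The PR diff is not shown in this thread, so the hand-written loop and both function names are hypothetical; only the standard library's slice::copy_from_slice is real. It copies a whole slice in one call and panics if the two lengths differ.

```rust
/// Element-by-element copy: a hypothetical stand-in for the sort of loop
/// that `copy_from_slice` usually replaces (not taken from the PR).
fn copy_with_loop(dst: &mut [u8], src: &[u8]) {
    assert_eq!(dst.len(), src.len());
    for (d, s) in dst.iter_mut().zip(src) {
        *d = *s;
    }
}

/// Same result through the standard library.
/// `copy_from_slice` panics if `dst` and `src` differ in length.
fn copy_with_std(dst: &mut [u8], src: &[u8]) {
    dst.copy_from_slice(src);
}

fn main() {
    let src = b"a,b,c\n";
    let mut a = vec![0u8; src.len()];
    let mut b = vec![0u8; src.len()];

    copy_with_loop(&mut a, src);
    copy_with_std(&mut b, src);

    // Both approaches produce identical buffers.
    assert_eq!(a, b);
    println!("{}", String::from_utf8_lossy(&b));
}
```

Whether the second form is measurably faster depends on how often the code runs, which is the point made above about the Clone impl.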

Did you run the benchmarks multiple times? It's also not easy to see the actual comparison with your presentation here. Try using cargo-benchcmp.
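
To illustrate what a side-by-side comparison involves, here is a rough sketch that parses two lines of the output pasted above and reports the relative change. It is not how cargo-benchcmp is implemented; parse_bench_line and percent_change are hypothetical helpers, and the parser assumes the "test NAME ... bench: N ns/iter" layout seen in this thread.

```rust
/// Extracts (test name, ns/iter) from one line of `cargo bench` output.
/// Returns `None` for lines that are not benchmark results.
fn parse_bench_line(line: &str) -> Option<(String, u64)> {
    let rest = line.trim().strip_prefix("test ")?;
    let (name, tail) = rest.split_once("... bench:")?;
    // The first token after "bench:" is the timing, e.g. "1,322,566".
    let ns_text = tail.split_whitespace().next()?;
    let ns: u64 = ns_text.replace(',', "").parse().ok()?;
    Some((name.trim().to_string(), ns))
}

/// Relative change in percent; negative means the "after" run was faster.
fn percent_change(before_ns: u64, after_ns: u64) -> f64 {
    (after_ns as f64 - before_ns as f64) / before_ns as f64 * 100.0
}

fn main() {
    // Two lines copied from the benchmark output in this thread.
    let before = "test write_nfl_bytes                       ... bench:   1,322,566 ns/iter (+/- 107,061) = 1031 MB/s";
    let after = "test write_nfl_bytes                       ... bench:   1,224,201 ns/iter (+/- 25,594) = 1114 MB/s";

    let (name, b) = parse_bench_line(before).expect("benchmark line");
    let (_, a) = parse_bench_line(after).expect("benchmark line");

    println!("{name}: {b} -> {a} ns/iter ({:+.2}%)", percent_change(b, a));
}
```

For this pair the difference is 98,365 ns/iter, which is smaller than the +/- 107,061 spread reported on the first run, so a single before/after pair says little on its own; repeated runs would be needed to tell a real change from noise.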

jqnatividad commented 1 year ago

Thanks for the feedback!

I did run the benchmarks several times, but not in a rigorous way. Will use cargo-benchcmp and report back...