BurntSushi / rust-csv

A CSV parser for Rust, with Serde support.
The Unlicense

Memory Leak #370

Closed wspeirs closed 5 months ago

wspeirs commented 5 months ago

Thank you for taking the time to file a bug report. The following describes some guidelines for creating a minimally useful ticket.

Above all else: do not describe your problem, SHOW your problem.

What version of the csv crate are you using?

1.3.0

Briefly describe the question, bug or feature request.

When I run the following in a loop in an async context, it consumes all the memory on the box and crashes:

        let mut count = 0;
        let reader = ReaderBuilder::new()
            .trim(Trim::All)
            .has_headers(true)
            .from_reader(csv.as_bytes());

        for res in reader.into_records() {
            if let Err(e) = res {
                panic!("Error reading record: {e}")
            }

            count += 1;
        }

If I comment out this section, it works as expected.

Include a complete program demonstrating a problem.

Reproducer: https://github.com/wspeirs/csv_repro. The CSV file is excluded, but any file should work; mine is 30MB (~500,000 lines) with headers, all ASCII, nothing fancy/crazy.

What is the observed behavior of the code above?

Consumes all the memory on the box until the process crashes or is killed.

What is the expected or desired behavior of the code above?

To simply return the same count each time through the loop.

BurntSushi commented 5 months ago

Please provide a real reproducer. You don't give any command to run it and you don't seem to provide any input data.

wspeirs commented 5 months ago

Command to run:

    gzip -d file.csv.gz
    cargo run --release

The input file is now in the repository, but gzipped; you need to decompress it first.

BurntSushi commented 5 months ago

It's not a leak, and it has nothing to do with this crate. You've created a channel with a massive buffer size and are stuffing it full of a large amount of data. The channel is getting data pumped into it faster than it can be drained, so it fills up. Shrink the buffer size and the memory usage goes down. For something like this, your buffer size shouldn't be any bigger than the number of simultaneous workers. (I'd probably set it to 1 personally.)

Next time you have a leak, please submit a reproducer without tokio and message passing.