BurntSushi / rust-csv

A CSV parser for Rust, with Serde support.
The Unlicense

[Question] Seeking to first record in the file #313

Closed: patataofcourse closed this issue 1 year ago

patataofcourse commented 1 year ago

What version of the csv crate are you using?

1.2.1

Briefly describe the question, bug or feature request.

Is there a way to reset a csv::Reader so that it reads again from the first row in the current version of the csv crate? If not, could this be implemented in a future version?

Include a complete program demonstrating a problem.

I have the symbols from a reverse-engineered program in .csv format, sorted by location, and a program that takes a raw position in memory and tries to return the symbol that position corresponds to. However, if you give it two positions, first one from later in the file and then one from earlier, the second lookup starts iterating not from the start of the file but from where the first lookup stopped.

Omitting a lot of steps like bounds checking, etc.:

symbols.csv

"name","location"
"func_00010000","65536"
"func_00010100","65792"

main.rs


use csv::Reader;
use std::fs::File;

#[derive(Deserialize, Serialize)]
struct CsvSymbol {
  name : String,
  location: u32
}

fn find_symbol(reader: &mut Reader<File>, pos: u32) -> (u32, String) {
    let mut iter = reader.deserialize::<CsvSymbol>();
    let mut out = (0, String::new());
    for symbol in iter {
        let symbol = symbol.unwrap();
        if symbol.location > pos {
            break
        }
        out = (symbol.pos, symbol.location.clone());
    }
    Ok(out)
}

fn main() {
    let mut builder = ReaderBuilder::new();
    builder.trim(Trim::Fields);
    builder.has_headers(true);
    let mut reader = builder.from_path("symbols.csv").unwrap();
    println!("Symbol 1: {:08x?}", find_symbol(&mut reader, 0x00010108).unwrap());
    println!("Symbol 2: {:08x?}", find_symbol(&mut reader, 0x00010008).unwrap());
}

Seeking to the beginning of the file won't work either: in that case the CSV reader treats the header row as a data record and errors because the location field is "location" rather than a valid integer.

What is the observed behavior of the code above?

Symbol1: (00010100, "func_00010100")
Symbol2: (00000000, "")

What is the expected or desired behavior of the code above?

Symbol1: (00010100, "func_00010100")
Symbol2: (00010000, "func_00010000")

BurntSushi commented 1 year ago

Could you please follow the instructions in the template? The example you've provided is not a complete program because I cannot run it. For example, you have not provided a symbols.csv, which appears to be a required input. Please also include any relevant Cargo.toml changes.

You mention seeking, and you even mention that you get an error from doing it, but the code you've provided does not actually show what you've tried. In particular, have you seen the example here? https://docs.rs/csv/latest/csv/struct.Reader.html#method.seek
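
The pattern behind that example is to remember the position of the first data record and seek back to it before each lookup, rather than seeking to byte 0. In the csv crate the corresponding calls are Reader::position and Reader::seek. The sketch below is only a standard-library analogue of the idea, not the crate's API: open_symbols and this find_symbol are hypothetical helpers, the field splitting is deliberately naive and not a real CSV parser, and the data is the symbols.csv from the original post.

```rust
use std::io::{self, BufRead, BufReader, Cursor, Read, Seek, SeekFrom};

const SYMBOLS: &str =
    "\"name\",\"location\"\n\"func_00010000\",\"65536\"\n\"func_00010100\",\"65792\"\n";

/// Hypothetical helper: consumes the header line and returns the reader
/// together with the byte offset of the first data row.
fn open_symbols(data: &str) -> io::Result<(BufReader<Cursor<Vec<u8>>>, u64)> {
    let mut reader = BufReader::new(Cursor::new(data.as_bytes().to_vec()));
    let mut header = String::new();
    reader.read_line(&mut header)?;
    // Logical offset just past the header, independent of buffering.
    let first_row = reader.stream_position()?;
    Ok((reader, first_row))
}

/// Hypothetical helper: rewinds to the first data row, then returns the last
/// symbol whose location is <= `pos` (rows are assumed sorted by location).
fn find_symbol<R: Read + Seek>(
    reader: &mut BufReader<R>,
    first_row: u64,
    pos: u32,
) -> io::Result<Option<(u32, String)>> {
    reader.seek(SeekFrom::Start(first_row))?;
    let mut best = None;
    let mut line = String::new();
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            break; // end of input
        }
        // Naive split on commas; assumes no commas or quotes inside fields.
        let mut fields = line.trim_end().split(',').map(|f| f.trim().trim_matches('"'));
        let (name, loc) = match (fields.next(), fields.next()) {
            (Some(n), Some(l)) => (n, l),
            _ => continue, // skip blank or malformed lines
        };
        let loc: u32 = loc
            .parse()
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        if loc > pos {
            break;
        }
        best = Some((loc, name.to_string()));
    }
    Ok(best)
}

fn main() -> io::Result<()> {
    let (mut reader, first_row) = open_symbols(SYMBOLS)?;
    println!("Symbol 1: {:08x?}", find_symbol(&mut reader, first_row, 0x00010108)?);
    println!("Symbol 2: {:08x?}", find_symbol(&mut reader, first_row, 0x00010008)?);
    Ok(())
}
```

Because every lookup starts by seeking to the saved offset, the order of the queries no longer matters, and the header row is never parsed as data.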

patataofcourse commented 1 year ago

updated OP to include symbols.csv

yes, i've attempted to use Reader::seek, however, because of the headers being there, i can't just seek to byte 0, and seeking to line 1, line 2, record 0, etc still gave me that error. i've also attempted to use Reader::headers, then store the stream position of reader.get_ref(), then seek to that, but for some reason when i did that the stream_position got set to 8192, despite my file's headers being probably a couple tens of characters (there's an additional namespace field i skipped from this example for simplicity)
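
The 8192 reported above is consistent with the CSV reader buffering its input: asking the underlying file for its position reports how far the buffer has read ahead, not how far parsing has got. The sketch below reproduces the effect with only the standard library. It assumes an 8 KiB buffer (an assumption chosen to match the number in the comment), and sample_csv and positions_after_header are hypothetical helpers.

```rust
use std::io::{BufRead, BufReader, Cursor, Seek};

/// Hypothetical helper: builds a CSV-like payload larger than 8 KiB.
fn sample_csv() -> Vec<u8> {
    let mut data = String::from("\"name\",\"location\"\n");
    for i in 0..1000u32 {
        let loc = 0x10000 + i * 0x100;
        data.push_str(&format!("\"func_{:08x}\",\"{}\"\n", loc, loc));
    }
    data.into_bytes()
}

/// Reads only the header line, then returns
/// (position of the inner stream, logical position of the buffered reader).
fn positions_after_header(data: Vec<u8>) -> (u64, u64) {
    // Assumed buffer size of 8 KiB.
    let mut reader = BufReader::with_capacity(8192, Cursor::new(data));
    let mut header = String::new();
    reader.read_line(&mut header).unwrap();
    // The inner stream has already been read up to the end of the buffer...
    let inner = reader.get_ref().position();
    // ...while the buffered reader knows only the header has been consumed.
    let logical = reader.stream_position().unwrap();
    (inner, logical)
}

fn main() {
    let (inner, logical) = positions_after_header(sample_csv());
    println!("inner stream position: {}", inner);
    println!("logical position:      {}", logical);
}
```

The inner position is the buffer size, while the logical position is the length of the header line. This is why a position taken from the wrapped file is not a usable seek target, whereas one tracked by the reader itself is.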

BurntSushi commented 1 year ago

Your program doesn't compile. There are oodles of errors. I can tell just from inspection of the source code.

I'm not going to keep doing a back-and-forth like this. Please provide something I can actually build.

patataofcourse commented 1 year ago

i'll just send you this directly _.zip

patataofcourse commented 1 year ago

...yknow what i'll just figure it out myself