Open gavinwahl opened 2 months ago
I can't remember, sadly. I think what that's referring to is that there may be cases where the byte offset is before what a human might consider to be the start of a CSV record when reading the CSV data, but that the byte offset is still correct assuming you use the csv crate (or its underlying csv-core implementation) to read the record for that position. This might sound weird, and that's because it is. For example, csv-core ignores empty lines, so if you have:
\nfoo,bar,baz    (that is, an empty line followed by the record)
Then there are technically 2 valid byte offsets for the start of the foo,bar,baz record: 0 or 1 (assuming \n record delimiters). I think the language in the docs is just being a bit sneaky about not guaranteeing one or the other. It should (but doesn't) mention the fundamental invariant though: if you seek to that byte offset in the data and start the csv reader at that point, then you'll get the corresponding ith record.
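To make the invariant concrete, here is a minimal sketch using only the standard library. It is not the csv or csv-core API: `first_record_at` is a hypothetical toy reader that assumes \n terminators and no quoting, and it copies just one behavior described above, namely skipping empty lines. Under those assumptions, starting at offset 0 or offset 1 yields the same record.

```rust
/// Toy stand-in for a CSV reader that skips empty lines, as csv-core is
/// described as doing above. Hypothetical helper, not part of the csv crate.
/// Assumes `\n` record terminators and no quoted fields.
///
/// Returns the first record found at or after `offset`, or `None` if the
/// offset is out of range or no non-empty line remains.
fn first_record_at(data: &str, offset: usize) -> Option<Vec<&str>> {
    data.get(offset..)?                  // "seek" to the byte offset
        .split('\n')                     // walk line by line
        .find(|line| !line.is_empty())   // skip empty lines
        .map(|line| line.split(',').collect())
}
```

For the data "\nfoo,bar,baz\n", both `first_record_at(data, 0)` and `first_record_at(data, 1)` return the fields foo, bar, baz. An index could store either offset and still satisfy the invariant, which may be why the docs call the offsets approximate.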
One piece of documentation says csv_index::RandomAccessSimple stores indices to byte offsets corresponding to the start of records, while another piece of documentation says it stores /approximate/ offsets. Which is it? If approximate, how would an approximate index be used to locate the actual start of a record?
exact: https://github.com/BurntSushi/rust-csv/blob/master/csv-index/src/lib.rs#L19
approximate: https://github.com/BurntSushi/rust-csv/blob/master/csv-index/src/simple.rs#L14