a seeded sample not working properly when an index is present

BurntSushi / xsv

A fast CSV command line toolkit written in Rust.

The Unlicense


After looking at the code, I found something that partially explains the issue: https://github.com/BurntSushi/xsv/blob/3de6c04269a7d315f7e9864b9013451cd9580a08/src/cmd/sample.rs#L17-L19

Basically, that code path short-circuits the seed parameter.

I ran a bigger sample (more than 10% of the records) using the same TSV file (xsv sample --seed 42 5000000 file.tsv -o output2.csv), and I can confirm it's now reproducible, both with and without an index.

However, why does it ignore the seed parameter only when an index is present and the sample size is less than 10% of the records? Shouldn't the seed always take precedence over the <10% sample size check?
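
To make the question concrete, here is a rough sketch of the behavior as I understand it from the observations above. It is not the actual xsv code: the type and function names are hypothetical, and the exact threshold comparison (`sample_size <= total / 10`) is my assumption. `observed_strategy` models what I am seeing, and `seed_first_strategy` models what I would have expected.

```rust
/// Which sampling path is taken. Names are hypothetical, for illustration only.
#[derive(Debug, PartialEq)]
pub enum Strategy {
    /// Jump to random records through the index; the seed is not consulted.
    RandomAccessUnseeded,
    /// Stream every record through a reservoir, using the seed if one is given.
    Reservoir { seed: Option<u64> },
}

/// Assumed threshold: random access is chosen when the sample is at most
/// 10% of the indexed record count.
pub fn small_enough_for_random_access(sample_size: u64, total: u64) -> bool {
    sample_size <= total / 10
}

/// Models the observed behavior: with an index and a small sample, the
/// random-access path wins and the seed is silently dropped.
pub fn observed_strategy(
    index_count: Option<u64>, // Some(total) when an index is present
    sample_size: u64,
    seed: Option<u64>,
) -> Strategy {
    match index_count {
        Some(total) if small_enough_for_random_access(sample_size, total) => {
            Strategy::RandomAccessUnseeded
        }
        _ => Strategy::Reservoir { seed },
    }
}

/// Models the expected behavior: an explicit seed always selects the
/// seeded path, and the 10% check only applies when no seed is given.
pub fn seed_first_strategy(
    index_count: Option<u64>,
    sample_size: u64,
    seed: Option<u64>,
) -> Strategy {
    if seed.is_some() {
        return Strategy::Reservoir { seed };
    }
    observed_strategy(index_count, sample_size, seed)
}
```

With a made-up file of 1000 indexed records, `observed_strategy(Some(1000), 50, Some(42))` takes the unseeded path, while a sample of 500 records, or the same call without an index, keeps the seed. That matches what I see: only the small, indexed sample is not reproducible.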


a seeded sample not working properly when an index is present #255