Open patricebellan opened 7 years ago
I am also getting some odd results on a small csv file with a single column and forcing the number of jobs to 1.
Header
""
a
b
c
d
e
f
g
h
i
j
k
Running a few times xsv frequency -s Header t3.csv -j 1 | sort
gives me different results with 1 record missing. Not always the same record.
That happens both with 0.12.2 as installed by Homebrew and with a debug build of the current tip of the source code with rustc-1.19.0
I am attaching a screenshot of my terminal with those ad-hoc tests.
First run we are missing the value (NULL)
.
Second run we are missing the value e
.
When I debug the Frequencies I do see the proper cardinality of values and content.
// https://github.com/BurntSushi/xsv/blob/master/src/cmd/frequency.rs#L89
for (i, (header, ftab)) in head_ftables.enumerate() {
println!("{:?}: {:?}", &ftab, &ftab.cardinality());
let mut header = header.to_vec();
...
does print the expected hash-map and proper cardinality:
{[107]: 1, [102]: 1, [106]: 1, [97]: 1, [103]: 1, [105]: 1, []: 1, [98]: 1, [104]: 1, [100]: 1, [101]: 1, [99]: 1}: 12
It seems counts.into_iter()
(https://github.com/BurntSushi/xsv/blob/master/src/cmd/frequency.rs#L124) is where we loose some items.
I am a real newbie at rust and did not spend enough time to grasp what is going on there. I hope this helps.
I’ve also been surprised about some missing items when using frequency
until I found out about its --limit
option. By default, this option is set to 10 so you’ll never see more than 10 items.
Of course, this doesn’t explain all of the issues mentioned above but is hopefully helpful for some.
@sonicdoe Thanks — just got tripped up by this myself.
@BurntSushi IMHO if --limit
is going to eliminate some categories by default, it absolutely should not do so silently. For example, it could print + 12 equally-or-less frequent values with total count 12345678
after the table. If we then need another switch to suppress that message, so be it.
Hi,
I've been playing with
frequency
a bit, but got odd results, and here is what I found.Take this simple csv file
run
xsv frequency -s alpha,num
result: field,value,count alpha,b,2 alpha,a,2 num,2,2 num,1,2
then run
xsv frequency -s num,alpha
result: field,value,count num,a,2 num,b,2 alpha,1,2 alpha,2,2
I get the correct result only if I put the fields in the same order as the file header