ashvardanian / StringZilla

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc 🦖
https://ashvardanian.com/posts/stringzilla/
Apache License 2.0

The wc cli has been fixed up #152

Open MarkReedZ opened 2 months ago

MarkReedZ commented 2 months ago

In the wc CLI, I fixed a couple of bugs, added the `--files0-from` argument, justified the output to match `wc`, and sped up the longest-line search.

ref https://github.com/ashvardanian/StringZilla/issues/97
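
For readers following along, here is a minimal sketch of the two pieces mentioned above: reading a NUL-separated file list (the format `--files0-from` expects) and finding the longest line per file. It is not the code in `cli/wc.py`. It uses only the Python standard library, the helper names `read_files0`, `longest_line`, `max_line_lengths`, and `format_report` are hypothetical, and it counts bytes rather than display columns, which is a simplification. It reports the maximum across files as the total, as GNU `wc -L` does.

```python
import os


def read_files0(list_path):
    """Return the file names stored NUL-separated in list_path."""
    with open(list_path, "rb") as f:
        raw = f.read()
    # A trailing NUL produces an empty last entry, so skip empty names.
    return [os.fsdecode(name) for name in raw.split(b"\0") if name]


def longest_line(path):
    """Length in bytes of the longest line, not counting the newline."""
    longest = 0
    with open(path, "rb") as f:
        for line in f:
            longest = max(longest, len(line.rstrip(b"\n")))
    return longest


def max_line_lengths(list_path):
    """Return ([(length, name), ...], total) for every listed file."""
    lengths = [(longest_line(name), name) for name in read_files0(list_path)]
    # Assumption: the total is the maximum, not the sum, of the per-file values.
    total = max((length for length, _ in lengths), default=0)
    return lengths, total


def format_report(lengths, total, width=7):
    """Right-justify the counts in a fixed-width column (width is illustrative)."""
    rows = [f"{length:>{width}} {name}" for length, name in lengths]
    rows.append(f"{total:>{width}} total")
    return "\n".join(rows)
```

Calling `print(format_report(*max_line_lengths("delme")))` would print one right-justified row per listed file followed by a `total` row.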

$ time python ../StringZilla/cli/wc.py -L  --files0-from delme
        67 sz.js
     51648 enwik9.txt
        67 sz.js
       102 tst.js
     51884 total

real    0m1.184s
user    0m1.080s
sys     0m0.104s

$ time wc -L --files0-from delme
        67 sz.js
     51648 enwik9.txt
        67 sz.js
       102 tst.js
wc: ''$'\n': No such file or directory
     51648 total

real    0m4.327s
user    0m4.184s
sys     0m0.142s

ashvardanian commented 2 months ago

Does this relate to #139 in any way? cc @lborcard

MarkReedZ commented 2 months ago

No, I haven't looked at split yet. I'm guessing that for #139 he ran out of space. We can update the split code to print a better error message.