Miserlou / omnihash

Hash files, strings, input streams and network resources in various common algorithms simultaneously
https://github.com/Miserlou/omnihash
MIT License

Chunk-hash inputs to support big files and hashing stdin. #7

Closed ankostis closed 8 years ago

ankostis commented 8 years ago

Thanks for this handy project :+1:

I have refactored it with chunked reading + tee, to support hashing files of unlimited size (e.g. full disks). Note that for big files it's slow due to the number of algorithms running, not due to the un-slurping (chunking); chunking is usually, almost counterintuitively, faster on bigger files because it hogs less memory.

Also, if no args are given, it reads standard input, following the UNIX philosophy.
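To illustrate the idea, here is a minimal sketch of chunked multi-algorithm hashing. It is not the code from this pull request: `hash_stream`, `hash_path_or_stdin`, the 1 MiB chunk size and the three algorithms are assumptions chosen for the example, and it fans each chunk out to every hasher in a plain loop rather than using a tee.

```python
import hashlib
import sys

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary value for this sketch


def hash_stream(stream, algorithms=("sha1", "sha256", "sha512"), chunk_size=CHUNK_SIZE):
    """Read a binary stream once, feeding every chunk to every hasher."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    # iter(callable, sentinel) keeps calling read() until it returns b"" (EOF),
    # so only one chunk is held in memory at a time.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def hash_path_or_stdin(path=None):
    """Hash the file at `path`, or standard input when no path is given."""
    if path is None:
        # sys.stdin.buffer is the binary view of stdin (Python 3).
        return hash_stream(sys.stdin.buffer)
    with open(path, "rb") as f:
        return hash_stream(f)
```

Because the digests are updated incrementally, memory use depends on the chunk size rather than on the input size; the running time still grows with the number of algorithms, since every chunk is processed once per hasher.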

further changes

Miserlou commented 8 years ago

Merged in manually - thanks very much, this is great!

I made a few small modifications on top; please install the latest version from pip and confirm that it's behaving okay for you.

ankostis commented 8 years ago

Thanks @Miserlou.

I would like one more thing: eliminate the "-s string" command-line option, now that it can read strings from stdin, because it is too resilient(!). Consider this:

$ omnihash "`which foo`"
Hashing string ''...
DSA:                   da39a3ee5e6b4b0d3255bfef95601890afd80709
...

That's the DSA hash of the empty string, but unless you know it by heart, it is easy to miss that the foo file was not actually found:

$ "`which foo`"
: command not found

This behavior violates the pythonic rule I like the most: "fail early." But since it is about removing functionality that is already present, you have to consent to that :-)
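As a rough sketch of what "fail early" could look like here (the helper `resolve_input` is hypothetical and not part of omnihash), the argument can be validated before any hashing happens, so an empty or missing path raises an error instead of quietly producing the empty-string digest:

```python
import hashlib
import os

# Digest of zero bytes; matches the value printed in the example above.
EMPTY_SHA1 = hashlib.sha1(b"").hexdigest()


def resolve_input(arg):
    """Return `arg` if it names an existing file; otherwise raise immediately.

    Hypothetical helper: it refuses to fall back to hashing the argument
    as a string, which is the behavior questioned in this comment.
    """
    if not arg:
        raise ValueError("empty argument: nothing to hash")
    if not os.path.isfile(arg):
        raise FileNotFoundError(f"no such file: {arg!r}")
    return arg
```

With a check like this, `omnihash "$(which foo)"` would stop with an error when `foo` is not on the PATH, and strings would have to come in explicitly through stdin.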