Chunk-hash inputs to support big files and hashing stdin.

ankostis commented 8 years ago

Thanks for this handy project :+1:

I have refactored it to use chunked reading + tee, to support hashing files of unlimited size (e.g. full disks). Note that for big files it is slow because of the number of algorithms running, not because of the un-slurping (chunking); almost counterintuitively, chunking is usually faster on bigger files because it hogs less memory.
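To illustrate the chunked-reading idea, here is a minimal sketch, not the code from this PR. The function name `hash_stream`, the choice of algorithms, and the 64 KiB chunk size are assumptions made for the example; it only relies on the standard `hashlib` module.

```python
import hashlib
import io


def hash_stream(stream, algorithms=("sha1", "sha256", "sha512"), chunk_size=64 * 1024):
    """Feed one binary stream to several hashers, one chunk at a time.

    Hypothetical helper: only `chunk_size` bytes are held in memory,
    regardless of how large the input is.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    # iter() with a sentinel stops when read() returns b"" at end of stream.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Any binary file-like object works; an in-memory empty stream is used here.
digests = hash_stream(io.BytesIO(b""))
print(digests["sha1"])  # da39a3ee5e6b4b0d3255bfef95601890afd80709
```

Each chunk is read once and handed to every hasher, so the cost per byte grows with the number of algorithms, not with the file size held in memory.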

Also, if no args are given, it reads standard input, following the UNIX philosophy.
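The stdin fallback could look roughly like the sketch below. `open_inputs` and the `"<stdin>"` label are hypothetical names for this example, not taken from omnihash; the `stdin` parameter exists only so the function can be exercised without a real terminal.

```python
import sys


def open_inputs(paths, stdin=None):
    """Yield (label, binary stream) pairs.

    Hypothetical helper: with no paths, fall back to standard input,
    read as bytes via sys.stdin.buffer so nothing gets decoded or re-encoded.
    """
    if not paths:
        yield "<stdin>", stdin if stdin is not None else sys.stdin.buffer
        return
    for path in paths:
        # Open in binary mode: hash functions operate on bytes.
        with open(path, "rb") as stream:
            yield path, stream
```

A caller would iterate `open_inputs(sys.argv[1:])` and consume each stream inside the loop body, since a file is closed as soon as the loop moves on to the next path.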

further changes

Stopped byte-encoding input files in utf-8 after reading - already in bytes.
Restructure main() with sub-functions, to reuse them for stdin.
msgs:
- Separate errors/debug (stderr) from useful output (stdout).
- Indent to make visible which file gets processed.
Add nose to test dependencies.
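As a rough illustration of the message changes listed above (a sketch under assumptions, not the PR's code): progress text goes to stderr, digests go to stdout and are indented under the input they belong to. The name `report` and the exact message wording are made up for the example.

```python
import sys


def report(label, digests, out=sys.stdout, err=sys.stderr):
    """Print a progress line to stderr and indented digests to stdout.

    Hypothetical helper: keeping diagnostics off stdout means the digests
    can be piped or redirected without stray messages mixed in.
    """
    print("Hashing '%s'..." % label, file=err)
    for name, digest in sorted(digests.items()):
        # Two-space indent makes it visible which input each digest belongs to.
        print("  %-20s %s" % (name + ":", digest), file=out)
```

With this split, redirecting stdout to a file captures only the digest lines, while the progress message still shows up in the terminal.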

Miserlou commented 8 years ago

Merged in manually - thanks very much, this is great!

I made a few small modifications on top; please install the latest version from pip and confirm that it's behaving okay for you.

ankostis commented 8 years ago

Thanks @Miserlou.

I would like one more thing: eliminate the "-s string" cmd-line option, now that it can read strings from stdin, because it is too resilient(!); consider this:

$ omnihash "`which foo`"
Hashing string ''...
DSA:                   da39a3ee5e6b4b0d3255bfef95601890afd80709
...

That's the DSA of the empty string, but unless you know it by heart, it is ~~impossible to diagnose~~ easy to miss that the foo file was not actually found:

$ "`which foo`"
: command not found

This behavior violates the pythonic rule I like the most: "fail early." But since it is about removing functionality that is already present, you have to consent to that :-)
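A fail-early check along these lines might look like the sketch below. It is only an illustration of the idea, not a proposal for omnihash's actual interface; `check_path_argument` and its error messages are hypothetical.

```python
import os


def check_path_argument(arg):
    """Return `arg` if it names an existing file, otherwise raise.

    Hypothetical helper: an empty or missing argument is reported
    immediately instead of being silently hashed as a string.
    """
    if not arg:
        # This is the case produced by "`which foo`" when foo is not found.
        raise ValueError("empty argument; nothing to hash")
    if not os.path.isfile(arg):
        raise FileNotFoundError("no such file: %r" % arg)
    return arg
```

With such a check, the empty-string case above would stop with an error rather than print the digest of the empty string.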

Miserlou / omnihash

Chunk-hash inputs to support big files and hashing stdin. #7
