eudicots / Cactus

Static site generator for designers. Uses Python and Django templates.
BSD 3-Clause "New" or "Revised" License
3.47k stars 314 forks source link

use python's hashlib to create md5 checksum #112

Closed chaudum closed 9 years ago

chaudum commented 9 years ago

in order to be able to build website on linux too

krallin commented 9 years ago

Hey @chaudum, thanks for the suggestion,

I'll be a bit wary here since this actually used to work this way, but had changed.

cc @koenbok — do you know how much faster this had made the build? https://github.com/koenbok/Cactus/commit/fa1a08aed5bc6659304097d5ad7e653c553c1b11

If the speed gain is significant, I guess graciously falling back might be preferable (or possibly using md5sum).

Cheers,

chaudum commented 9 years ago

I changed it so that it uses md5 if available, otherwise fallback to md5sum.

krallin commented 9 years ago

I'm still a bit concerned considering that we're now creating two subprocesses (which, then the actual md5[sum] one) every time we're computing a checksum.

I feel we need to measure this to know what our speed gains are, otherwise we might end up wasting cycles trying to do the smart thing. I'll try to do that over the weekend, unless you'd like to get to it first?

Cheers,

koenbok commented 9 years ago

I actually did some tests with this for another project and it's less performant then something like this: http://stackoverflow.com/questions/7829499/using-hashlib-to-compute-md5-digest-of-a-file-in-python-3

So we should change it back. I initially did it because calculating md5s for big files took a ton of memory. Typical example of me trying to be smart and making it less good :-)

krallin commented 9 years ago

@koenbok Thanks! We should probably go with @chaudum's original commit then (https://github.com/chaudum/Cactus/commit/5de20d0cc2cb0f6164f1af10009a09ca9e42a72c)

@chaudum — mind updating the PR with just that one? : )

Thanks!

chaudum commented 9 years ago

@krallin @koenbok ok, will update the PR then

chaudum commented 9 years ago

made the change slightly more pythonic than the initial commit.

krallin commented 9 years ago

Guys,

Just a quick note on this. This change is probably not what we want. The following code:

    for buf in fp.read(65536):
        hasher.update(buf)

Doesn't do what we want for two reasons:

What we want probably looks a lot more like this:

     while True:
         buf = fp.read(65536)
         if not buf:
             break
        hasher.update(buf)

I found this issue while working on Python 3 compatibility (where iterating over bytestring returns integers, not 1-char bytestrings). ' might add failing test, and will fix the bug.

Cheers,