evoldoers / biomake

GNU-Make-like utility for managing builds and complex workflows
BSD 3-Clause "New" or "Revised" License
100 stars 9 forks source link

Ensure that database value of md5 is retracted when re-calculating hash. #76

Open cmungall opened 4 years ago

cmungall commented 4 years ago

Fixes #75 However, causes md5.time test to fail

cmungall commented 4 years ago

@ihh I'm not totally grokking the md5.time test which now fails, will try and look into this more later

ihh commented 4 years ago

I guess there are two possibilities: the test is now broken, or it was broken before and you just fixed it, but the "what it's meant to look like" file dates from when it was broken....

I'll look when I get a chance.....

rulatir commented 4 years ago

Boomp?

rulatir commented 4 years ago

Any chance this might be looked at again?

ihh commented 4 years ago

@cmungall The purpose of this test (test-159) was to check that biomake would recompute the MD5 hash if the file in which the hash was cached had an older timestamp than the file whose hash was supposedly cached there. The correct behavior in this case is to recompute the hash but NOT rebuild the file (it is not the file that is incorrect, but the hash).

This particular test-159 tries to check this behavior as follows:

Internally, a call to update_md5_file to recompute the hash is triggered from read_md5_file when the hash file's timestamp is seen to be old. This occurs correctly. What seems to be happening that is INCORRECT is that the file "world" is then getting rebuilt anyway. This didn't matter before because the rebuild was (buggily) not updating the Prolog database's copy of the MD5 hash. So the bug was being masked. What happens as a result of your patch (which forces update_md5_file to retract the hash) is that the accidental rebuild of the target then causes "hello_world" to be rebuilt as well.

The problem is exacerbated by a poorly designed test. If the initial value of "hello_world" was not "wrong_wrong" but something else, then the problem would have been exposed earlier.

You can see debug messages to this effect turning on the --debug md5 option. E.g. to run this test manually from the top-level directory:

mkdir t/mytest
cd t/mytest
cp ../ref/md5.time/* .
echo wrong > hello
echo wrong_wrong > hello_world
sleep 1
cp ../target/md5.checksums/* .biomake/md5
sleep 1
echo wrong > world
/Users/yam/biomake/bin/biomake --debug md5 -H hello_world

Do this with and without your extra retract_md5_hash line, and you will see a line echo world >world in both cases, but only with your retract_md5_hash does this cause the hash to be recomputed.

I now need to figure out why update_md5_hash is causing a rebuild of the target.

ihh commented 4 years ago

The more I think about it, the more I think that rebuilding the target file actually is the correct behavior if its MD5 hash file turns out to be stale. It should only not rebuild the file if the MD5 hash, when recomputed, turns out to be the same as the cached value in the file. So I updated test-159 to reflect this scenario (recomputed hash is the same as what was already stored), and added a new test (test-160) for the converse situation when the recomputed hash is different from what was stored, triggering a rebuild.

@cmungall if you could add a test confirming that this fixes #75, that'd be great

In the meantime @rulatir I think this should address your needs, hopefully. Thanks for your persistence.