StanfordAHA / garnet

Next generation CGRA generator
BSD 3-Clause "New" or "Revised" License

Stop killing aha regressions #1090

Closed: steveri closed this 3 days ago

steveri commented 5 days ago

These changes are designed to address the aha regression-failure problem described in aha issue 1959 https://github.com/StanfordAHA/aha/issues/1959.

I found two relevant errors, both in the garnet CI script ci.yml, whose "docker cleanup" step does three things.

The first error was in step (c), which is supposed to delete unused docker images. Previously it did this with the command docker image prune --force. The --force flag was only meant to stop the command from asking "do you really want to do this?" Unfortunately, it also had the side effect of deleting all docker images, whether or not someone was actively using them (e.g. for an aha full regression).

Also, the "older than 24 hours" filter is not very useful, since it applies to the time the image was built, not the time it was downloaded. Many of the images we use are days or even weeks old.

So I removed the --force option and instead used the yes command to answer the prompt. I also raised the until threshold from 24 to 72 hours, although that doesn't really help much.
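For illustration only, here is roughly what step (c) looks like before and after this change. The exact invocation in ci.yml is not quoted in this PR, so the command lines below are an approximation based on the description above. docker image prune does accept both --force and --filter "until=...".

```shell
# Before (approximate): --force skips the "Are you sure you want to continue? [y/N]" prompt,
# and the until filter limits pruning to images created more than 24 hours ago.
docker image prune --force --filter "until=24h"

# After (approximate): no --force; `yes` feeds "y" to the confirmation prompt instead,
# and the until filter is raised to 72 hours.
yes | docker image prune --filter "until=72h"
```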

The second error was in the docker-clean.sh script from item (1b) above, which was killing all containers more than 4 hours old. Since those containers include aha full regressions that can take as long as 19 hours to complete, this was clearly a bad idea. The docker-clean script now waits at least 5 days before deciding that a container needs to be deleted.
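A minimal sketch of the kind of age check involved follows. It assumes the script compares each running container's start time against a fixed threshold. The names is_stale, report_stale_containers, and MAX_AGE_SECONDS are hypothetical and are not taken from docker-clean.sh.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual docker-clean.sh.
# New threshold: 5 days (the old value was 4 hours, i.e. 4 * 60 * 60 seconds).
MAX_AGE_SECONDS=$((5 * 24 * 60 * 60))

# is_stale START NOW: succeeds (exit status 0) only if the container started
# more than MAX_AGE_SECONDS before NOW. Both arguments are seconds since the epoch.
is_stale() {
  [ $(( $2 - $1 )) -gt "$MAX_AGE_SECONDS" ]
}

# Print the IDs of running containers that exceed the threshold.
# Assumes GNU date (for -d) and a reachable docker daemon. A real cleanup
# script would kill or remove these containers rather than just print them.
report_stale_containers() {
  now=$(date +%s)
  for id in $(docker ps -q); do
    started=$(docker inspect -f '{{.State.StartedAt}}' "$id")
    if is_stale "$(date -d "$started" +%s)" "$now"; then
      echo "stale: $id"
    fi
  done
}
```

With this threshold, a full regression that has been running for 19 hours (68400 seconds) is left alone, because that is well under the 432000-second (5-day) limit. Under the old 4-hour (14400-second) threshold, the same container would have been killed.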

steveri commented 5 days ago

I went ahead and removed the useless "until" filter from the docker prune command.