Fix the way compute-coverage finds the hpc dir in dist-newstyle

sauclovian-g commented 2 months ago

Previously it searched blindly and would get confused if there were build results from multiple saw-script versions. Now we use cabal list-bin to find the proper version and look there.

Fixes #2114.

sauclovian-g commented 2 months ago

In order to actually check that this fixes the problem, should we create an additional pull request to merge a version bump into this branch and see if the CI works there? (then close it) or is that not worthwhile? I did check that it locates the correct directory in my own checkout, which currently does have two versions' worth of build results in it.

sauclovian-g commented 2 months ago

Hm, it failed after reverting. So you can bump the version up, but not down. Is that good enough? The script appears to be doing what we intended, I'm guessing the problem is that reverting the version doesn't recompile/rerun enough things (whereas bumping it creates a new tree)...?

kquick commented 2 months ago

Not quite enough information in the logs to determine if it's a .tix or .mix file, but I did notice that the "Setup test environment" step prior to that unpacks an hpc.tar.gz that it just downloaded as an artifact in the previous step. That tar unpack ends up with mix files from both your versions... and it also contains /bin/saw. I wonder if the /bin/saw is still from the new version and for whatever reason didn't get updated on your version rollback?

sauclovian-g commented 2 months ago

I will defer to you to decide what to do (my inclination would be to merge what we've got since it seems to be at least a partial step forward)... I don't know enough about how the tool works to have any useful input.

RyanGlScott commented 2 months ago

In a different context (#2118), I realized that we are inadvertently uploading the compiled saw-script binary to the same name on both the hpc and non-hpc jobs, and depending on the order in which this happens, one could potentially clobber the other. I'm not sure if this could explain the oddities observed here, but I wonder if we should try again after landing #2118 (which will include a fix for this latent clobbering issue).

sauclovian-g commented 2 months ago

Worth a try, I think

sauclovian-g commented 2 months ago

ok, that worked, now rolling it back again

sauclovian-g commented 2 months ago

Nope, same result.

sauclovian-g commented 2 months ago

Two of the intTests timed out, too, not sure what to make of that

RyanGlScott commented 2 months ago

Two of the intTests timed out, too, not sure what to make of that

Something is very wrong here. The fact that the test_comp_bisim test case in particular times out strikes me as suspicious, since that is a test that is expected to time out on the HPC configuration (see here), but not in the non-HPC configuration. And indeed, if I download the ubuntu-22.04-bins artifact (which allegedly contains the non-HPC build of saw) and run a simple command like saw --help, it produces a .tix file, as though it were built with HPC!

I'm really quite baffled as to how this happened, given that I haven't observed anything like what is going on in this PR happening in any other jobs (the nightly job in particular is passing without issues), and this PR doesn't change anything related to how artifacts are uploaded or downloaded.

sauclovian-g commented 2 months ago

Eeeenteresting, as they say... it must be something about this particular run and the state that's been left behind here, because this is the third run with the same tree, and the fifth if you don't count the version changes as different (and they should not be able to cause "real" problems...)

Sometimes I think every test run should both clone a fresh tree and build it and pull/update and rebuild the already-built tree from the previous run, and then diff them and fail if they aren't identical.

I am starting to suspect a cabal problem...

RyanGlScott commented 2 months ago

I would be somewhat surprised if state had something to do with this, given that GitHub Actions artifacts are tied to particular workflows. That being said, I am out of other plausible explanations. Perhaps we should try nuking the cache and trying again just to be extra sure.

sauclovian-g commented 2 months ago

Well, we know it's reusing the build dir or we wouldn't have seen the original behavior with two versions at once in dist-newstyle. That's definitely enough state for things to go off the rails if there's a problem in the rebuild logic somewhere.

RyanGlScott commented 2 months ago

I have nuked the cache and rebuilt from the current state of d283404. Let's see what happens when we try a round of version back-and-forth now...

sauclovian-g commented 2 months ago

Another thing that got suggested was to try bumping back to a version that already got built once (that is, 1.2.0.99 -> 1.2.1.50 -> 1.2.0.99 and then back to 1.2.1.50), and see if that gives other behavior that might be illuminating.

sauclovian-g commented 2 months ago

I find it puzzling that nuking the cache would make it able to revert a version bump. I think there must be something bad going on, but I think it's likely a cabal bug and unless we want to go chasing after it for upstream's benefit there's little to be gained by mucking around further here. (And if we do, the first thing is to try to replicate the behavior outside the CI environment, which I'm not really yet in a position to do right now.) So I'll merge.

sauclovian-g commented 2 months ago

However, I'm going to squash all the bump/unbump commits since we don't want those on the trunk forever.

sauclovian-g commented 2 months ago

(I also squashed the original review adjustments, no real reason to keep those separate; but I kept the original branch point and the merge with master as those might be relevant if someone comes back later.)

GaloisInc / saw-script

Fix the way compute-coverage finds the hpc dir in dist-newstyle #2116