Closed sauclovian-g closed 2 months ago
In order to actually check that this fixes the problem, should we create an additional pull request to merge a version bump into this branch and see if the CI works there? (then close it) or is that not worthwhile? I did check that it locates the correct directory in my own checkout, which currently does have two versions' worth of build results in it.
Hm, it failed after reverting. So you can bump the version up, but not down. Is that good enough? The script appears to be doing what we intended, I'm guessing the problem is that reverting the version doesn't recompile/rerun enough things (whereas bumping it creates a new tree)...?
Not quite enough information in the logs to determine if it's a .tix
or .mix
file, but I did notice that the "Setup test environment" step prior to that unpacks an hpc.tar.gz
that it just downloaded as an artifact in the previous step. That tar unpack ends up with mix files from both your versions... and it also contains /bin/saw
. I wonder if the /bin/saw
is still from the new version and for whatever reason didn't get updated on your version rollback?
I will defer to you to decide what to do (my inclination would be to merge what we've got since it seems to be at least a partial step forward)... I don't know enough about how the tool works to have any useful input.
In a different context (#2118), I realized that we are inadvertently uploading the compiled saw-script
binary to the same name on both the hpc
and non-hpc
jobs, and depending on the order in which this happens, one could potentially clobber the other. I'm not sure if this could explain the oddities observed here, but I wonder if we should try again after landing #2118 (which will include a fix for this latent clobbering issue).
Worth a try, I think
ok, that worked, now rolling it back again
Nope, same result.
Two of the intTests timed out, too, not sure what to make of that
Two of the intTests timed out, too, not sure what to make of that
Something is very wrong here. The fact that the test_comp_bisim
test case in particular times out strikes me as suspicious, since that is a test that is expected to time out on the HPC configuration (see here), but not in the non-HPC configuration. And indeed, if I download the ubuntu-22.04-bins
artifact (which allegedly contains the non-HPC build of saw
) and run a simple command like saw --help
, it produces a .tix
file, as though it were built with HPC!
I'm really quite baffled as to how this happened, given that I haven't observed anything like what is going on in this PR happening in any other jobs (the nightly job in particular is passing without issues), and this PR doesn't change anything related to how artifacts are uploaded or downloaded.
Eeeenteresting, as they say... it must be something about this particular run and the state that's been left behind here, because this is the third run with the same tree, and the fifth if you don't count the version changes as different (and they should not be able to cause "real" problems...)
Sometimes I think every test run should both clone a fresh tree and build it and pull/update and rebuild the already-built tree from the previous run, and then diff them and fail if they aren't identical.
I am starting to suspect a cabal problem...
I would be somewhat surprised if state had something to do with this, given that GitHub Actions artifacts are tied to particular workflows. That being said, I am out of other plausible explanations. Perhaps we should try nuking the cache and trying again just to be extra sure.
Well, we know it's reusing the build dir or we wouldn't have seen the original behavior with two versions at once in dist-newstyle. That's definitely enough state for things to go off the rails if there's a problem in the rebuild logic somewhere.
I have nuked the cache and rebuilt from the current state of d283404. Let's see what happens when we try a round of version back-and-forth now...
Another thing that got suggested was to try bumping back to a version that already got built once (that is, 1.2.0.99 -> 1.2.1.50 -> 1.2.0.99 and then back to 1.2.1.50), and see if that gives other behavior that might be illuminating.
I find it puzzling that nuking the cache would make it able to revert a version bump. I think there must be something bad going on, but I think it's likely a cabal bug and unless we want to go chasing after it for upstream's benefit there's little to be gained by mucking around further here. (And if we do, the first thing is to try to replicate the behavior outside the CI environment, which I'm not really yet in a position to do right now.) So I'll merge.
However, I'm going to squash all the bump/unbump commits since we don't want those on the trunk forever.
(I also squashed the original review adjustments, no real reason to keep those separate; but I kept the original branch point and the merge with master as those might be relevant if someone comes back later.)
Previously it searched blindly and would get confused if there were build results from multiple saw-script versions. Now we use cabal list-bin to find the proper version and look there.
Fixes #2114.