FluidityProject / fluidity

Fluidity
http://fluidity-project.org

Updates for Focal (Ubuntu 20.04) including gcc10 and updated CI #311

Closed tmbgreaves closed 3 years ago

tmbgreaves commented 3 years ago

This PR addresses updates for Focal (Ubuntu 20.04), including the move to gcc10, and migrates CI from Jenkins to GitHub Actions.

The substantial part of the PR is ready for review, with some (hopefully) minor test tweaks outstanding.

I'd welcome thoughts and reviews on the CI migration from any and all developers - there is an ongoing push to move as many of the Fluidity codebases as possible from Jenkins to Actions, due to ageing Jenkins hardware and no clear way forward for that service. My goal is to get solid alternatives to Jenkins in place for all Fluidity codebases, and switched over to them, within the next month or two.

tmbgreaves commented 3 years ago

Thanks, @angus-g ! Feedback from your point of view as a developer on the Actions side is particularly appreciated. Rhod has the failing tests on his list to look at when time allows, hopefully over the next couple of weeks. I think the failures also exist in master - they're just not being tested there - but it would be good to merge at a state where master tests with a full suite of passes.

tmbgreaves commented 3 years ago

Thank you @stephankramer ! Addressing your questions:

stephankramer commented 3 years ago

So my workflow is to always go back to the log at the CI interface first. Quite often it's immediately obvious what's wrong and I can push a quick fix. Of course, ideally I'd always rerun the test locally to confirm the fix, but some tests are a bit long, some require more cores than I have on my laptop... and then there's the aspect of test results being different locally and in whichever build environment it's failing in. You're right that we always have the docker container to fall back on for reproducibility, but that tends to be my last resort really: it takes a fair while to download and development inside it isn't ideal. Finally, an important aspect of looking at the logs online is that it allows you to compare the output between different runs. Sometimes the failure only gets noticed after a significant amount of development has passed. It can also help with intermittent failures...

So my ideal CI interface would have the entire test output right next to the failure message. That said, I'm also happy to continue my old routine of typing the test name into the search box of the relevant log - but then it does help to know which log I should be looking at. I'll have a look to see whether I can fiddle with what we write into the junit output...
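
For the comparing-between-runs part, one option would be something like the sketch below: pull the zipped log archives of two workflow runs via the GitHub REST API and diff them locally. This is purely an illustration - the token and the second run id are placeholders.

```python
# Sketch only: fetch the log archives for two Actions runs and diff them.
# The repository is real; the token and the second run id are placeholders.
import difflib
import io
import zipfile

import requests

REPO = "FluidityProject/fluidity"
TOKEN = "ghp_..."  # personal access token with access to the repo
HEADERS = {"Authorization": f"token {TOKEN}"}


def run_log_text(run_id):
    """Download the zipped logs for one workflow run and join them into one string."""
    url = f"https://api.github.com/repos/{REPO}/actions/runs/{run_id}/logs"
    resp = requests.get(url, headers=HEADERS)
    resp.raise_for_status()
    with zipfile.ZipFile(io.BytesIO(resp.content)) as archive:
        return "\n".join(archive.read(name).decode("utf-8", "replace")
                         for name in archive.namelist())


old_log = run_log_text(779218872)   # an existing run on this PR
new_log = run_log_text(123456789)   # placeholder id for a later run
for line in difflib.unified_diff(old_log.splitlines(), new_log.splitlines(),
                                 "old", "new", lineterm=""):
    print(line)
```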

cianwilson commented 3 years ago

This looks really great, @tmbgreaves ! I really appreciated being able to look through the GitHub workflows setup.

One quick question: if we're pushing at least 3 tags to Dockerhub per pull request (more if the PR results in extra commits), is there some limit on how much we can push? Are these tags auto-deleted eventually?

jrper commented 3 years ago

As a point of information for @stephankramer, what's appearing is indeed the JUnit test name (or rather the test case name), which is fairly easy to mangle. I think the information you want is stored as the test suite name, but it's been too long since I looked at it to be sure. Depending on what Actions shows, we might be able to stick the log in as the error message so it's more immediately visible, although I vaguely remember the testharness parallelism introducing a complication to that approach.

tmbgreaves commented 3 years ago

> So my ideal CI interface would have the entire test output right next to the failure message. That said, I'm also happy to continue my old routine of typing the test name into the search box of the relevant log - but then it does help to know which log I should be looking at.

Very useful feedback, thank you @stephankramer . I think I can see how to easily split apart the short and medium junit reporting - will have a go at that now.

The JUnit reporting action is one I picked up from the marketplace, and might well be able to be improved significantly for what we're doing.

tmbgreaves commented 3 years ago

OK - 3f57d07e4 changes the Actions behavior back to the earlier option of failing the run if the test run itself errors and/or the tests fail. This should make it easier to pinpoint where the relevant log files are. Once that run has gone through, @stephankramer , might you be able to take a look and give some feedback on whether it's an improvement?

The actual error is still in the logs, which I guess is no worse than the Jenkins situation, but very much agree that having it come out in the JUnit report would be ideal.

tmbgreaves commented 3 years ago

> One quick question: if we're pushing at least 3 tags to Dockerhub per pull request (more if the PR results in extra commits), is there some limit on how much we can push? Are these tags auto-deleted eventually?

I don't think there is a push limit - at least, not one I've come across. There is a pull limit for non-authenticated systems of 100 pulls per six hours, which I think we're nowhere near. In terms of having lots of tags, I'm pushing to fluidity/actions, which is separate from the normal fluidity/baseimages that the build containers go to, so hopefully the profusion of tags won't be a problem. My guess is that apart from Actions, the only pulls will be people doing a debugging run and grabbing a tag from the Actions logs to pull, rather than wading through the forest of tags in fluidity/actions.

I think Dockerhub auto-deletes images which haven't been updated/accessed in six months (?), which should act as a backstop cleanup for us.
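
If we ever did need to prune fluidity/actions ourselves, a backstop along these lines could work. This is a sketch only, assuming Docker Hub's public v2 HTTP API behaves as documented; the credentials and the 90-day cutoff are placeholders rather than an agreed policy.

```python
# Sketch: delete fluidity/actions tags older than a cutoff via Docker Hub's
# public v2 API. Credentials are placeholders and the 90-day cutoff is an
# arbitrary example, not an agreed policy.
from datetime import datetime, timedelta, timezone

import requests

REPO = "fluidity/actions"
CUTOFF = datetime.now(timezone.utc) - timedelta(days=90)

login = requests.post("https://hub.docker.com/v2/users/login/",
                      json={"username": "fluidity", "password": "..."})
login.raise_for_status()
headers = {"Authorization": f"JWT {login.json()['token']}"}

url = f"https://hub.docker.com/v2/repositories/{REPO}/tags/?page_size=100"
while url:
    page = requests.get(url, headers=headers).json()
    for tag in page["results"]:
        if not tag.get("last_updated"):
            continue
        pushed = datetime.fromisoformat(tag["last_updated"].replace("Z", "+00:00"))
        if pushed < CUTOFF:
            requests.delete(
                f"https://hub.docker.com/v2/repositories/{REPO}/tags/{tag['name']}/",
                headers=headers)
            print(f"deleted {REPO}:{tag['name']}")
    url = page.get("next")
```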

tmbgreaves commented 3 years ago

Possibly also worth adding for background - I had initially planned to just have the tests run in containers built per test run, rather than building and caching on Dockerhub, but ran into the annoying problem that GitHub doesn't allow any dynamic content in the workflow's container image field, as per:

https://github.community/t/how-to-use-env-with-container-image/17252/4

So, since GitHub rules out the native method 'for security reasons', I end up using my own hacked-up method via Dockerhub...

tmbgreaves commented 3 years ago

@stephankramer a run to review with the updated reporting:

https://github.com/FluidityProject/fluidity/actions/runs/779218872

Patol75 commented 3 years ago

Regarding why failed tests only display "Failure" as an annotation, I think this comes from testharness. The message argument of add_failure_info seems to be the one picked up and displayed in Actions. For example, if you compare with unittestharness, the message there contains an error message rather than just "Failure".
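
For illustration, a minimal sketch of what passing the real message through might look like, assuming testharness builds its report with the junit-xml package's TestCase API; the test name, classname, error text, and log excerpt below are placeholders.

```python
# Sketch only, assuming testharness builds its report with the junit-xml
# package; the test name, classname, error text and log excerpt are all
# placeholders for whatever testharness has to hand at reporting time.
from junit_xml import TestCase

error_text = "assertion failed: max error 1.2e-2 exceeds tolerance 1e-2"  # placeholder
log_excerpt = "...last lines of the test's log..."                        # placeholder

tc = TestCase(name="example_short_test",   # the test case name Actions displays
              classname="shorttests")      # grouping shown alongside it in reports

tc.add_failure_info(
    message=error_text,   # the short annotation Actions surfaces (currently just "Failure")
    output=log_excerpt,   # longer text kept in the <failure> element body
)
```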

tmbgreaves commented 3 years ago

Noting @drhodrid 's proposed fixes in PR #313 as relevant to this PR.

stephankramer commented 3 years ago

> @stephankramer a run to review with the updated reporting:
>
> https://github.com/FluidityProject/fluidity/actions/runs/779218872

That's great @tmbgreaves , many thanks - all happy now

tmbgreaves commented 3 years ago

To open up the possibility of adding some longer tests to the standard run, 45327d8 pushes some long tests which should complete on GitHub-hosted runners within the time the mediumtest suite takes, so they shouldn't slow down the overall run, assuming GitHub gives us enough resources to run everything at the same time. The benefit is having some longer tests run against PRs to master; the cost is complicating the set of outputs from the run.

@drhodrid - I'll also post this to your PR ( #313 ), but if you can add your tests which have outgrown the mediumtesting length to the longtests repo, and then add them into the longtesting matrix set up in 45327d8 , I propose that's how we keep them tested without lengthening the overall runtime.

I've not yet tested the long examples but hope we can get many of them regularly tested against master with this method too.
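
For reference, a hypothetical sketch of how the matrix list could be generated from the longtests checkout rather than maintained by hand - this is not how 45327d8 actually does it, and the directory layout and output mechanism are assumptions.

```python
# Hypothetical helper: emit the list of longtests as a JSON matrix that a
# workflow job could read via fromJSON. Only an illustration, not how 45327d8
# is implemented; the directory layout is assumed.
import json
import os

LONGTESTS_DIR = "longtests"

tests = sorted(entry for entry in os.listdir(LONGTESTS_DIR)
               if os.path.isdir(os.path.join(LONGTESTS_DIR, entry)))

# Pre-2022 Actions output syntax; newer runners write to $GITHUB_OUTPUT instead.
print(f"::set-output name=matrix::{json.dumps({'test': tests})}")
```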

Patol75 commented 3 years ago

As we are dealing with tests in this PR, is it normal that tests/python_shape_dshape_test and tests/python_shape_dshape_test_2d only have a Makefile in their directory? It looks like the Makefile is trying to access a src directory that does not exist.

stephankramer commented 3 years ago

Yes, it looks like these tests were removed in f5604ccb47c284aa470ab91e69d8a4a8af67a626, but their Makefiles were updated on a parallel branch in 06a11efe035ec821dc7aa23f12d4855b83adebd4, so they got accidentally added back in upon merge. Removed now...
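
For anyone wanting a quick sanity check against this happening again, a hypothetical helper along these lines would flag test directories that contain nothing but a Makefile (the tests/ path is assumed):

```python
# Hypothetical check: flag test directories that contain nothing but a
# Makefile, as python_shape_dshape_test did after the bad merge.
import os

TESTS_DIR = "tests"

for name in sorted(os.listdir(TESTS_DIR)):
    path = os.path.join(TESTS_DIR, name)
    if not os.path.isdir(path):
        continue
    contents = set(os.listdir(path))
    if contents <= {"Makefile"}:
        print(f"suspicious: {path} contains only {sorted(contents)}")
```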

tmbgreaves commented 3 years ago

@stephankramer - I'd very much value your thoughts on my suggested addition of some longtests to the standard master/PR runs, and how it's being done/presented.

tmbgreaves commented 3 years ago

Moved the list of failing tests to maintenance issue #316

tmbgreaves commented 3 years ago

The gcc-fixes-plus-longtests branch is currently working through the remaining failures that went untested prior to this overhaul, and I'll aim to trial a run of the vlong tests (probably via the old Jenkins builders) there too, to complete the testing suite.

tmbgreaves commented 3 years ago

Testing of the vlong tests initiated at: https://github.com/FluidityProject/longtests/actions

This will almost certainly be buggy at first. It's running on Jenkins hardware.

tmbgreaves commented 3 years ago

@drhodrid - Almost there with your tests, thank you! Last one, hopefully a tolerance issue: in particle_prescribed_vortex_flow running on Focal I'm seeing:

Max cons error 0.010923752666083475

against a test of 'less than 0.01'. Can this be relaxed to pass?
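
(For concreteness, the relaxation would just mean loosening the threshold in the test's pass condition, along the lines of the sketch below; the new value is a placeholder, not a proposal.)

```python
# Illustration only: the failing check compares the error against 0.01; the
# relaxed threshold below (0.015) is a placeholder, not an agreed value.
max_cons_error = 0.010923752666083475   # value reported on Focal above
assert max_cons_error < 0.015           # previously: < 0.01
```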

tmbgreaves commented 3 years ago

@stephankramer @jrper - wondering if you have any thoughts on what might be going on with the one weird remaining short test failure, square-convection-parallel-trivial ?

What I'm seeing is that it fairly consistently segfaults when running in the short test suite on github runners but so far has never segfaulted in the same container run on my local system. I've tried setting it up on a separate repo and have had it both segfault and run successfully on github runners, sometimes on sequential runs in the same container. I've not yet managed to catch it segfaulting to get a backtrace out of it.

Does this touch anywhere unusual in the codebase that might be a culprit, or do anything unusual?

I'm out of ideas...
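
About the only further thing I can think of is a brute-force retry loop along these lines - a sketch only; the test command is a placeholder, and with MPI the segfault may not surface as a raw signal exit from the top-level command.

```python
# Sketch: rerun the suspect test with core dumps enabled until it dies with
# SIGSEGV, so gdb can be pointed at the core afterwards. The command below is
# a placeholder for however the test is normally invoked; with MPI the signal
# may instead surface as a non-zero exit code from mpiexec.
import resource
import signal
import subprocess

CMD = ["make", "run"]   # placeholder test invocation
resource.setrlimit(resource.RLIMIT_CORE,
                   (resource.RLIM_INFINITY, resource.RLIM_INFINITY))

for attempt in range(1, 201):
    result = subprocess.run(CMD, capture_output=True, text=True)
    if result.returncode == -signal.SIGSEGV:
        with open(f"segfault_attempt_{attempt}.log", "w") as log:
            log.write(result.stdout)
            log.write(result.stderr)
        print(f"segfault on attempt {attempt}; core file should be alongside the binary")
        break
else:
    print("no segfault seen in 200 attempts")
```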

tmbgreaves commented 3 years ago

Question moved to issue #316

tmbgreaves commented 3 years ago

Question moved to issue #316

tmbgreaves commented 3 years ago

Question moved to issue #316

drhodrid commented 3 years ago

Hi Tim,

Yes - I think you can safely relax this one.

R


stephankramer commented 3 years ago

@tmbgreaves : re. running the longtests as well. Do I understand correctly that these are all now separate jobs which in principle could run in parallel? If so, yes, sounds great - and it looks good from the interface. I guess it depends a bit on whether this is going to affect the overall completion time before reaching the green light...

I'll take a look at square-convection-parallel-trivial to see if I can reproduce it...

tmbgreaves commented 3 years ago

@stephankramer - yes, that's exactly what I'm aiming for in terms of them running in parallel. At the moment I'm seeing a bit of slowdown on the run as a whole - perhaps adding ~2.5h to the already ~5h run, which may extend a little as the last few longtests/examples are fixed and added back in. The limit here is the 20 github runners we get at a time. My feeling is that it's worth it for the hopefully significantly better coverage we get by regularly running most of the longtests against PRs.

I think the time-to-green-light is probably not too significant on the timescale of getting code reviews etc.

There is also the 'vlong' suite of tests which I didn't manage to fit onto GitHub runners - likely on the order of 20 tests from examples and longtests. I'll aim to run those via the longtests repo on self-hosted hardware, on a weekly schedule similar to the way we used to run all the longtests, but now in a much more controlled environment.

I'd suggest that once square-convection-parallel-trivial is passing we move towards final reviews and (hopefully) merging this PR with whatever tests are passing, given the run should then be overall green; we can then drip-feed in the remaining longtests as and when they're fixed.

stephankramer commented 3 years ago

All sounds good re. longtests - we're not in a situation where we have many active PRs in parallel - so that still sounds manageable

Re. the square-convection-parallel-trivial test case: I do have a backtrace now. It's segfaulting in the packing of detectors in parallel, line 455 of Detector_Tools.F90. Unfortunately it seems a bit heisenbuggy - if I rebuild with debugging it doesn't hit the segfault - and some of the variables I'd like to inspect are <optimized-out>. @drhodrid - in case it's relevant to failures you're looking at...

Slightly off-topic, @tmbgreaves - but I was enjoying the flawless reproducibility of running the relevant docker image. It seems the docker image is very bare-bones (which is good!) - but it would be nice if the fluidity user could have sudo access so I can install stuff in a debugging session. At the moment it doesn't seem to have any editors, not even vi!

tmbgreaves commented 3 years ago

Thank you very much for the debugging on square-convection-parallel-trivial, @stephankramer . I don't think there's much I can do, but if there is, do shout.

I was waiting for someone to request more useful features in the container, @stephankramer , and was being lazy about implementing them until someone did ;-) I'll add sudo now - that's a very good idea.

tmbgreaves commented 3 years ago

Sudo is available in containers from 48e1d2abb (i.e., the next build of this PR).

tmbgreaves commented 3 years ago

The result is somewhat spurious pending fixes to square-convection-parallel-trivial, but Actions has now completed a fully-passing run, which took just under 7.5h, confirming the <3h of additional time for the longtest suite as run so far.

Thoughts on going ahead with an Actions/Focal merge now, given square-convection-parallel-trivial isn't a bug introduced by this merge, vs. holding off for a fix there if it looks like a tractable problem to solve?

stephankramer commented 3 years ago

Yes, happy if you want to go ahead. Progress on square-convection-parallel-trivial is in separate issue #317 - I think it's a real issue, but likely already in master...