APSIMInitiative / ApsimX

ApsimX is the next generation of APSIM
http://www.apsim.info

Refactor the APSIM build system #8230

Open hol353 opened 1 year ago

hol353 commented 1 year ago

Describe the new feature

Problems:

Some related issues:

Design decisions

hol353 commented 1 year ago

Capturing some more thoughts:

When a pull request (PR) is raised, the following workflow is created by a tool/script written by us. The tool would scan .apsimx files and extract simulations that need running. All simulation runs will write to .csv files.
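For illustration, a minimal sketch of such a scan, assuming .apsimx files are JSON and that simulation nodes can be recognised by their `$type` field (the matching rule and the example folder name are assumptions, not a settled design):

```python
import json
from pathlib import Path

def find_simulations(repo_root):
    """Walk a checkout, parse every .apsimx file (they are JSON) and yield
    (file path, simulation name) pairs.

    The rule used to spot a simulation node - a "$type" of
    "Models.Core.Simulation" - is an assumption for illustration only.
    """
    for path in Path(repo_root).rglob("*.apsimx"):
        with open(path, encoding="utf-8") as f:
            tree = json.load(f)
        stack = [tree]
        while stack:
            node = stack.pop()
            if not isinstance(node, dict):
                continue
            type_name = str(node.get("$type", "")).split(",")[0].strip()
            if type_name == "Models.Core.Simulation":
                yield path, node.get("Name")
            stack.extend(node.get("Children", []))

# Example: list everything the build tool would need to schedule.
# for path, name in find_simulations("Tests/Validation"):
#     print(path, name)
```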

The workflow then needs to be executed somewhere. The workflow items below are to happen asynchronously, but with dependencies between them (there is a sketch of generating such a workflow after the list below).

The workflow groups simulations into batches of 1000; the batch size should be configurable.

workflow:

  1. build apsim
  2. push docker images.
  3. depends on 2: run apsim for 1000 simulations of wheat and send csv output to temp storage
  4. depends on 2: run apsim for 1000 simulations of wheat and send csv output to temp storage
  5. depends on 2: run apsim for 1000 simulations of wheat and send csv output to temp storage
  6. depends on 2: run apsim for 1000 simulations of wheat and send csv output to temp storage
  7. depends on 2: run apsim for 1000 simulations of wheat and send csv output to temp storage
  8. depends on 2: run apsim for 1000 simulations of wheat and send csv output to temp storage
  9. depends on 2: run apsim for 200 simulations of wheat and 800 simulations of barley and send csv output to temp storage
  10. depends on 2: run apsim for 1000 simulations of barley and send csv output to temp storage
  11. depends on 2: run apsim for 1000 simulations of barley and send csv output to temp storage
  12. depends on 3-9: get csv values for wheat, build doc and send predicted-observed to POStats API.
  13. depends on 9-11: get csv values for barley, build doc and send predicted-observed to POStats API. ...
  14. depends on 3-13: create a Windows release of APSIM (using a linux container)
  15. depends on 3-13: create a Linux release of APSIM (using a linux container)
  16. depends on 3-13: create an OSX release of APSIM (using a linux container)
  17. depends on 14-16: Send pass/fail status flag to GitHub.
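A rough Python sketch of how a tool could generate that job graph. The batch size, the per-crop grouping (this sketch does not pack mixed-crop batches like item 9) and the job descriptions are illustrative assumptions, not a settled design:

```python
from dataclasses import dataclass, field

BATCH_SIZE = 1000  # configurable, as noted above

@dataclass
class Job:
    id: int
    description: str
    depends_on: list = field(default_factory=list)

def build_workflow(simulations_by_crop):
    """Build the job list sketched above: build, push images, one run job per
    batch of simulations, per-crop doc/POStats jobs, releases, GitHub status."""
    jobs = [Job(1, "build apsim"),
            Job(2, "push docker images", depends_on=[1])]
    run_ids_by_crop = {}
    for crop, sims in simulations_by_crop.items():
        ids = []
        for i in range(0, len(sims), BATCH_SIZE):
            batch = sims[i:i + BATCH_SIZE]
            jobs.append(Job(len(jobs) + 1,
                            f"run apsim for {len(batch)} simulations of {crop}, "
                            "send csv output to temp storage",
                            depends_on=[2]))
            ids.append(jobs[-1].id)
        run_ids_by_crop[crop] = ids
    crop_doc_ids = []
    for crop, ids in run_ids_by_crop.items():
        jobs.append(Job(len(jobs) + 1,
                        f"get csv values for {crop}, build doc, "
                        "send predicted-observed to POStats API",
                        depends_on=ids))
        crop_doc_ids.append(jobs[-1].id)
    all_run_ids = [i for ids in run_ids_by_crop.values() for i in ids]
    release_ids = []
    for platform in ("Windows", "Linux", "OSX"):
        jobs.append(Job(len(jobs) + 1,
                        f"create a {platform} release of APSIM (using a linux container)",
                        depends_on=all_run_ids + crop_doc_ids))
        release_ids.append(jobs[-1].id)
    jobs.append(Job(len(jobs) + 1, "send pass/fail status flag to GitHub",
                    depends_on=release_ids))
    return jobs

# e.g. build_workflow({"wheat": wheat_sims, "barley": barley_sims})
```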
jbrider commented 1 year ago
peter-devoil commented 1 year ago

Even at 1200/month, you're still ahead of buying your own hardware every 3 years. It's a legitimate cost of operation, just like salaries.

If the 96 core machine is inadequate, you can spread the compute load across more of them. Setting up a shared network amongst a group of VMs isn't hard (in Google Compute or OpenStack) - so all simulations could see the same "disk" area, with no need for complicated file transfer. You're already exploring methods to aggregate simulations; there shouldn't be a need to change output formats for that.

It's worth being sure there isn't an IO bottleneck here - only this morning I was spammed with offers to buy a 192 core gaming machine. They're not far away...

Would like to think that we could do platform-specific tests as well - e.g. to be sure the Mac installer hasn't broken again...

And of course - are we sure that this testing is telling us something useful?

hol353 commented 12 months ago

Is it possible to load a large number of these tests into memory and just change parameters before rerunning different configurations? For trials that only run for 1 season, it makes a huge difference.

Yep, I've thought about this and agree it would be much quicker. It's a big job, though, converting our existing validation data sets (~12,000 simulations) to this way of running.
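Purely as a sketch of the pattern (load once, mutate parameters, re-run), under the assumption that some ApsimX API can do the mutation and in-memory run; the `set_parameter` and `run` callables below are stand-ins, not real ApsimX calls:

```python
from typing import Any, Callable, Iterable

def run_parameter_sweep(sim: Any,
                        parameter_sets: Iterable[dict],
                        set_parameter: Callable[[Any, str, Any], None],
                        run: Callable[[Any], Any]) -> list:
    """Re-run an already-loaded simulation for each parameter set, so the
    .apsimx file is only parsed once. set_parameter and run stand in for
    whatever ApsimX API would actually do this; they are assumptions."""
    results = []
    for params in parameter_sets:
        for path, value in params.items():
            set_parameter(sim, path, value)   # e.g. map a model path to a new value
        results.append(run(sim))              # in-memory re-run, no reload from disk
    return results
```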

do we need to run 6000 wheat tests to know something changed? Can we rely on a smaller set?

We're not just looking for something to change. We're trying to convince ourselves that the model works and that it stays validated across a broad range of GxExM. Most modellers I talk to want more tests, not fewer.

can we find a reliable way to separate gui changes from model changes?

Yes, this would be nice. Am I brave enough to say that any change to GUI code or documentation won't break a model validation?

can we add some performance testing to be able to compare runtimes to ensure the model isn't getting slower.

Yes, we need to do this!
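A minimal sketch of what a runtime-regression check could look like, assuming per-suite baseline timings are stored from an earlier build; the baseline file format, the command being timed and the 20% threshold are all illustrative assumptions:

```python
import json
import subprocess
import time

def time_run(command):
    """Run one batch of simulations via its command line; return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start

def check_runtime(name, elapsed, baseline_file="runtime-baseline.json", tolerance=1.2):
    """Fail if this run is more than `tolerance` times slower than the baseline.
    The baseline format ({"Wheat": 310.5, ...}) and the 20% threshold are
    illustrative assumptions, not an agreed policy."""
    with open(baseline_file) as f:
        baseline = json.load(f)
    allowed = baseline.get(name, float("inf")) * tolerance
    if elapsed > allowed:
        raise SystemExit(f"{name}: {elapsed:.1f}s exceeds {allowed:.1f}s - model is getting slower")
    return elapsed
```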

Even at 1200/month, you're still ahead of buying your own hardware every 3 years. It's a legitimate cost of operation, just like salaries.

Agreed, we don't want to go back to buying our own hardware.

If the 96 core machine is inadequate, you can spread the compute load across more of them. Setting up a shared network amongst a group of VMs isn't hard (in Google Compute or OpenStack) - so all simulations could see the same "disk" area, with no need for complicated file transfer. You're already exploring methods to aggregate simulations; there shouldn't be a need to change output formats for that.

True. Having two 96-core machines would almost double the cost; they are much more expensive than, say, 200 dual-core VMs. I'm not too worried about changing the output file format. APSIM already supports CSV output via a command line switch. CSV is super easy to work with and upload to the POStats web API.
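For example, once a run has produced a predicted-observed CSV, uploading it could be as simple as an HTTP POST; the endpoint URL and the raw text/csv payload below are assumptions for illustration, not the real POStats interface:

```python
import urllib.request

def upload_csv(csv_path, url):
    """POST a predicted-observed CSV file to a stats web API and return the
    HTTP status. The endpoint URL and the assumption that it accepts raw
    text/csv are illustrative only."""
    with open(csv_path, "rb") as f:
        request = urllib.request.Request(url, data=f.read(),
                                         headers={"Content-Type": "text/csv"},
                                         method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status

# e.g. upload_csv("WheatPredictedObserved.csv", "https://postats.example/api/upload")
```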

Would like to think that we could do platform specific tests as well - eg to be sure the mac installer hasn't broken again..

Agreed. I guess we need to write a test that runs the installer to make sure it works.

And of course - are we sure that this testing is telling us something useful?

Agreed. It is telling us the models stay validated when we change something. It's a rather brute-force way of doing that, though. I do wonder if there is a simpler way, as @jbrider alludes to above.

jbrider commented 12 months ago

@hol353 I'm not suggesting we don't need to run all 6000 at some stage - I agree with having more and better tests - just not as part of every build.

lie112 commented 12 months ago

I had written a response last week but wasn't sure how much I understood the Git, validation and build process, so I deleted it. I too was thinking about how to reduce the load to just the builds that are needed. It is currently far too easy to raise a full-rebuild pull request (@Resolves XXXX) with no real understanding of its cost. I also thought GUI updates wouldn't really need a full validation run before a build, other than unit tests.

I also wondered whether there is a way to handle pull requests that aren't critical, but where you also don't want to wait an unknown amount of time until someone else raises a request that triggers a rebuild during quiet times. If there were a regular (say Friday night) build that picked up all outstanding commits, we'd at least know all changes would be in the Friday night upgrade and could work to that schedule. This is equivalent to @Working on XXXX, except we'd want the issue to be closed. Is there another Git tag we could use?

For code changes that are entirely within Models.CLEM, do we need to run all the wheat validations? Can we have a smarter way of knowing which namespaces fire which validations?
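One possible shape for that "smarter way" is a simple map from source folders to the validation suites they should trigger, driven by the pull request's diff. The folder prefixes and suite names below are illustrative assumptions, not an agreed scheme:

```python
import subprocess

# Folder prefixes -> validation suites to run (illustrative mapping only).
VALIDATION_MAP = {
    "Models/CLEM/": ["CLEM"],
    "Models/PMF/": ["Wheat", "Barley"],
    "ApsimNG/": [],   # GUI-only change: unit tests only
    "Docs/": [],      # documentation-only change: no validation runs
}

def suites_for_pull_request(base="origin/master"):
    """Return the set of validation suites to run, based on the files this
    branch changes relative to `base`. Unmapped files trigger everything."""
    diff = subprocess.run(["git", "diff", "--name-only", base],
                          capture_output=True, text=True, check=True)
    suites = set()
    for path in diff.stdout.splitlines():
        hits = [names for prefix, names in VALIDATION_MAP.items()
                if path.startswith(prefix)]
        if hits:
            for names in hits:
                suites.update(names)
        else:
            suites.add("ALL")   # be safe: an unknown area triggers the full run
    return suites
```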

But that led me to realise that, just to get a nod that code changes will be accepted, the whole validation needs to be run so we know nothing is broken. I assume this process, for each and every pull request, is what is contributing to the server CPU time. How often do changes result in a true validation failure in the build process?

Anyway, this isn't an answer, but it might suggest a few ways to approach this differently.