AtChem / AtChem2

Atmospheric chemistry box-model for the MCM
MIT License

Size of the mechanism and speed of the simulation #436

Open rs028 opened 3 years ago

rs028 commented 3 years ago

The model is very slow when the entire MCM is used as the chemical mechanism, as opposed to just a subset of it. It can take up to 5 days to run a 3-hour simulation (although this depends on the type of machine, obviously).

Possible cause: most of the species are zero and/or unused.

spco commented 3 years ago

For my own education, could you confirm you're referring to the case where the .fac file is very large?

In that case, I'd hope there's something we can do, but I'm not sure right now. If most of the concentrations stay zero because their initial values are zero, then that's quite hard to identify in the code itself at runtime, I would think.

You'd need to identify which species must stay zero, based on the mechanism and the initial conditions, to be able to remove them from the matrix. That's probably easier to do in pre-processing, except that it depends on the initial conditions, not just the .fac file. So your mechanism.f90, mechanism.species, etc. would depend on the initial conditions as well as the .fac. That could well work happily, but you would need to recompile mechanism.so whenever you change the initial conditions, not just when you change the .fac (and you would need to ensure that is done consistently).

Does that make sense? It sounds feasible but there are some design choices to be made, and it's a reasonable amount of work to get right.

(I guess the mechanism is, in a way, a directed graph: the nodes are species, the node weights are the species concentrations, the edges are the reactions, and the edge weights are the rates; although the rates can depend directly on other species, i.e. other nodes. I'm not sure how best to handle that.)
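The graph view also suggests a concrete pruning pass. As a rough sketch (the reaction list, species names, and function below are hypothetical illustrations, not AtChem2's actual data structures or .fac parsing), one could compute which species can ever become nonzero, starting from the species with nonzero initial concentrations, and drop the rest from the ODE system:

```python
def reachable_species(reactions, initial_nonzero):
    """Fixed-point pass over a mechanism: a species can become nonzero
    only if it starts nonzero, or it is a product of a reaction whose
    reactants can all become nonzero. Everything else must stay zero."""
    alive = set(initial_nonzero)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if all(r in alive for r in reactants):
                new = set(products) - alive
                if new:
                    alive |= new
                    changed = True
    return alive

# Toy mechanism (hypothetical reactions, not taken from the MCM):
reactions = [
    (["CH4", "OH"], ["CH3O2"]),          # CH4 oxidation
    (["CH3O2", "NO"], ["HCHO", "NO2"]),  # peroxy radical + NO
    (["C5H8", "OH"], ["ISOPO2"]),        # dead branch if C5H8 starts at zero
]
alive = reachable_species(reactions, {"CH4", "OH", "NO", "NO2", "O3"})
# ISOPO2 can never become nonzero here, so it could be pruned
# from the matrix before compiling the mechanism.
```

Because the result depends on the initial conditions as well as the mechanism, such a pass would indeed have to run at the pre-processing stage, as discussed above.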

rs028 commented 3 years ago

Yes, I think that's right. But to be honest, the "zero values" hypothesis is just that: a hypothesis. I don't know what controls the speed of the simulation. I guess we need to do some testing and use some sort of profiling tool.

That being said, I am intrigued by the "graph" option. It could have interesting applications, so we may want to keep it in mind in any case.

spco commented 3 years ago

On the general speed, have you experimented with changing the optimisation flag? gfortran defaults to -O0 (no optimisation) so building with -O2 might give some sizeable speedup. I have no feel for how much difference that would make in this specific case, but in general the speedup can be orders of magnitude.

rs028 commented 3 years ago

I haven't. This problem was actually reported by another user. I think the general point here is to understand why this is happening. Let's say we run the test mechanism and the entire MCM with the same default initial conditions (model/configuration/initialConcentrations.config):

CH4    4.9e+13
CO     3.6e+12
O3     5.2e+11
NO2    2.4e+11

You would expect the runtimes to be fairly similar, but that is not the case, and I think it would be good to know why. It may give some clues about how to speed up the model generally (i.e., identify the bottlenecks).

spco commented 3 years ago

Sure, yes, I don't doubt there's an issue there. Using -O2 is probably useful regardless, and in some ways it should be the 'default', as it is designed to apply only 'safe' optimisations: those that can be guaranteed not to affect numerical precision etc. That would be the usual approach in most codebases, and it might be worth setting it as the default flag in Makefile.skel; users can always modify it if required. Thoughts?

rs028 commented 3 years ago

I honestly don't know enough to make a call here :) I think that, in general, if we can speed it up without sacrificing accuracy, it is a good thing.

Is this perhaps part of the same set of issues we have generally called numerical stability? See the notes at #265, https://github.com/AtChem/AtChem2/pull/340#issuecomment-416910159, https://github.com/AtChem/AtChem2/issues/384#issuecomment-494454524, etc.

spco commented 3 years ago

-O2 will do nothing to the numerics, so it is 'safe' and won't have any effect on the numerical stability issues; it just takes a little longer to compile the executable. That is the only downside, and a very small one: our codebase is small anyway, so the compiler will have no difficulty compiling it at a higher optimisation level, and the difference will probably not be noticeable!

rs028 commented 3 years ago

Then yes. I don't particularly care if the compilation is a tad longer :)

spco commented 3 years ago

I just tested it on my Mac: compilation takes about 2 seconds with -O0 and about 4.4 seconds with -O2.

Just to make sure there is a real effect, I ran some of the test cases (with many more steps than normal) and timed just the run, ignoring the compile, which stays below 5s for all of them. I think for several of these, the I/O will be the limiting factor, which the optimisation does very little about, so the 1.2-3x speedups here are very likely an underestimate of the speedup in more realistic setups.

| test case                       | -O2   | -O0   | speedup |
|---------------------------------|-------|-------|---------|
| static                          | 26s   | 1m17s | 2.96x   |
| spec_yes_env_no_with_photo      | 26s   | 31s   | 1.19x   |
| short_no_pre                    | 23s   | 29.5s | 1.28x   |
| spec_yes_env_no_with_jfac_fixed | 12.5s | 32s   | 2.56x   |
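(For reference, the speedup column is just the ratio of the two wall times, e.g. for the static case 1m17s = 77s against 26s:)

```python
def speedup(o0_seconds, o2_seconds):
    """Ratio of -O0 wall time to -O2 wall time, rounded to 2 decimals."""
    return round(o0_seconds / o2_seconds, 2)

print(speedup(77, 26))    # static case -> 2.96
print(speedup(32, 12.5))  # jfac_fixed case -> 2.56
```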

I will open a PR to set the default to -O2 😄