MIT-LAE / APCEMM

Aircraft Plume Chemistry, Emissions, and Microphysics Model
MIT License

Inconsistent Results When "OpenMP Num Threads" is Greater Than 1 #19

Open Calebsakhtar opened 4 months ago

Calebsakhtar commented 4 months ago

There seems to be a race condition, with threads reading and writing the same parts of memory at the same time.

Here are the APCEMM outputs from two consecutive runs using 8 threads: Run1 Run2

I was not able to recreate the random jumps with OpenMP Num Threads set to 1, but I was with more threads.
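For anyone who wants to check run-to-run reproducibility numerically rather than by eye, here is a minimal sketch. It assumes the diagnostic of interest (e.g. depth) has already been read from each run's output into a plain list with one value per time step. The function name `compare_runs` and the sample values are hypothetical and are not taken from APCEMM or from the runs above.

```python
import math


def compare_runs(run_a, run_b, rel_tol=0.0, abs_tol=0.0):
    """Return (index, a, b) for every time step where two runs disagree.

    With the default tolerances of zero the runs must match exactly,
    which is what a deterministic case should produce.
    """
    if len(run_a) != len(run_b):
        raise ValueError("runs have different numbers of time steps")
    mismatches = []
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((i, a, b))
    return mismatches


# Hypothetical depth values for two consecutive runs (illustration only).
depth_run1 = [100.0, 120.0, 135.0, 150.0]
depth_run2 = [100.0, 120.0, 310.0, 150.0]

mismatches = compare_runs(depth_run1, depth_run2)
for step, a, b in mismatches:
    print(f"step {step}: run1={a} run2={b}")
```

Passing a small `rel_tol` separates large jumps from differences at the level of floating-point round-off.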

sdeastham commented 3 months ago

@Calebsakhtar - was this fixed by commit 85a56a3afdf3ac3bd06231d43202a629dc480e8b? More generally, do you still see this bug when running with >1 thread?

Calebsakhtar commented 3 months ago

@sdeastham Just to report that compiling APCEMM on commit https://github.com/MIT-LAE/APCEMM/commit/85a56a3afdf3ac3bd06231d43202a629dc480e8b still results in the above bug. I will now attempt compilation on the latest commit https://github.com/MIT-LAE/APCEMM/commit/618f20f2ddbcdeb62cf6fabdea66ddd477a1805b

Calebsakhtar commented 3 months ago

Here are the instructions to replicate the behaviour reported above:

  1. Clone the APCEMM git repo
  2. Follow the README installation instructions from the repo
  3. Run example 3

Please note that this behaviour has been observed both in Docker on Windows 11 and on the Linux system of the Cambridge HPC.

Calebsakhtar commented 3 months ago

@sdeastham Just to report that compiling APCEMM on the latest commit https://github.com/MIT-LAE/APCEMM/commit/618f20f2ddbcdeb62cf6fabdea66ddd477a1805b still results in the above bug.

sdeastham commented 3 months ago

Thanks @Calebsakhtar ! To confirm, is that the result when outputting the standard "depth" variable directly or are you calculating a different kind of depth?

Calebsakhtar commented 3 months ago

@sdeastham The standard depth variable straight from APCEMM!

sdeastham commented 3 months ago

Got it! OK - issue is reproducible on our HPC (in fact, it looks much worse):

image

This seems to have the largest effect on these diagnostic variables. Prognostic variables like ice mass show very small differences. These should still be nailed down, though: they shouldn't happen in this case, where in theory there is no randomness because temperature perturbation is disabled for example 3:

image
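One generic way that very small, thread-count-dependent differences can appear even without a true data race is floating-point summation order: addition is not associative, so a parallel reduction that combines per-thread partial sums can round differently from a serial sum. Whether this is what is happening in APCEMM has not been established here. The sketch below only illustrates the effect. The helper names and input values are hypothetical, and the contiguous chunking is just a stand-in for how work might be split across threads.

```python
def naive_sum(values):
    """Left-to-right floating-point sum (explicit loop, no compensation)."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    """Sum contiguous chunks separately, then add up the partial sums.

    This mimics a reduction in which each thread sums its own slice.
    """
    size = -(-len(values) // n_chunks)  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    partials = [naive_sum(chunk) for chunk in chunks]
    return naive_sum(partials)


# Hypothetical values chosen so that rounding depends on grouping.
values = [1e16, 1.0, -1e16, 1.0]

one_thread = chunked_sum(values, 1)   # ((1e16 + 1) - 1e16) + 1
two_threads = chunked_sum(values, 2)  # (1e16 + 1) + (-1e16 + 1)
print(one_thread, two_threads)
```

The two results differ even though the inputs are identical. Differences of this kind are normally tiny relative to the quantity being summed, so they would not by themselves explain large jumps in a diagnostic.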

@michaelxu3 any thoughts you might have on origin would be appreciated! In any case, I'll try to drill down and see if there's an obvious cause of this behaviour.

@Calebsakhtar - can you confirm that this behaviour remains/disappears when:

sdeastham commented 3 months ago

@Calebsakhtar Also, was the profile you showed in the original post for Example 3 or for a different case? If it's example 3, that raises the question of why our profiles are so different (even setting aside the noise).

Calebsakhtar commented 3 months ago

@sdeastham The profile I showed was for one of the cases with my custom met conditions, not any of the examples. Sorry for not specifying this sooner.

Calebsakhtar commented 3 months ago

@sdeastham It will take me a while to confirm the other two cases, but at this time I can confirm that setting export OMP_NUM_THREADS=1 and specifying one core in the input.yaml file does result in the bug disappearing.

Calebsakhtar commented 2 months ago

@sdeastham Finally got around to finishing the HPC runs.

Here are the results:

sdeastham commented 2 months ago

Well, that is odd... thanks @Calebsakhtar ! I'll see if I can figure out what is going on.

Calebsakhtar commented 3 weeks ago

@sdeastham Sadly, I have just rerun some cases with OpenMP Num Threads = 8 in the input.yaml file, and the jumps are still present in the extinction-defined width (standard APCEMM output).