Tests 66, 68, 69 fail on development branch on one machine but pass on another

jerett-cc commented 2 months ago

On the most recent version of the development branch, tests 66, 68, and 69 fail on my school machine but pass on my laptop.

At school, I am compiling on x86_64 Intel with OpenMPI 4.1.4. On my laptop, I am compiling on a similar architecture, but with a different Intel CPU and OpenMPI 4.1.2.

The diffs are:

66 shallow_water/verification-paraboloid_1d-erk33-l7.release:

----------------
##9       #:2   <== 1345.774540174449
##9       #:2   ==> 1345.846277059402
@ Absolute error = 7.1736884953e-2, Relative error = 5.3305277230e-5
----------------
##10      #:2   <== 0.0001164820398633047
##10      #:2   ==> 0.0001147281980001955
@ Absolute error = 1.7538418631e-6, Relative error = 1.5286929401e-2

68 shallow_water/verification-smooth_vortex-erk33-l6.release:

----------------
##10      #:2   <== 0.03571394823661699
##10      #:2   ==> 0.03737613586854281
@ Absolute error = 1.6621876319e-3, Relative error = 4.6541693484e-2
----------------
##11      #:2   <== 0.0006325612013505061
##11      #:2   ==> 0.0007562265440589786
@ Absolute error = 1.2366534271e-4, Relative error = 1.9549941167e-1
----------------
##12      #:2   <== 0.003420776846038435
##12      #:2   ==> 0.003488469908911931
@ Absolute error = 6.7693062873e-5, Relative error = 1.9788798253e-2

69 shallow_water/verification-steady_incline-erk33-l9.release:

----------------
##9       #:2   <== 1.000593578808362
##9       #:2   ==> 1.000320814661278
@ Absolute error = 2.7276414708e-4, Relative error = 2.7267666841e-4
----------------
##10      #:2   <== 2.388278346583212e-14
##10      #:2   ==> 0.002619689632507278
@ Absolute error = 2.6196896325e-3, Relative error = 1.0968946045e+11
----------------
##11      #:2   <== 4.287451614926996e-15
##11      #:2   ==> 0.0002049750897826185
@ Absolute error = 2.0497508978e-4, Relative error = 4.7808140636e+10
----------------
##12      #:2   <== 5.452329602107318e-15
##12      #:2   ==> 0.0006221684713152953
@ Absolute error = 6.2216847131e-4, Relative error = 1.1411057598e+11

I can show more of the output files. Not sure what information you may need most. Almost all the diffs stem from t being different in small ways. Let me know if there are other facts about the machines that may be relevant.

tamiko commented 2 months ago

@jerett-cc I am a bit worried about this one in your last comparison:

##11      #:2   <== 4.287451614926996e-15
##11      #:2   ==> 0.0002049750897826185
@ Absolute error = 2.0497508978e-4, Relative error = 4.7808140636e+10

These values should pretty much be zero and they aren't. Can you post the detailed.log file of the deal.II version that you compile against? Most importantly, do you compile with AVX2 (256-bit) or AVX-512 support?
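For reference, here is a minimal sketch of why the relative errors in the test 69 diff are so large. It assumes the diff tool divides the absolute error by the smaller of the two magnitudes; that is consistent with the numbers quoted in this thread, but it is an inference from those numbers rather than something confirmed here. The function name `diff_errors` is hypothetical.

```python
def diff_errors(a: float, b: float) -> tuple[float, float]:
    """Return (absolute error, relative error) for two values.

    Assumption: the relative error is the absolute error divided by the
    smaller magnitude of the two values, which matches the diffs quoted
    in this thread.
    """
    abs_err = abs(a - b)
    denom = min(abs(a), abs(b))
    rel_err = abs_err / denom if denom > 0.0 else float("inf")
    return abs_err, rel_err


# Values taken from the ##11 line of test 69 above.
reference = 4.287451614926996e-15   # essentially zero (round-off level)
observed = 0.0002049750897826185    # clearly not zero

abs_err, rel_err = diff_errors(reference, observed)
print(f"absolute error = {abs_err:.10e}")
print(f"relative error = {rel_err:.10e}")
```

Because the reference value is at round-off level, the relative error is dominated by the tiny denominator. The absolute error (about 2e-4) is the more telling number here: it shows that a quantity expected to vanish does not.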

jerett-cc commented 2 months ago

@tamiko Yes, I can. detailed.log

As for AVX support, I believe so, but I am not knowledgeable enough to tell you for sure, nor which variant. I do know that I compile with -march=native.

tamiko commented 2 months ago

@jerett-cc You are compiling with avx2 on this machine. Let me investigate - we had some weird behavior of some gcc version close to gcc-12 on other machines (with miscompilation).
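For anyone else comparing machines: with -march=native the compiler targets whatever the local CPU supports, so one rough way to see which AVX level each machine offers is to inspect the CPU flags. The sketch below assumes a Linux system that exposes `/proc/cpuinfo`; the function name `simd_level` is hypothetical, and the result describes what the CPU advertises, not what any particular compiler actually emitted.

```python
from pathlib import Path


def simd_level(cpuinfo_text: str) -> str:
    """Return the highest AVX level advertised in /proc/cpuinfo-style text.

    Only the first "flags" line is inspected; the flag names checked are
    avx512f, avx2, and avx.
    """
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
            break
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    if "avx" in flags:
        return "avx"
    return "none"


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only; an assumption of this sketch
    if path.exists():
        print(simd_level(path.read_text()))
```

Running this on both machines and comparing the output would show whether they differ in vector width, which is one candidate explanation for small floating-point differences between otherwise identical builds.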

conservation-laws / ryujin, issue #170