conservation-laws / ryujin

High-performance high-order finite element solver for hyperbolic conservation equations
https://conservation-laws.org

Tests 66, 68, 69 fail on development branch on one machine but pass on another #170

Open jerett-cc opened 1 week ago

jerett-cc commented 1 week ago

On the most recent version of the development branch, tests 66, 68, and 69 fail on my school machine but pass on my laptop.

At school, I am compiling on x86_64 Intel with OpenMPI 4.1.4. On my laptop, I am compiling on a similar architecture, but with a different Intel CPU and OpenMPI 4.1.2.

The diffs are:

66 shallow_water/verification-paraboloid_1d-erk33-l7.release:

----------------
##9       #:2   <== 1345.774540174449
##9       #:2   ==> 1345.846277059402
@ Absolute error = 7.1736884953e-2, Relative error = 5.3305277230e-5
----------------
##10      #:2   <== 0.0001164820398633047
##10      #:2   ==> 0.0001147281980001955
@ Absolute error = 1.7538418631e-6, Relative error = 1.5286929401e-2

68 shallow_water/verification-smooth_vortex-erk33-l6.release:

----------------
##10      #:2   <== 0.03571394823661699
##10      #:2   ==> 0.03737613586854281
@ Absolute error = 1.6621876319e-3, Relative error = 4.6541693484e-2
----------------
##11      #:2   <== 0.0006325612013505061
##11      #:2   ==> 0.0007562265440589786
@ Absolute error = 1.2366534271e-4, Relative error = 1.9549941167e-1
----------------
##12      #:2   <== 0.003420776846038435
##12      #:2   ==> 0.003488469908911931
@ Absolute error = 6.7693062873e-5, Relative error = 1.9788798253e-2

69 shallow_water/verification-steady_incline-erk33-l9.release:

----------------
##9       #:2   <== 1.000593578808362
##9       #:2   ==> 1.000320814661278
@ Absolute error = 2.7276414708e-4, Relative error = 2.7267666841e-4
----------------
##10      #:2   <== 2.388278346583212e-14
##10      #:2   ==> 0.002619689632507278
@ Absolute error = 2.6196896325e-3, Relative error = 1.0968946045e+11
----------------
##11      #:2   <== 4.287451614926996e-15
##11      #:2   ==> 0.0002049750897826185
@ Absolute error = 2.0497508978e-4, Relative error = 4.7808140636e+10
----------------
##12      #:2   <== 5.452329602107318e-15
##12      #:2   ==> 0.0006221684713152953
@ Absolute error = 6.2216847131e-4, Relative error = 1.1411057598e+11

I can share more of the output files; I am not sure what information you need most. Almost all of the diffs stem from t differing in small ways. Let me know if there are other facts about the machines that may be relevant.

tamiko commented 1 week ago

@jerett-cc I am a bit worried about this one in your last comparison:

##11      #:2   <== 4.287451614926996e-15
##11      #:2   ==> 0.0002049750897826185
@ Absolute error = 2.0497508978e-4, Relative error = 4.7808140636e+10

These values should pretty much be zero and they aren't. Can you post the detailed.log file of the deal.II version that you compile against? Most importantly, do you compile with avx256 or avx512 support?
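To make the concern concrete, here is a minimal sketch of how the absolute and relative errors in these diffs can be reproduced. It assumes the relative error is the absolute difference divided by the smaller of the two magnitudes; that convention is inferred from the numbers posted above, not taken from the comparison tool's documentation, and the function name `abs_rel_error` is made up for illustration.

```python
def abs_rel_error(a: float, b: float) -> tuple[float, float]:
    """Return (absolute error, relative error) for two values.

    Assumption: the relative error divides by the smaller magnitude,
    which is consistent with the figures quoted in this issue.
    """
    abs_err = abs(a - b)
    denom = min(abs(a), abs(b))
    rel_err = abs_err / denom if denom > 0.0 else float("inf")
    return abs_err, rel_err


if __name__ == "__main__":
    # Value pairs copied from the diffs for tests 66 and 69.
    pairs = {
        "test 66, ##9": (1345.774540174449, 1345.846277059402),
        "test 69, ##11": (4.287451614926996e-15, 0.0002049750897826185),
    }
    for label, (a, b) in pairs.items():
        abs_err, rel_err = abs_rel_error(a, b)
        print(f"{label}: abs = {abs_err:.10e}, rel = {rel_err:.10e}")
```

When the reference value is at round-off level (about 1e-15), even a small absolute difference produces an enormous relative error. For test 69 the absolute error is therefore the more informative figure: the new value is no longer zero to machine precision.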

jerett-cc commented 1 week ago

@tamiko Yes, I can. detailed.log

As for AVX support, I believe so, but I am not knowledgeable enough to tell you for sure, nor which one. I know I compile with -march=native.

tamiko commented 5 days ago

@jerett-cc You are compiling with avx2 on this machine. Let me investigate - we have seen weird behavior (miscompilation) with some gcc versions close to gcc-12 on other machines.
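For anyone else unsure which AVX level their machine offers, here is a rough sketch that lists the relevant CPU flags. It assumes a Linux system that exposes `/proc/cpuinfo`; the helper name `simd_flags` is hypothetical. With `-march=native` the compiler generally targets what the host CPU supports, so this gives a hint only. The deal.II detailed.log remains the authoritative record of what was actually compiled in.

```python
from pathlib import Path

# Flags of interest, from narrowest to widest vector extension.
SIMD_FLAGS = ("avx", "avx2", "avx512f")


def simd_flags(cpuinfo_text: str) -> list[str]:
    """Return the SIMD flags from SIMD_FLAGS found in cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            # Compare whole tokens so "avx" does not match "avx2" or "avx512f".
            tokens = set(line.split(":", 1)[1].split())
            return [flag for flag in SIMD_FLAGS if flag in tokens]
    return []


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only; absent on other systems
    if path.exists():
        print(simd_flags(path.read_text()))
    else:
        print("No /proc/cpuinfo found; this sketch only works on Linux.")
```

A result that contains `avx2` but not `avx512f` would match the avx2 diagnosis above.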