FluidityProject / fluidity

Fluidity
http://fluidity-project.org

Intermittent failure on mphase_tephra_settling_3d longtest #359

Closed angus-g closed 1 year ago

angus-g commented 2 years ago

See actions run, error reproduced below:

*** ADDELE: CANNOT HAVE ALL SIDES ON SURFACES
        -778        -667        -445          -1
*** ADPTVY: GOT ERROR FROM MKADPT
*** ADPTVY: FINISHED WITH ERROR    -98
*** FLUIDITY ERROR ***
Source location: (Adapt_Integration.F90,  541)
Error message: Mesh adaptivity exited with an error
Backtrace will follow if it is available:
fluidity(fprint_backtrace_+0x38) [0x55bccf7ba868]
fluidity(__fldebug_MOD_flabort_pinpoint+0x45) [0x55bccf7b4105]
fluidity(__adapt_integration_MOD_adapt_mesh+0x3bd5) [0x55bccfb25cf5]
fluidity(+0x44837a) [0x55bccfb2737a]
fluidity(+0x452594) [0x55bccfb31594]
fluidity(__adapt_state_module_MOD_adapt_state_multiple+0xc1) [0x55bccfb335f1]
fluidity(__adapt_state_module_MOD_adapt_state_first_timestep+0x4cf) [0x55bccfb33b3f]
fluidity(__fluids_module_MOD_fluids+0x54c) [0x55bccf7b5f1c]
fluidity(mainfl+0x8c) [0x55bccf7b37ec]
fluidity(main+0x225) [0x55bccf7aa795]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7faebdf05083]
fluidity(_start+0x2e) [0x55bccf7ac2be]
Use addr2line -e <binary> <address> to decipher.
Error is terminal.
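
The last line of the log suggests feeding the addresses to `addr2line`. As a minimal sketch, the frames above could be turned into ready-to-run commands like this. The helper name `addr2line_commands` and the regular expression are illustrative assumptions, not part of Fluidity. The sketch also assumes that a bare `(+0x...)` offset is the value to pass for a position-independent binary. For frames of the form `symbol+0x...` it falls back to the bracketed address, which is only expected to resolve directly for non-PIE builds.

```python
import re

# Matches glibc-style backtrace frames such as:
#   fluidity(+0x44837a) [0x55bccfb2737a]
#   fluidity(main+0x225) [0x55bccf7aa795]
#   /path/to/fluidity() [0x81a99a]
FRAME = re.compile(
    r"^(?P<binary>\S+?)\((?P<inner>[^)]*)\)\s+\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


def addr2line_commands(backtrace):
    """Build one addr2line argument list per recognisable backtrace frame.

    Hypothetical helper: lines that do not look like frames are skipped.
    """
    commands = []
    for line in backtrace.splitlines():
        match = FRAME.match(line.strip())
        if match is None:
            continue
        inner = match.group("inner")
        # Assumption: a bare "+0x..." is a module-relative offset that
        # addr2line can resolve; otherwise use the bracketed address.
        address = inner[1:] if inner.startswith("+0x") else match.group("addr")
        commands.append(
            ["addr2line", "-f", "-C", "-e", match.group("binary"), address]
        )
    return commands


if __name__ == "__main__":
    sample = "fluidity(+0x44837a) [0x55bccfb2737a]\nError is terminal."
    for command in addr2line_commands(sample):
        print(" ".join(command))
```

Each returned list can be passed to `subprocess.run` on a machine that has the same `fluidity` binary, ideally built with debug symbols.
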
stephankramer commented 2 years ago

Well that's annoying: I can't reproduce - it runs for me even in the exact same docker image!

Oh I see you've rerun it already on actions and it's green again. I'd say we just leave it at that and fingers crossed it doesn't come back - I have no idea otherwise....

Patol75 commented 1 year ago

I have seen another failure from that one, with a slightly different error message:

*** FAILED TO ADD ELEMENT TO BIGLST - FULL 3
*** ADPTVY: GOT ERROR FROM MKADPT
*** ADPTVY: FINISHED WITH ERROR     -4
*** FLUIDITY ERROR ***
Source location: (Adapt_Integration.F90,  541)
Error message: Mesh adaptivity exited with an error
Backtrace will follow if it is available:
fluidity(fprint_backtrace_+0x38) [0x55d7750948c8]
fluidity(__fldebug_MOD_flabort_pinpoint+0x45) [0x55d77508e165]
fluidity(__adapt_integration_MOD_adapt_mesh+0x3bd5) [0x55d7753ffe15]
fluidity(+0x44849a) [0x55d77540149a]
fluidity(+0x4526b4) [0x55d77540b6b4]
fluidity(__adapt_state_module_MOD_adapt_state_multiple+0xc1) [0x55d77540d711]
fluidity(__adapt_state_module_MOD_adapt_state_first_timestep+0x4cf) [0x55d77540dc5f]
fluidity(__fluids_module_MOD_fluids+0x54c) [0x55d77508ff7c]
fluidity(mainfl+0x9b) [0x55d77508d83b]
fluidity(main+0x225) [0x55d7750847d5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fb068819083]
fluidity(_start+0x2e) [0x55d7750862fe]
Use addr2line -e <binary> <address> to decipher.
Error is terminal.
Patol75 commented 1 year ago

I have encountered a very similar issue in one of my own simulations. The fluidity.err file shows:

*** ADDELE: CANNOT HAVE ALL SIDES ON SURFACES
         -13          -9          -1          -7
*** ADPTVY: GOT ERROR FROM MKADPT
*** ADPTVY: FINISHED WITH ERROR    -98
*** FLUIDITY ERROR ***
Source location: (Adapt_Integration.F90,  541)
Error message: Mesh adaptivity exited with an error
Backtrace will follow if it is available:
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity(fprint_backtrace_+0x1a) [0x4c027a]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity(__fldebug_MOD_flabort_pinpoint+0x3d) [0x4ba15d]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity(__adapt_integration_MOD_adapt_mesh+0x3fbf) [0x8188ff]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity() [0x81a99a]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity() [0x8246b7]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity(__adapt_state_module_MOD_adapt_state_multiple+0xb7) [0x826bf7]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity(__adapt_state_module_MOD_adapt_state_first_timestep+0x481) [0x8270f1]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity(__fluids_module_MOD_fluids+0x4f7) [0x4bbef7]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity(mainfl+0x83) [0x4b7023]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity(main+0x1e8) [0x4b1f48]
/lib64/libc.so.6(__libc_start_main+0xf3) [0x14db8b04dcf3]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity(_start+0x2e) [0x4b364e]
Use addr2line -e <binary> <address> to decipher.
Error is terminal.

The associated fluidity.log file finishes with:

Total receive_nodes = 1467
 Exiting derive_maximal_element_halo
 Leaving create_subdomain_mesh
 Exiting strip_l2_halo
 In adapt_mesh
 Forming adaptmem arguments
 Expected n/o elements:         6171
 Calling adaptmem from adapt_mesh
 Exited adaptmem
Integer working memory size: 433971
Real working memory size: 76736
 Forming remaining adptvy arguments
Max. nodes: 89284
 Number of locked nodes = 0
 Calling adptvy from adapt_mesh
Checking consistency of elements...
Passed!
--- ADPTVY: Space set aside for about    12899 elements
--- ADPTVY: Space set aside for      4252 nodes
 Setting up initial integer list pointers...       58046
 Setting up initial real list pointers...        4252
 Creating node list... F
 Flagging halo nodes...
 Creating 3D element and edge list...
 Exited adptvy

Additionally, another fluidity.err file shows the following right before the above is triggered:

+++ GMYBAD: Turned into geometry node
   olded,newed:        9117        9123
   node,othnd:          531         561

The associated fluidity.log file finishes with:

--- Starting connect & move adapt of sweep  7  0.00000E+00
>>> Min/max edges:   1.77043799E-01  1.41363245E+01
>>> Min/max in-spheres:   1.34926060E-03  5.73772152E-01
>>> GLOBAL MESH FUNCTIONAL & element ave:   2.93895495E+09  4.12315034E+07
Top of BIGLST & NODLST:   16284     830
--- ADAPT1: Node movement in OVERSHOOT mode...
--- Info: sum of reductions:  0.00000000E+00
--- Info: total elems checked:       3
--- Info: total edges checked:      11
--- Info: total nodes checked:       6

    Nodes:     735       Elements:    3286       Edges:    4267
   BIGLST usage:  18.85%   BIGLST efficiency:  86.74%
   NODLST usage:  14.16%   NODLST efficiency:  88.55%
--- ADPTVY: FINISHED ADAPTING MESH
TOPNOD,TOPBIG:     830   16284
NODS,ELEMS,SURFS:     735    3286     494
--- ADPTVY: Interpolating fields...
--- ADPTVY: Forming new fixed mesh data...
 Finished new node data
 Geom,split,int:          214           1         486
 Finished new element data
 Geom ed, int ed:          603        3526
 Surface averages  ( max id :          11  )
           0         393   237.87295716415005        230.40268886712627        337.52219183996340
          11         101   254.10476214847949        213.20371523184775        490.96253078458915
 Working out new gather array...
 Working out new scatter array...
 Checking local node ordering...
 VOLSUM,ASPAVE:    13098487010509200.        3.2794552789935869
 Maximum asp,vol,rad:    189.67133119033468        13507195190.179401        32.448080400873366
 Maximum vol,rad,asp:    41147049328376.836        11117.759818428784        1.5613890461593773
 Minimum vol,rad,asp:    11662396813.644714        109.11753070490531        43.198464574201623
 Minimum rad,vol,asp:    32.448080400873366        13507195190.179401        189.67133119033468
 Finished checking local node ordering
stephankramer commented 1 year ago

So the ADDELE error message indicates it's encountered a completely isolated element. I suspect this might happen if the partitioner does a bad job for some reason - although I'm not entirely sure how (I would have thought there should be some halo) - but it might be worth tweaking the partition settings. @rhodrid would you mind having a quick look at the partition options for that mphase_tephra_settling_3d case?
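
To make "completely isolated element" concrete, here is a minimal sketch of how one could check a local (per-partition) tetrahedral mesh for such elements before handing it to the adaptivity library. It assumes that "all sides on surfaces" means none of the four faces is shared with another element. The function name `isolated_tets` and the plain tuple-of-node-ids representation are illustrative assumptions, not Fluidity data structures.

```python
from collections import Counter
from itertools import combinations


def isolated_tets(elements):
    """Return indices of tetrahedra that share no face with any other element.

    `elements` is a sequence of 4-tuples of node ids (hypothetical layout).
    """
    # Count how many elements each triangular face belongs to.
    face_count = Counter(
        frozenset(face)
        for tet in elements
        for face in combinations(tet, 3)
    )
    # An element is isolated if every one of its faces appears exactly once,
    # i.e. all four faces lie on the surface of the local mesh.
    return [
        index
        for index, tet in enumerate(elements)
        if all(face_count[frozenset(face)] == 1 for face in combinations(tet, 3))
    ]


if __name__ == "__main__":
    mesh = [
        (0, 1, 2, 3),
        (1, 2, 3, 4),   # shares face {1, 2, 3} with the first element
        (5, 6, 7, 8),   # shares nothing with the rest
        (3, 4, 9, 10),  # shares only an edge, so all faces are on the surface
    ]
    print(isolated_tets(mesh))
```

Note that an element attached to its neighbours only through an edge or a node still counts as isolated under this face-based definition.
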

drhodrid commented 1 year ago

Sure thing. Will have a play tomorrow.

drhodrid commented 1 year ago

So I can’t get this to fail locally. However, I have updated a few of the adaptivity settings which, for me at least, seem to improve partitioning and element quality (the latter only to a small extent).

Do you want me to create a pull request with these changes (to the longtest repo I assume), to see if they stabilise things?

R

stephankramer commented 1 year ago

I think you can push straight to main/master on longtests, so feel free to experiment. If you have better settings, I suggest you just push those and we'll see whether it is better behaved/stable in the long run.

drhodrid commented 1 year ago

Sadly I couldn’t push straight to master, but have set up a pull request into the longtests repo with my (minimal) changes for somebody to approve.

Note that the .flml files in a number of the long tests seem out of date with the current schema, but this should probably be fixed in another pull request.

R

stephankramer commented 1 year ago

Assuming fixed by https://github.com/FluidityProject/longtests/pull/3