Closed angus-g closed 1 year ago
Well that's annoying: I can't reproduce - it runs for me even in the exact same docker image!
Oh I see you've rerun it already on actions and it's green again. I'd say we just leave it at that and fingers crossed it doesn't come back - I have no idea otherwise....
I have seen another failure from that one, with a slightly different error message:
*** FAILED TO ADD ELEMENT TO BIGLST - FULL 3
*** ADPTVY: GOT ERROR FROM MKADPT
*** ADPTVY: FINISHED WITH ERROR -4
*** FLUIDITY ERROR ***
Source location: (Adapt_Integration.F90, 541)
Error message: Mesh adaptivity exited with an error
Backtrace will follow if it is available:
fluidity(fprint_backtrace_+0x38) [0x55d7750948c8]
fluidity(__fldebug_MOD_flabort_pinpoint+0x45) [0x55d77508e165]
fluidity(__adapt_integration_MOD_adapt_mesh+0x3bd5) [0x55d7753ffe15]
fluidity(+0x44849a) [0x55d77540149a]
fluidity(+0x4526b4) [0x55d77540b6b4]
fluidity(__adapt_state_module_MOD_adapt_state_multiple+0xc1) [0x55d77540d711]
fluidity(__adapt_state_module_MOD_adapt_state_first_timestep+0x4cf) [0x55d77540dc5f]
fluidity(__fluids_module_MOD_fluids+0x54c) [0x55d77508ff7c]
fluidity(mainfl+0x9b) [0x55d77508d83b]
fluidity(main+0x225) [0x55d7750847d5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fb068819083]
fluidity(_start+0x2e) [0x55d7750862fe]
Use addr2line -e <binary> <address> to decipher.
Error is terminal.
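For anyone wanting to follow the `addr2line` hint in the log, here is a minimal sketch that turns backtrace lines like the ones above into `addr2line` invocations. The helper names (`parse_frame`, `addr2line_command`) are hypothetical and not part of Fluidity. It assumes the frames follow the usual glibc `binary(symbol+offset) [address]` layout, and that for anonymous frames such as `fluidity(+0x44849a)` the offset in parentheses is relative to the binary's load base, which is likely the more useful value to pass when the binary is position-independent.

```python
import re

# Matches glibc-style backtrace lines such as
#   fluidity(__fldebug_MOD_flabort_pinpoint+0x45) [0x55d77508e165]
#   fluidity(+0x44849a) [0x55d77540149a]
#   /path/to/fluidity() [0x81a99a]
FRAME_RE = re.compile(
    r"^(?P<binary>\S+?)\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\)"
    r"\s+\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Split one backtrace line into its parts, or return None if it is not a frame."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return {
        "binary": match.group("binary"),
        "symbol": match.group("symbol") or None,
        "offset": match.group("offset"),
        "address": match.group("address"),
    }


def addr2line_command(frame):
    """Build an addr2line argument list for a parsed frame.

    Assumption: an anonymous frame "binary(+0xOFF)" carries a load-base-relative
    offset, so that offset is used; otherwise the bracketed address is used,
    as the log message itself suggests.
    """
    if frame["symbol"] is None and frame["offset"] is not None:
        target = frame["offset"]
    else:
        target = frame["address"]
    # -f prints the function name, -C demangles, -e selects the binary.
    return ["addr2line", "-f", "-C", "-e", frame["binary"], target]


if __name__ == "__main__":
    for text in (
        "fluidity(+0x44849a) [0x55d77540149a]",
        "fluidity(__adapt_integration_MOD_adapt_mesh+0x3bd5) [0x55d7753ffe15]",
    ):
        print(" ".join(addr2line_command(parse_frame(text))))
```

The resulting argument lists can be passed to `subprocess.run` on a machine that has the same binary the failing job used; without debug symbols the output will probably be limited to `??:0`.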
I have encountered a very similar issue in one of my own simulations. The fluidity.err file shows:
*** ADDELE: CANNOT HAVE ALL SIDES ON SURFACES
-13 -9 -1 -7
*** ADPTVY: GOT ERROR FROM MKADPT
*** ADPTVY: FINISHED WITH ERROR -98
*** FLUIDITY ERROR ***
Source location: (Adapt_Integration.F90, 541)
Error message: Mesh adaptivity exited with an error
Backtrace will follow if it is available:
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity(fprint_backtrace_+0x1a) [0x4c027a]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity(__fldebug_MOD_flabort_pinpoint+0x3d) [0x4ba15d]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity(__adapt_integration_MOD_adapt_mesh+0x3fbf) [0x8188ff]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity() [0x81a99a]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity() [0x8246b7]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity(__adapt_state_module_MOD_adapt_state_multiple+0xb7) [0x826bf7]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity(__adapt_state_module_MOD_adapt_state_first_timestep+0x481) [0x8270f1]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity(__fluids_module_MOD_fluids+0x4f7) [0x4bbef7]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity(mainfl+0x83) [0x4b7023]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity(main+0x1e8) [0x4b1f48]
/lib64/libc.so.6(__libc_start_main+0xf3) [0x14db8b04dcf3]
/home/157/td5646/fluidity_particles_zoltan_MODULES/bin/fluidity(_start+0x2e) [0x4b364e]
Use addr2line -e <binary> <address> to decipher.
Error is terminal.
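Since the two failures in this thread differ mainly in the first starred message and in the ADPTVY exit code (-4 versus -98), a small script can help compare `fluidity.err` files across runs. This is only a sketch: `summarise_err` is a hypothetical helper, and the patterns are assumptions based purely on the excerpts quoted in this thread. It makes no attempt to interpret what the codes mean.

```python
import re

# Pattern taken from the excerpts in this thread, e.g.
#   *** ADPTVY: FINISHED WITH ERROR -98
ERROR_CODE_RE = re.compile(r"\*\*\* ADPTVY: FINISHED WITH ERROR\s+(-?\d+)")


def summarise_err(text):
    """Collect the starred lines and the ADPTVY exit code (if any) from an .err file."""
    starred = [
        line.strip() for line in text.splitlines() if line.strip().startswith("***")
    ]
    match = ERROR_CODE_RE.search(text)
    code = int(match.group(1)) if match else None
    return {"adptvy_error_code": code, "starred_lines": starred}


if __name__ == "__main__":
    sample = """*** ADDELE: CANNOT HAVE ALL SIDES ON SURFACES
-13 -9 -1 -7
*** ADPTVY: GOT ERROR FROM MKADPT
*** ADPTVY: FINISHED WITH ERROR -98
*** FLUIDITY ERROR ***
"""
    summary = summarise_err(sample)
    print(summary["adptvy_error_code"])
    print(summary["starred_lines"][0])
```

Running it over the `.err` files from several failing runs gives a quick view of whether they all stop at the same point in the adaptivity library or at different ones.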
The associated fluidity.log file finishes with:
Total receive_nodes = 1467
Exiting derive_maximal_element_halo
Leaving create_subdomain_mesh
Exiting strip_l2_halo
In adapt_mesh
Forming adaptmem arguments
Expected n/o elements: 6171
Calling adaptmem from adapt_mesh
Exited adaptmem
Integer working memory size: 433971
Real working memory size: 76736
Forming remaining adptvy arguments
Max. nodes: 89284
Number of locked nodes = 0
Calling adptvy from adapt_mesh
Checking consistency of elements...
Passed!
--- ADPTVY: Space set aside for about 12899 elements
--- ADPTVY: Space set aside for 4252 nodes
Setting up initial integer list pointers... 58046
Setting up initial real list pointers... 4252
Creating node list... F
Flagging halo nodes...
Creating 3D element and edge list...
Exited adptvy
Additionally, another fluidity.err file shows the following right before the above is triggered:
+++ GMYBAD: Turned into geometry node
olded,newed: 9117 9123
node,othnd: 531 561
The associated fluidity.log file finishes with:
--- Starting connect & move adapt of sweep 7 0.00000E+00
>>> Min/max edges: 1.77043799E-01 1.41363245E+01
>>> Min/max in-spheres: 1.34926060E-03 5.73772152E-01
>>> GLOBAL MESH FUNCTIONAL & element ave: 2.93895495E+09 4.12315034E+07
Top of BIGLST & NODLST: 16284 830
--- ADAPT1: Node movement in OVERSHOOT mode...
--- Info: sum of reductions: 0.00000000E+00
--- Info: total elems checked: 3
--- Info: total edges checked: 11
--- Info: total nodes checked: 6
Nodes: 735 Elements: 3286 Edges: 4267
BIGLST usage: 18.85% BIGLST efficiency: 86.74%
NODLST usage: 14.16% NODLST efficiency: 88.55%
--- ADPTVY: FINISHED ADAPTING MESH
TOPNOD,TOPBIG: 830 16284
NODS,ELEMS,SURFS: 735 3286 494
--- ADPTVY: Interpolating fields...
--- ADPTVY: Forming new fixed mesh data...
Finished new node data
Geom,split,int: 214 1 486
Finished new element data
Geom ed, int ed: 603 3526
Surface averages ( max id : 11 )
0 393 237.87295716415005 230.40268886712627 337.52219183996340
11 101 254.10476214847949 213.20371523184775 490.96253078458915
Working out new gather array...
Working out new scatter array...
Checking local node ordering...
VOLSUM,ASPAVE: 13098487010509200. 3.2794552789935869
Maximum asp,vol,rad: 189.67133119033468 13507195190.179401 32.448080400873366
Maximum vol,rad,asp: 41147049328376.836 11117.759818428784 1.5613890461593773
Minimum vol,rad,asp: 11662396813.644714 109.11753070490531 43.198464574201623
Minimum rad,vol,asp: 32.448080400873366 13507195190.179401 189.67133119033468
Finished checking local node ordering
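Because the first failure reported in this thread was a full BIGLST, it may be useful to track the `BIGLST usage` and `NODLST usage` figures that the log prints after each adapt. The sketch below pulls out the peak usage per list. `list_usage` is a hypothetical helper and the regular expression is an assumption based only on the two usage lines quoted above.

```python
import re

# Matches e.g. "BIGLST usage: 18.85%" and "NODLST usage: 14.16%".
USAGE_RE = re.compile(r"(BIGLST|NODLST) usage:\s*([0-9.]+)%")


def list_usage(log_text):
    """Return the highest reported usage (in percent) for each list found in the log."""
    peak = {}
    for name, value in USAGE_RE.findall(log_text):
        peak[name] = max(peak.get(name, 0.0), float(value))
    return peak


if __name__ == "__main__":
    sample = (
        "BIGLST usage: 18.85% BIGLST efficiency: 86.74%\n"
        "NODLST usage: 14.16% NODLST efficiency: 88.55%\n"
    )
    print(list_usage(sample))
```

In the excerpt above both lists are well below capacity, so this particular run does not look like the BIGLST-full case; the check is more relevant to logs from runs that hit the first error.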
So the ADDELE error message indicates it has encountered a completely isolated element. I suspect this might happen if the partitioner does a bad job for some reason, although I'm not entirely sure how (I would have thought there should be some halo), but it might be worth tweaking the partition settings. @rhodrid would you mind having a quick look at the partition options for that mphase_tephra_settling_3d case?
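To make "completely isolated element" concrete: in a tetrahedral mesh it is an element whose four faces are all unshared with any other element, so every side lies on a surface. The sketch below finds such elements in a plain connectivity list. It only illustrates the idea and is not the check the adaptivity library performs; `isolated_tets` is a hypothetical helper, and running it on a per-process subdomain mesh is an assumption about where such elements would show up.

```python
from collections import defaultdict
from itertools import combinations


def isolated_tets(tets):
    """Return indices of tetrahedra that share no face with any other tetrahedron.

    `tets` is a sequence of 4-tuples of node numbers.
    """
    # Map each face (a sorted node triple) to the elements that own it.
    face_owners = defaultdict(list)
    for idx, tet in enumerate(tets):
        for face in combinations(sorted(tet), 3):
            face_owners[face].append(idx)

    isolated = []
    for idx, tet in enumerate(tets):
        faces = combinations(sorted(tet), 3)
        # A face owned by a single element is a boundary face.
        if all(len(face_owners[face]) == 1 for face in faces):
            isolated.append(idx)
    return isolated


if __name__ == "__main__":
    mesh = [(0, 1, 2, 3), (1, 2, 3, 4), (5, 6, 7, 8)]
    print(isolated_tets(mesh))  # [2]
```

In the example the first two tetrahedra share the face (1, 2, 3), while the third touches nothing, which is the situation the ADDELE message appears to describe.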
Sure thing. Will have a play tomorrow.
So I can’t get this to fail locally. However, I have updated a few of the adaptivity settings which, for me at least, seem to improve partitioning and element quality (the latter only to a small extent).
Do you want me to create a pull request with these changes (to the longtest repo I assume), to see if they stabilise things?
R
I think you can push straight to main/master on longtests, so feel free to experiment. If you have better settings, I suggest you just push those and we'll see whether it is better behaved/stable in the long run.
Sadly I couldn’t push straight to master, but have set up a pull request into the longtests repo with my (minimal) changes for somebody to approve.
Note that the .flml files in a number of the long tests seem to be out of date with the current schema, but this should probably be fixed in a separate pull request.
R
Assuming fixed by https://github.com/FluidityProject/longtests/pull/3
See actions run, error reproduced below: