Open stvdwtt opened 1 month ago
Ughh it passed on my machine. My guess is that due to numerical error, ArborX is giving us results we cannot deal with. Have you tried running in debug mode?
Yeah, debug mode is how I localized it. It fails with 27 ranks on the SCOPS-foundry-2 computer at the MDF.
Subject: Re: [adamantine-sim/adamantine] Seg fault during covariance matrix construction for data assimilation only for particular numbers of MPI tasks (Issue #314)
Summary: An IMTS gear simulation crashes after data assimilation, but only when 27 MPI processes are used. Runs with 9, 18, 30, and 60 processes all work.
Test case: https://github.com/adamantine-sim/demonstration-cases/tree/main/IMTS_Parts/GEAR-OP04
To run this, copy the input files out of `simulation_template` into its parent `GEAR-OP04` directory and then run adamantine.
We get a seg fault at this line: https://github.com/adamantine-sim/adamantine/blob/587a4ead79730ea7b30829346766f7e8bc781598/source/DataAssimilator.cc#L581 with an index that is greater than the number of degrees of freedom.
It's unclear whether the issue is in `indices_ranks` or if `j` has an invalid value.
@Rombur, could you take a look at this?
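One way to narrow this down might be to check every index against the number of degrees of freedom just before the failing access, so a debug run reports the offending entry instead of seg faulting. The snippet below is only a sketch of that idea: `find_out_of_range`, `indices`, and `n_dofs` are hypothetical names and not part of adamantine, and how the real indices are obtained in `DataAssimilator.cc` is not assumed here.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical debugging helper (not adamantine code): collect the
// (position, value) pair of every index that is >= n_dofs, i.e. every
// index that would be out of range for a vector with n_dofs entries.
std::vector<std::pair<std::size_t, std::size_t>>
find_out_of_range(std::vector<std::size_t> const &indices, std::size_t n_dofs)
{
  std::vector<std::pair<std::size_t, std::size_t>> bad;
  for (std::size_t pos = 0; pos < indices.size(); ++pos)
  {
    if (indices[pos] >= n_dofs)
    {
      bad.emplace_back(pos, indices[pos]);
    }
  }
  return bad;
}
```

Calling this on each rank with the locally used indices, and printing the rank along with any returned pairs, should show whether the bad value is already present in the index data or only appears later through `j`.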