ericjbohm closed this 3 months ago
migration/Makefile and creduce/Makefile seem to have a similar issue.
+p2 +vp1 is often the simplest way to reproduce a migration/PUP bug in AMPI. I don't have a problem with removing it from megampi, though, so long as we still have it in tests/ampi/migration. Also, +p2 +vp8 shouldn't add anything meaningful to the testing that +p2 +vp4 isn't already covering, so I'd rather just remove the +p2 +vp1 and not add anything new.
I finally got back to looking at this. There seems to be an issue here that is independent of anything going on in CXI. A multicore build running on the head node of Frontier has a similar crash in pieglobals-f90 +p1 +vp2 +balancer RandCentLB.
So I think there is a more fundamental issue with how that interacts with newer toolchains. For reference, that bug occurred with PrgEnv-gnu/8.3.3 loaded. AMPI is of course totally unstable with the Cray compiler, but even GNU is having issues with this test.
pieglobals-f90 +p1 +vp2 +balancer RandCentLB
I assume you mean +p2 +vp1
I had initially thought it to be the same problem, but it seems to be more general than +p2 +vp1, or there may be more than one problem with AMPI virtualization, as this is +p1 +vp2.
This bug is: munmap_chunk(): invalid pointer
Closing this; I will open an issue regarding some oddities with AMPI virtualization.
Fix an old typo in the Makefile for the megampi test.