charmplusplus / charm

The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.
Apache License 2.0

bugfix: change test to +p 2 +vp 8, because +p 2 +vp 1 is nonsense #3802

Closed: ericjbohm closed 3 months ago

ericjbohm commented 5 months ago

Fix an old typo in the Makefile for the megampi test.

matthiasdiener commented 5 months ago

migration/Makefile and creduce/Makefile seem to have a similar issue.

stwhite91 commented 5 months ago

+p2 +vp1 is often the most simple way to reproduce a migration/PUP bug in AMPI. I don't have a problem with removing it from megampi so long as we still have it in tests/ampi/migration though. Also, +p2 +vp8 shouldn't add anything meaningful to the testing that +p2 +vp4 isn't already covering so I'd rather just remove the +p2 +vp1 and not add anything new.
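To make the trade-off between these flag pairings concrete, here is a minimal sketch that summarizes what a given `+p`/`+vp` combination implies. The helper `describe_config` and its field names are hypothetical and are not part of Charm++ or AMPI; it only encodes counting arguments (with fewer ranks than PEs, at least `p - vp` PEs hold no rank) and assumes nothing about how AMPI actually places ranks.

```python
def describe_config(p: int, vp: int) -> dict:
    """Summarize a +p/+vp pairing (hypothetical helper, not a Charm++ API)."""
    if p < 1 or vp < 1:
        raise ValueError("p and vp must both be positive")
    return {
        "flags": f"+p{p} +vp{vp}",
        # By pigeonhole, at least this many PEs cannot hold any rank.
        "min_idle_pes": max(0, p - vp),
        # More virtual ranks than PEs means some PE hosts several ranks.
        "overdecomposed": vp > p,
        # A rank can only move to a different PE if a second PE exists.
        "cross_pe_migration_possible": p > 1,
    }


if __name__ == "__main__":
    # The pairings mentioned in this thread.
    for p, vp in [(2, 1), (2, 4), (2, 8), (1, 2)]:
        print(describe_config(p, vp))
```

Under these assumptions, `+p2 +vp1` always leaves one PE without a rank, which is why it looks odd as a general test, yet it is also the smallest setup in which a single rank has somewhere to migrate.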

ericjbohm commented 4 months ago

> +p2 +vp1 is often the most simple way to reproduce a migration/PUP bug in AMPI. I don't have a problem with removing it from megampi so long as we still have it in tests/ampi/migration though. Also, +p2 +vp8 shouldn't add anything meaningful to the testing that +p2 +vp4 isn't already covering so I'd rather just remove the +p2 +vp1 and not add anything new.

I finally got back to looking at this. There seems to be an issue here that is independent of anything going on in CXI. A multicore build running on the head node of Frontier has a similar crash in pieglobals-f90 with +p1 +vp2 +balancer RandCentLB.

So, I think there is a more fundamental issue with how that interacts with newer toolchains. For reference, that crash occurred with PrgEnv-gnu/8.3.3 loaded. AMPI is of course totally unstable with the Cray compiler, but even GNU is having issues with this test.

stwhite91 commented 4 months ago

> pieglobals-f90 +p1 +vp2 +balancer RandCentLB

I assume you mean +p2 +vp1

ericjbohm commented 4 months ago

I had initially thought it to be the same problem, but it seems to be more general than +p2 +vp1, or there may be more than one problem with AMPI virtualization, as this is +p1 +vp2.

This bug is: munmap_chunk(): invalid pointer

On Thu, May 16, 2024 at 10:43 AM Sam White wrote:

> pieglobals-f90 +p1 +vp2 +balancer RandCentLB
>
> I assume you mean +p2 +vp1

ericjbohm commented 3 months ago

Closing this; I will open an issue regarding some oddities with AMPI virtualization.