charmplusplus / charm

The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.
Apache License 2.0

Segfault in examples/fem/simple2D under newer ICC #118

Closed by PhilMiller 11 years ago

PhilMiller commented 11 years ago

Original issue: https://charm.cs.illinois.edu/redmine/issues/118


The test of examples/fem/simple2D segfaults on Edison during autobuild. The test also segfaults on Hopper when built with ICC.

Valgrind reports:

1 0.000179 sec for loop 896 
1 0.000184 sec for loop 960 
1 0.000179 sec for loop 1024 
==31359== Invalid read of size 1
==31359==    at 0x5237BB: NetFEM_item::~NetFEM_item() (in /global/u2/p/pmiller/charm/net-linux-x86_64-icc/examples/fem/simple2D/pgm)
==31359==  Address 0x6df1189 is not stack'd, malloc'd or (recently) free'd
==31359== 
==31359== Invalid read of size 1
==31359==    at 0x52382C: NetFEM_item::~NetFEM_item() (in /global/u2/p/pmiller/charm/net-linux-x86_64-icc/examples/fem/simple2D/pgm)
==31359==  Address 0x6df1189 is not stack'd, malloc'd or (recently) free'd
==31359== 
==31359== Conditional jump or move depends on uninitialised value(s)
==31359==    at 0x5237C4: NetFEM_item::~NetFEM_item() (in /global/u2/p/pmiller/charm/net-linux-x86_64-icc/examples/fem/simple2D/pgm)
==31359== 
==31359== Conditional jump or move depends on uninitialised value(s)
==31359==    at 0x523835: NetFEM_item::~NetFEM_item() (in /global/u2/p/pmiller/charm/net-linux-x86_64-icc/examples/fem/simple2D/pgm)
==31359== 
1 0.000177 sec for loop 1088 
1 0.000178 sec for loop 1152 
1 0.000179 sec for loop 1216 
1 0.000178 sec for loop 1280 
1 0.000179 sec for loop 1344 

In non-valgrind execution, however, the segfault occurs much later, at loop 3072, and seemingly much earlier on Edison. So either this test or the underlying netfem library needs fixing.
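For context, the valgrind output above (a 1-byte invalid read followed by a conditional jump on an uninitialised value, both inside a destructor) is consistent with a destructor branching on a flag byte that was never set, or that lives in memory the object does not own. The actual NetFEM_item source is not shown in this report, so the following is only a hypothetical sketch of the general pattern and of the defensive form that avoids it. The names `Item`, `owns_`, `live_buffers` and `exercise_items` are invented for illustration and are not part of Charm++ or netfem.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a field container; NOT the real NetFEM_item.
class Item {
public:
    static int live_buffers;  // number of heap buffers currently owned

    // Non-owning view: the caller keeps ownership of `data`.
    Item(double *data, std::size_t n) : data_(data), n_(n), owns_(false) {}

    // Owning constructor: allocates (and zero-initialises) its own buffer.
    explicit Item(std::size_t n) : data_(new double[n]()), n_(n), owns_(true) {
        ++live_buffers;
    }

    // Copying would duplicate the ownership flag and risk a double free.
    Item(const Item &) = delete;
    Item &operator=(const Item &) = delete;

    ~Item() {
        // The destructor branches on owns_. If any constructor left owns_
        // unset, this test is the kind of read valgrind flags as depending
        // on an uninitialised value.
        if (owns_) {
            delete[] data_;
            --live_buffers;
        }
    }

    bool owns() const { return owns_; }
    std::size_t size() const { return n_; }

private:
    double *data_;
    std::size_t n_;
    bool owns_;  // set in every constructor above
};

int Item::live_buffers = 0;

// Creates one view and one owning item, then lets both go out of scope.
// Returns 10 * (buffers owned while both are alive) + (buffers left after).
int exercise_items() {
    double stack_data[4] = {0.0, 1.0, 2.0, 3.0};
    int while_alive = 0;
    {
        Item view(stack_data, 4);
        Item owned(8);
        assert(!view.owns() && owned.owns());
        while_alive = Item::live_buffers;
    }
    return while_alive * 10 + Item::live_buffers;
}
```

Calling `exercise_items()` from a `main` and running the result under valgrind should produce no diagnostics for this sketch. Removing `owns_(false)` from the first constructor is one way to reproduce the "Conditional jump or move depends on uninitialised value(s)" class of report, though whether that is what happens in netfem is not established here.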

PhilMiller commented 5 years ago

Original date: 2013-03-26 13:25:43


Autobuild showed a different, but possibly related, crash for this: http://charm.cs.uiuc.edu/autobuild/old.2013_03_26__03_32/mpi-crayxc.txt

PhilMiller commented 5 years ago

Original date: 2013-04-13 23:56:38


I just tried to reproduce this on Stampede with various versions of icc. None of them showed the crash or any related valgrind errors.

phil`stampede$ module avail intel
---------------------------- /opt/apps/modulefiles -----------------------------
   intel/13.0.1.117       intel/13.0.2.146  (D)     intel/13.0.079

PhilMiller commented 5 years ago

Original date: 2013-06-03 22:26:47


The core meeting decided unanimously to drop support for FEM rather than spend time delving into this. To be done instead: