Exawind / nalu-wind

Solver for wind farm simulations targeting exascale computational platforms
https://exawind.github.io/nalu-wind/

Code in STK balance causes Intel compiler to segfault during compilation #885

Closed jrood-nrel closed 3 years ago

jrood-nrel commented 3 years ago

Actually I think mine happens in the STK unit tests while @psakievich sees it in STK balance.

  >> 1717    ": internal error: ** The compiler has encountered an unexpected problem.
     1718    ** Segmentation violation signal raised. **
     1719    Access violation or stack overflow. Please contact Intel Support for assistance.
     1720    
  >> 1721    icpc: error #10014: problem during multi-file optimization compilation (code 4)
  >> 1722    make[2]: *** [packages/stk/stk_unit_test_utils/stk_unit_test_utils/libstk_unit_main.so.13.1] Error 4
     1723    make[2]: Leaving directory `/projects/ecp/exawind/nalu-wind-testing/spack/var/spack/stage/jrood/spack-stage-trilinos-develop-a64nlwlq4n6ynynplytmcsfvuaso3w5r/spack-build-a64nlwl'
  >> 1724    make[1]: *** [packages/stk/stk_unit_test_utils/stk_unit_test_utils/CMakeFiles/stk_unit_main.dir/all] Error 2
     1725    make[1]: *** Waiting for unfinished jobs....

sayerhs commented 3 years ago

@jrood-nrel this is happening in the STK balance executable creation. I have had to disable ipo in spack builds when building on Eagle (https://github.com/Exawind/exawind-builder/blob/244adbf1f6feb00cdf49135bfe17ba591c4de612/etc/repos/exawind/packages/trilinos/package.py#L39-L43). The issue doesn't appear if you build manually outside of spack.

Anyway, my point in commenting is that this is not a new issue; it has been around since at least January 2021.

tasmith4 commented 3 years ago

@sayerhs thanks -- this would explain why we haven't seen this in Sierra builds.

@jrood-nrel @psakievich maybe the solution is just to edit the trilinos and/or nalu-wind spack packages?

sayerhs commented 3 years ago

Back in Jan 2021, when I was hit by this issue, it appeared to be a result of the really long pathnames that spack uses. I tried to work around it by shortening the pathnames for default spack installs (https://github.com/Exawind/exawind-builder/blob/main/etc/spack/spack/config.yaml#L1-L6), but that didn't seem to be as robust as the -no-ipo fix. You will also notice in trilinos/package.py that I tried playing around with RPATH, which sometimes worked but was not repeatable.

jrood-nrel commented 3 years ago

Thanks @sayerhs. CMAKE_INTERPROCEDURAL_OPTIMIZATION is explicitly off, but I will try forcing it off using your method.

sayerhs commented 3 years ago

@jrood-nrel One of the things I could not confirm reliably at that time is whether Intel attempts any IPO if there is no explicit -ipo[n] on the command line. So I decided to be safe and just pass -no-ipo. I see that my settings also had some RPATH changes; at this point I forget whether I needed both. This might be more of an issue/question for spack and/or Intel devs.

psakievich commented 3 years ago

I think we can close this now. We've updated the packages to include the -no-ipo flag as suggested by @sayerhs, and I haven't experienced any build issues related to this since.
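
For readers landing on this thread, the workaround discussed above (forcing -no-ipo for Intel builds) can be sketched roughly as follows. This is a standalone illustration, not the contents of the actual trilinos or nalu-wind package.py: the function name `with_no_ipo`, the compiler name string "intel", and the set of flag types are assumptions made for the example. In a real spack package, logic like this would presumably sit inside the package's compiler-flag handling rather than in a free function.

```python
# Minimal sketch, assuming the fix is "append -no-ipo whenever the Intel
# compiler is used". Names here are hypothetical, not taken from spack
# or from the exawind-builder repository.

# Flag groups that the sketch treats as relevant (an assumption).
IPO_SENSITIVE_FLAG_TYPES = ("cflags", "cxxflags", "fflags", "ldflags")


def with_no_ipo(compiler_name, flag_type, flags):
    """Return a copy of `flags` with -no-ipo appended for Intel builds.

    compiler_name: e.g. "intel" or "gcc" (hypothetical identifiers)
    flag_type:     e.g. "cxxflags" or "ldflags"
    flags:         existing list of flag strings
    """
    result = list(flags)  # never mutate the caller's list
    if compiler_name == "intel" and flag_type in IPO_SENSITIVE_FLAG_TYPES:
        if "-no-ipo" not in result:
            result.append("-no-ipo")
    return result


if __name__ == "__main__":
    print(with_no_ipo("intel", "cxxflags", ["-O2"]))
    print(with_no_ipo("gcc", "cxxflags", ["-O2"]))
```

The flag is added on the link line as well as the compile lines because the error in the log above is reported during multi-file optimization while building a shared library; whether both are strictly required was not established in this thread.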