HISKP-LQCD / chroma-auxiliary-scripts

Scripts for working with USQCD Chroma

Compilation broken on Hazel Hen #2

Closed kostrzewa closed 6 years ago

kostrzewa commented 6 years ago

Tasks:


Unfortunately, the changes to the environment on Hazel Hen make the compilation scripts fail.

I'm afraid I won't have time to take care of this over the next few days. It would be great if you could give it a try.

martin-ueding commented 6 years ago

Sure, I'll get that script back on track. While I am at it, I might improve the command line arguments with argbash.

martin-ueding commented 6 years ago

The script now has a new flag, -d, which just downloads all the needed files and then stops. This has already proved useful on Hazel Hen.
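
A minimal sketch of how such a download-only switch could look. It uses plain getopts rather than argbash, and the function names fetch_sources and build_all are placeholders for illustration, not taken from the actual script.

```shell
#!/bin/sh
# Hypothetical sketch of a download-only flag (-d).
# fetch_sources and build_all are placeholder names, not the real script's.

download_only=false

parse_args() {
    download_only=false
    OPTIND=1
    while getopts "d" opt; do
        case "$opt" in
            d) download_only=true ;;
            *) echo "usage: $0 [-d]" >&2; return 1 ;;
        esac
    done
}

fetch_sources() {
    # Placeholder: the real script would download or clone the sources here.
    echo "fetching sources"
}

build_all() {
    # Placeholder for the configure and make steps.
    echo "building"
}

main() {
    parse_args "$@" || return 1
    fetch_sources
    if "$download_only"; then
        echo "download only, stopping before the build"
        return 0
    fi
    build_all
}

main "$@"
```

Splitting the download from the build is handy on machines where only the login nodes have outside network access: fetch there with -d, then run the build without it.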

I face the libxml2 issue too. I looked into the config.log file from QDP++ and it seems that the flags only work with the system-default GCC, but not with the Cray-wrapped compilers (cc and CC). Manual compilation of libxml2 has some other problem, so I have emailed support:

The system wide installation of libxml2 is not compatible with the Cray or Intel compilers.

Simple example. We have a trivial C program:

$ cat test.c
int main() { return 0; }

Compiling with Cray (default) compiler and the flags that xml2-config --libs gives us fails:

$ cc test.c -lxml2 -L/lib64 -lz -llzma -lm -ldl
/opt/cray/pe/cce/8.6.5/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld: cannot find -lxml2
/opt/cray/pe/cce/8.6.5/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld: cannot find -llzma

And it is no different with the Intel compiler:

$ module swap PrgEnv-cray PrgEnv-intel

$ cc test.c -lxml2 -L/lib64 -lz -llzma -lm -ldl
ld: cannot find -lxml2
ld: cannot find -llzma

Compiling libxml2 from source with the Intel compiler is the next thing that we have tried. Using the flags that work on JURECA in Jülich and Marconi A2 in Bologna, we get this error message:

ld: attempted static link of dynamic object `./.libs/libxml2.so'

How can we proceed to get libxml2 working?

Perhaps I can figure out how to manually compile libxml2. But if I use the system-wide GCC for that, we likely have the same problem as with the CentOS package version. Perhaps I can figure something out.

kostrzewa commented 6 years ago

Have they reacted in some way to your mail?

martin-ueding commented 6 years ago

They have answered another ticket that I wrote at the same time (about missing exit status from the module command). So I would conclude that they answered the simple question directly and their lack of response on this harder question means that they first need to investigate before writing a response.

It seems that we are stuck here. On Stack Overflow it has been suggested that the linkage failure is an indication that one should not use this library with compute code.

I just thought about this: We can just compile libxml2 with the host GCC and then use the Intel compiler for the remainder. Since the Intel compiler claims to be ABI compatible with the system GCC, this should work just fine, right?

It might be that libz and liblzma are then missing, but we could try this.

martin-ueding commented 6 years ago

As feared, exactly this has happened now. I have compiled libxml2 using the system GCC. This worked just fine. But when compiling QDP++, the linkage now fails, not because of libxml2 but because of liblzma:

configure:4442: checking if we can compile/link a simple libxml2 program
configure:4480: /opt/cray/pe/craype/2.5.14/bin/CC -o conftest -xAVX2 -O3 -fopenmp -std=c++11 -I/zhome/academic/HLRS/hsk/xskmuedi/Chroma-2018/local-icc/include/libxml2    conftest.cpp  -L/zhome/academic/HLRS/hsk/xskmuedi/Chroma-2018/local-icc/lib -lxml2 -llzma -lm >&5
ld: cannot find -llzma

So I would conclude that the issue is the Intel linker. It has a problem building the libxml2 test programs and it has a problem linking to a system library.

I have some more ideas, I'll add them as a check list to the first post.

We do want to continue running HMC on Hazel Hen, right? So this needs to be done.

martin-ueding commented 6 years ago

Intel compiler versions 18.0.1.163, 17.0.6.256, and 16.0.4.258 all have the same linking problem. And Intel compiler 16 does not even know about -xAVX2. Either it does not support it at all, or the option has a different name.

Since the Cray wrapped Cray compiler also fails to link to the system libraries, I have the impression that this Cray wrapping messes with the paths in such a way that it just cannot link system libraries.
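
One way to probe the path hypothesis is to check which form of each library the system actually provides. This is only a sketch: the directory list is an assumption about where the system libraries live, and lib_kinds is a helper name made up here. The two commented-out commands at the end rely on the Cray driver's -craype-verbose option and the CRAYPE_LINK_TYPE variable; whether they change the outcome on Hazel Hen has not been verified in this thread.

```shell
#!/bin/sh
# Report whether a library exists as a static archive (.a), a shared
# object (.so), or both, in each of the given directories.
# lib_kinds is a hypothetical helper name.
lib_kinds() {
    name=$1
    shift
    for dir in "$@"; do
        [ -e "$dir/lib$name.a" ]  && echo "$dir: static (lib$name.a)"
        [ -e "$dir/lib$name.so" ] && echo "$dir: shared (lib$name.so)"
    done
    return 0
}

# The directory list is an assumption; adjust it to the system at hand.
for lib in xml2 lzma z; do
    echo "== $lib"
    lib_kinds "$lib" /lib64 /usr/lib64 /usr/lib
done

# Untested ideas, left commented out on purpose:
# cc -craype-verbose test.c -lxml2    # print the command the wrapper forwards
# CRAYPE_LINK_TYPE=dynamic cc test.c -lxml2 -lz -llzma -lm -ldl
```

If only shared objects show up for a library, a wrapper that links statically would report "cannot find" for it even though the file is present, which would also fit the "attempted static link of dynamic object" message seen earlier.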

I will try it with GCC next. The Cray compiler cannot be tried because that does not support half precision and perhaps even some AVX2 instructions. So even if we managed to get this to compile, we would have to use AVX with QPhiX, which will limit our performance, I suppose.

martin-ueding commented 6 years ago

The Cray wrapped GCC (g++ (GCC) 7.2.0 20170814 (Cray Inc.)) exhibits the exact same linkage problem.

I see two more options besides waiting for HLRS support:

  1. Building liblzma and libz with the Intel compiler.
  2. Trying to build libxml2 without the liblzma and libz dependencies. In the XML world there is a need to compress the verbose XML format, but I have not seen Chroma actually use this. So maybe the authors of libxml2 give us a configuration flag.

martin-ueding commented 6 years ago

Apparently removing the liblzma and libz dependencies has worked! The required flags do not contain them any more:

$ ./xml2-config --libs
-L/zhome/academic/HLRS/hsk/xskmuedi/Chroma-2018/local-icc/lib -lxml2 -lm

Also QDP++ has compiled just fine with Intel C++ 17.
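
The exact configure call is not recorded in this thread. A sketch of what it could look like, assuming the standard --without-zlib and --without-lzma options of the libxml2 configure script. The paths in the example call are placeholders, and configure_libxml2 is a name made up for this sketch.

```shell
#!/bin/sh
# Sketch: build libxml2 without the zlib and lzma dependencies.
# configure_libxml2 is a hypothetical helper; the paths are placeholders.
# The compiler can be chosen through the usual CC environment variable.
configure_libxml2() (
    src=$1
    prefix=$2
    # Keep the command in the positional parameters so that the dry run
    # prints exactly what the real run would execute.
    set -- ./configure --prefix="$prefix" --without-zlib --without-lzma
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
        exit 0
    fi
    cd "$src" && "$@" && make && make install
)

# Dry run: only print the configure command.
DRY_RUN=1 configure_libxml2 /path/to/libxml2-source /path/to/local-icc
```

After such a build, xml2-config --libs from the new prefix should no longer list -lz or -llzma, as in the output above.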

Now QPhiX has some issues with Python, but I presume that this can be solved somehow as well.

-- Check for working C compiler: /opt/cray/pe/craype/2.5.14/bin/cc
-- Check for working C compiler: /opt/cray/pe/craype/2.5.14/bin/cc -- broken
CMake Error at /usr/share/cmake/Modules/CMakeTestCCompiler.cmake:61 (message):
  The C compiler "/opt/cray/pe/craype/2.5.14/bin/cc" is not able to compile a
  simple test program.

  It fails with the following output:

   Change Dir: /zhome/academic/HLRS/hsk/xskmuedi/Chroma-2018/build-icc/qphix/CMakeFiles/CMakeTmp

  Run Build Command:"/usr/bin/gmake" "cmTC_0496d/fast"

  /usr/bin/gmake -f CMakeFiles/cmTC_0496d.dir/build.make
  CMakeFiles/cmTC_0496d.dir/build

  gmake[1]: Entering directory
  '/zhome/academic/HLRS/hsk/xskmuedi/Chroma-2018/build-icc/qphix/CMakeFiles/CMakeTmp'

  Building C object CMakeFiles/cmTC_0496d.dir/testCCompiler.c.o

  /opt/cray/pe/craype/2.5.14/bin/cc -o
  CMakeFiles/cmTC_0496d.dir/testCCompiler.c.o -c
  /zhome/academic/HLRS/hsk/xskmuedi/Chroma-2018/build-icc/qphix/CMakeFiles/CMakeTmp/testCCompiler.c

  Linking C executable cmTC_0496d

  /usr/bin/cmake -E cmake_link_script CMakeFiles/cmTC_0496d.dir/link.txt
  --verbose=1

  /opt/cray/pe/craype/2.5.14/bin/cc
  CMakeFiles/cmTC_0496d.dir/testCCompiler.c.o -o cmTC_0496d

  ld: cannot find -l.cpython-34m

  CMakeFiles/cmTC_0496d.dir/build.make:97: recipe for target 'cmTC_0496d'
  failed

  gmake[1]: *** [cmTC_0496d] Error 1

  gmake[1]: Leaving directory
  '/zhome/academic/HLRS/hsk/xskmuedi/Chroma-2018/build-icc/qphix/CMakeFiles/CMakeTmp'

  Makefile:126: recipe for target 'cmTC_0496d/fast' failed

  gmake: *** [cmTC_0496d/fast] Error 2

  CMake will not be able to correctly generate this project.
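
The log does not show where the stray -l.cpython-34m flag comes from. Assuming it is injected through an environment variable (for example by a loaded module), a quick search of the environment could narrow it down. find_in_env is a helper name made up for this sketch.

```shell
#!/bin/sh
# List environment entries that contain the given text.
# find_in_env is a hypothetical helper name.
find_in_env() {
    env | grep -F -- "$1"
}

find_in_env 'cpython-34m' || echo "not found in the environment"
```

If a variable turns up, unloading the module that sets it before running CMake would be the next thing to try.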

kostrzewa commented 6 years ago

I will try it with GCC next. The Cray compiler cannot be tried because that does not support half precision and perhaps even some AVX2 instructions. So even if we managed to get this to compile, we would have to use AVX with QPhiX, which will limit our performance, I suppose.

Before I get back to the other points, just to reiterate: half precision support is not required (nor recommended) on AVX2, because it provides no benefit whatsoever. Even on AVX512, half precision is basically no help at all. Unlike on GPUs, there is no hardware support for half precision on these CPUs. Later generations of Intel processors will likely have native half and quarter precision support.

kostrzewa commented 6 years ago

Apparently removing the liblzma and libz dependencies has worked! The required flags do not contain them any more:

Excellent, what a mess!

kostrzewa commented 6 years ago

All of this might also be traceable to transparent huge pages, but I'm not sure.

kostrzewa commented 6 years ago

From my point of view, the HLRS people need to fix this on their side and provide step by step support to get us up and running. @urbach, would you agree?

urbach commented 6 years ago

From my point of view, the HLRS people need to fix this on their side and provide step by step support to get us up and running. @urbach, would you agree?

Definitely! If necessary, I can insist.


martin-ueding commented 6 years ago

It compiles now with Intel C++ version 17! The updated script is also in the repository.

I have not tried to run it, but I would hope that it works.

kostrzewa commented 6 years ago

Awesome, could you give it a spin with a few trajectories of the 32c96 run?

kostrzewa commented 6 years ago

How did it run? Can we continue the simulations on Hazel Hen?

martin-ueding commented 6 years ago

I did one update, it took 56 minutes. I will check the log tomorrow.