JeffersonLab / qphix

QCD for Intel Xeon Phi and Xeon processors
http://jeffersonlab.github.io/qphix/

Build issues with QPhix-Codegen #101

Open diptorupd opened 6 years ago

diptorupd commented 6 years ago

Building the codegen within the new infrastructure ran into a few problems. I am opening this issue to note workarounds in case someone else faces similar issues. Feel free to close as off-topic if needed.

0) The Python3 and jinja2 requirements make it hard (near impossible) to build this library on servers where one cannot yum/apt-get install them.

1) To get around this issue, I tried building the code on my laptop, but gcc 6.3 could not generate the static library.

/home/diptorup/qphix/codegen/build/generated/avx2/src/dslash_avx2_spec_double_4_2_compress12.cpp:14: /usr/lib/gcc/x86_64-linux-gnu/6/include/avxintrin.h:1278:1: error: inlining failed in call to always_inline ‘__m256d _mm256_set1_pd(double)’: target specific option mismatch _mm256_set1_pd (double __A)

Apparently icc is needed, or at least this version of gcc fails. Workaround: moved the "generated" directory to a server that had icc and recompiled.

2) The avx2 headers for float_8_8 are not generated by the codegen, so the .cpp files fail to compile when looking for these files. I did not investigate; it may be a simple missing flag in a CMakeLists.txt.

[ 13%] Building CXX object CMakeFiles/qphix_codegen.dir/src/dslash_avx2_spec_float_8_8_compress12_tsts.cpp.o
/home/diptorup/shared/10_Adelie_QUARC_lqcd_testbed/QPhiX-QDPXX/Code-generated/generated-avx2/avx2/src/dslash_avx2_spec_float_8_8_compress12_tttt.cpp(66): catastrophic error: cannot open source file "generated/dslash_plus_body_float_float_v8_s8_12_tttt"

Workaround: I just removed these from the CMakeLists.txt in the "generated" directory created by jinja.

kostrzewa commented 6 years ago

Which branch are you trying to build?

Python3 should always be available, either directly or through an LMOD module. If it isn't, you should probably pester the admins, as I would consider it essential software.

As for jinja2, if you can ssh into the machine, you can upload the tarball or a whole clone of the git repository. You can install the library with the --user flag, so it will land in your home directory.
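(Editor's note: an untested sketch of the offline install described above; the version number and paths are illustrative, use whatever sdist you actually downloaded.)

```shell
# On a machine WITH internet access, grab the source distribution:
pip3 download jinja2 -d ./jinja2-pkg

# Copy it to the firewalled machine, then install without touching
# the network or needing root (lands in ~/.local/lib/python3.x/):
pip3 install --user --no-index --find-links ./jinja2-pkg jinja2

# Sanity check:
python3 -c "import jinja2; print(jinja2.__version__)"
```

If pip itself is unavailable, unpacking the tarball and running `python3 setup.py install --user` inside it achieves the same thing.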

A problem that I'm aware of -- and I'm not sure if we've fixed it -- is that CMake checks for the Python libraries, which, as far as I can tell, are not required for the generator to do its job.

I've never tested gcc 6.3 but 5.4.0 works. The resulting kernels are a little slower than ICC-compiled ones.

Float 8 8 should be generated, at least they always are for me for the AVX2 target. Depending on what you want to do, these are the most important kernel specialisations, so skipping them is likely not a good idea.

bjoo commented 6 years ago

Hi All, the JLab nodes are behind a web firewall, so I had to install Python from source and manually download and install Jinja, but it all worked. We can set Python-oriented variables in CMake to point to our own versions... I do this at NERSC, where Anaconda is being used. I believe I have built with gcc-6.3, at least for KNL, with no difficulties apart from the -Drestrict=restrict option.
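(Editor's note: a hypothetical sketch of pointing CMake at a hand-built Python, as described above. The exact variable names depend on the CMake version and find-module in use; all paths here are placeholders.)

```shell
# Make user-installed modules (e.g. jinja2 from pip --user) visible:
export PYTHONPATH=$HOME/.local/lib/python3.6/site-packages:$PYTHONPATH

# Tell the classic FindPythonInterp module which interpreter to use:
cmake .. \
  -DPYTHON_EXECUTABLE=$HOME/opt/python3/bin/python3 \
  -DCMAKE_CXX_COMPILER=icpc
```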

Dipto, just to be concrete: the qphix-codegen Git repo is essentially orphaned now that we have folded it into qphix itself. For the latest version, please check out our qphix 'devel' branch; the codegen is included there.

Best, B


diptorupd commented 6 years ago

On 09/03/2017 05:07 PM, Balint Joo wrote:

> Hi All, The jlab nodes are behind a web firewall, so I had to install Python from source and manually download and install Jinja, but it all worked. We can set python oriented variables in CMake to point to our own versions... I do this at NERSC where anaconda is being used. I believe I have built with gcc-6.3, at least for KNL with no difficulties apart from the -Drestrict=restrict option.

Yes, I do the same on our RENCI servers. I will follow Bartosz's steps to see if I can get Jinja set up.

> Dipto, just to be concrete the qphix-codegen Git repo is sort of orphaned now that we combined it into qphix itself. For the latest version you should please check our qphix 'devel' branch and the codegen should be included.

Yes, that is what I did.


bjoo commented 6 years ago

A comment: after talking with @diptorupd today, it seems the gcc-6.3 issue pertains specifically to AVX2 builds. I have built with gcc-6.3 mostly for KNL, and Travis builds AVX rather than AVX2. I will try to reproduce this tomorrow.

martin-ueding commented 6 years ago

It is curious that you have problems with GCC 6.3; I have used it on my laptop just fine for AVX. So perhaps it is just an architecture flag that is missing? On Travis CI it was building with some GCC 6 as well, for AVX. It also works on Hazel Hen (Haswell) with a relatively recent GCC, though the performance is not great.

So perhaps the issue is that one has to supply -march=haswell?
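(Editor's note: an untested sketch of the suggestion above, i.e. making sure the architecture flag reaches the generated AVX2 kernel objects. The `isa` option name is taken from the qphix CMake build; verify it against the branch you are on.)

```shell
# Build the AVX2 target with GCC, passing -march=haswell globally so
# avxintrin.h's always_inline intrinsics can actually be inlined:
cmake .. \
  -DCMAKE_CXX_COMPILER=g++ \
  -DCMAKE_CXX_FLAGS="-O3 -march=haswell" \
  -Disa=avx2
make -j8
```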

Regarding Python: if your HPC frontend allows outgoing connections, you can do pip3 install --user jinja2. That will pull in the library and install it into ~/.local/lib.