Open steveg21 opened 4 years ago
Thanks @steveg21 -- could you also specify which version of the Clang host compiler you are using on BlueOS with cuda-10.1.X and cuda-10.2.86? Are you using the same host compiler with both CUDA versions, or a different one for each?
I suspect this is primarily an issue with the host compiler, since there is no GPU code in the reproducer example.
nvcc_error_repro_lassen.sh.txt (https://github.com/LLNL/axom/files/4795099/nvcc_error_repro_lassen.sh.txt)
Here's a script to compile the reproducer. You'll need to modify the paths for your Axom and Umpire installations. The error can be toggled on and off by changing CUDA_ROOT on the first line. I'm using clang 9.0.0.
Does it work if you instead use /usr/tce/packages/clang/clang-9.0.0/bin/clang++ as the host compiler? --Rich
(Sent by email in reply to steveg21's comment of Wednesday, June 17, 2020 at 2:02 PM on [LLNL/axom] NVCC error - lgenfe output (#260): https://github.com/LLNL/axom/issues/260#issuecomment-645622670)
The error can be reproduced with the following host compilers on Lassen:
/usr/tce/packages/clang/clang-ibm-2019.10.03/bin/clang++
/usr/tce/packages/clang/clang-9.0.0/bin/clang++
/usr/tce/packages/xl/xl-2020.03.18/bin/xlc
@steveg21 -- interesting.
Is the IBM xlc compiler using the Clang frontend? The compiler segfault that I saw was coming from the frontend compiler and I am wondering if these compilers are all using the same Clang frontend.
Add gcc 8.3.1 to the above list of host compilers:
/usr/tce/packages/gcc/gcc-8.3.1/bin/c++
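For anyone who wants to repeat the comparison, here is a minimal sketch of looping over the host compilers mentioned in this thread. It is not the attached script: the CUDA_ROOT, AXOM_DIR and UMPIRE_DIR defaults are placeholder paths, the include layout and the `-std=c++11` flag are assumptions, and SRC assumes the reproducer has been saved locally as nvcc_error_repro.cpp. By default it only prints the nvcc command lines; set RUN=1 to actually compile.

```shell
#!/bin/sh
# Placeholder paths (assumptions): point these at your own installations.
CUDA_ROOT="${CUDA_ROOT:-/path/to/cuda}"
AXOM_DIR="${AXOM_DIR:-/path/to/axom}"
UMPIRE_DIR="${UMPIRE_DIR:-/path/to/umpire}"
SRC="${SRC:-nvcc_error_repro.cpp}"

# Host compilers reported in this thread (Lassen paths).
HOST_COMPILERS="
/usr/tce/packages/clang/clang-ibm-2019.10.03/bin/clang++
/usr/tce/packages/clang/clang-9.0.0/bin/clang++
/usr/tce/packages/xl/xl-2020.03.18/bin/xlc
/usr/tce/packages/gcc/gcc-8.3.1/bin/c++
"

# Print the nvcc command line for a given CUDA root ($1) and host compiler ($2).
# -ccbin selects the host compiler; -x cu forces the file to be treated as CUDA source.
nvcc_cmd() {
  echo "$1/bin/nvcc -ccbin $2 -x cu -std=c++11 -I${AXOM_DIR}/include -I${UMPIRE_DIR}/include -c ${SRC} -o repro.o"
}

for host in ${HOST_COMPILERS}; do
  cmd="$(nvcc_cmd "${CUDA_ROOT}" "${host}")"
  echo "${cmd}"
  # Dry run unless RUN=1 is set in the environment.
  if [ "${RUN:-0}" = "1" ]; then
    if ${cmd}; then echo "OK:   ${host}"; else echo "FAIL: ${host}"; fi
  fi
done
```

Running it once with CUDA_ROOT pointing at a 10.1.X install and once at 10.2.86 should show whether the failure tracks the CUDA version rather than the host compiler.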
Thanks for all the info @steveg21 -- this rules out the Clang frontend as the underlying culprit. I'll try to investigate this further.
Hi @steveg21 -- I think I have a candidate solution for this.
At your convenience, could you try the fix in PR #267 and confirm whether or not it fixes the issue that you are seeing?
With CUDA version 10.1.X, the file below gives
Error: Internal Compiler Error (codegen): there was an error in verifying the lgenfe output!
when compiled on BlueOS with a Clang host compiler. It simply creates an instance of mint::UnstructuredMesh.
nvcc_error_repro.cpp.txt
This error does not occur with CUDA 10.2.86.