LLNL / axom

CS infrastructure components for HPC applications
BSD 3-Clause "New" or "Revised" License

NVCC error - lgenfe output #260

Open steveg21 opened 4 years ago

steveg21 commented 4 years ago

With CUDA 10.1.X, compiling the attached file (nvcc_error_repro.cpp.txt) on BlueOS with a Clang host compiler gives

Error: Internal Compiler Error (codegen): there was an error in verifying the lgenfe output!

The file simply creates an instance of mint::UnstructuredMesh.

This error does not occur with CUDA 10.2.86.

gzagaris commented 4 years ago

Thanks @steveg21 -- could you also specify which version of the Clang host compiler you are using on BlueOS with cuda-10.1.X and cuda-10.2.86? Are you using the same host compiler for both, or different ones?

I suspect this is primarily an issue with the host compiler, since there is no GPU code in the reproducer example.

steveg21 commented 4 years ago

Here's a script to compile the reproducer: nvcc_error_repro_lassen.sh.txt. You'll need to modify the paths for your Axom and Umpire installations. The error can be toggled on/off by changing CUDA_ROOT on the first line. I'm using clang 9.0.0.

rhornung67 commented 4 years ago

Does it work if you instead use /usr/tce/packages/clang/clang-9.0.0/bin/clang++ as the host compiler? --Rich

(Sent by email in reply to steveg21's comment of Wednesday, June 17, 2020 at 2:02 PM.)

steveg21 commented 4 years ago

The error can be reproduced with the following host compilers on Lassen:

/usr/tce/packages/clang/clang-ibm-2019.10.03/bin/clang++
/usr/tce/packages/clang/clang-9.0.0/bin/clang++
/usr/tce/packages/xl/xl-2020.03.18/bin/xlc

gzagaris commented 4 years ago

@steveg21 -- interesting.

Is the IBM xlc compiler using the Clang frontend? The compiler segfault that I saw was coming from the compiler frontend, and I am wondering whether these compilers all use the same Clang frontend.

steveg21 commented 4 years ago

Add gcc 8.3.1 (/usr/tce/packages/gcc/gcc-8.3.1/bin/c++) to the above list of host compilers.

gzagaris commented 4 years ago

Thanks for all the info @steveg21 -- this rules out the Clang frontend as the underlying culprit. I'll investigate this further.

gzagaris commented 4 years ago

Hi @steveg21 -- I think I have a candidate solution for this.

At your convenience, could you try the fix in PR #267 and confirm whether it resolves the issue you are seeing?