LLNL / lbann

Livermore Big Artificial Neural Network Toolkit
http://software.llnl.gov/lbann/

Build error on Summit #1187

Open jychoi-hpc opened 5 years ago

jychoi-hpc commented 5 years ago

Hi. I am trying to build LBANN (develop branch) on Summit at ORNL with Spack, but I am getting the following error:

[ 80%] Linking CXX executable lbann_gan
...
/gpfs/alpine/world-shared/csc143/jyc/summit/sw/spack/opt/spack/linux-rhel7-ppc64le/gcc-6.4.0/hydrogen-develop-5mpptcpijdywr2yqe7czg2lqlmikdpgf/lib/libHydrogen_CXX.so: undefined reference to `ncclDataType_t Al::internal::nccl::TypeMap<__half>()'
/usr/bin/ld: link errors found, deleting executable `lbann_gan'
/usr/bin/sha1sum: lbann_gan: No such file or directory
collect2: error: ld returned 1 exit status

I was able to build it a few months ago, but since Summit's software upgrade I get this error.

I am using the following Spack command:

spack install -v lbann +gpu +nccl ^hydrogen@develop 

I believe Spack is trying to build version 0.99 of LBANN. I also tried the latest LBANN from the develop branch and got the same error.

Thanks in advance for any advice.

timmoon10 commented 5 years ago

Hi Jong, this looks like a bug in the interaction between Hydrogen and Aluminum. Hydrogen's main developer is on vacation and should be back later this week. In the meantime, what happens when you build with:

spack install lbann +gpu +nccl ^hydrogen@develop ^aluminum@master

jychoi-hpc commented 5 years ago

Thank you for the command.

I found a few places in Hydrogen and LBANN that search for version 0.2.0 of Aluminum. After changing them to look for 0.3.1, I was able to compile the LBANN develop branch with ^hydrogen@develop and ^aluminum@master:

find_package(Aluminum 0.3.1 NO_MODULE QUIET)
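
Because of QUIET, a missing or version-rejected Aluminum simply leaves Aluminum_FOUND false with no diagnostic. The following is a minimal sketch of making that failure explicit and reporting which installation was picked. It assumes Aluminum ships a CMake package config with a version file, which the NO_MODULE call above relies on. The error text is illustrative and is not taken from the Hydrogen or LBANN sources.

```cmake
# Request Aluminum 0.3.1 (or a version its config file declares compatible).
find_package(Aluminum 0.3.1 NO_MODULE QUIET)

if (NOT Aluminum_FOUND)
  # QUIET suppresses CMake's own diagnostics, so report the failure ourselves.
  # Illustrative message; not from the Hydrogen or LBANN build scripts.
  message(FATAL_ERROR
    "Aluminum 0.3.1 (or compatible) not found; set Aluminum_DIR or "
    "CMAKE_PREFIX_PATH to the Aluminum install Hydrogen was built against.")
endif ()

# Aluminum_VERSION comes from the package version file and Aluminum_DIR is the
# directory holding the config file, so this shows exactly which copy was used.
message(STATUS "Found Aluminum ${Aluminum_VERSION} in ${Aluminum_DIR}")
```

Printing Aluminum_DIR makes it easy to check whether the build picked up the Spack-installed aluminum@master or some other copy of Aluminum.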

ndryden commented 5 years ago

Glad you were able to build.

This is a bit odd, though: LBANN should be asking for Aluminum 0.2.1-1. (It looks like Hydrogen asks for 0.2.0, which should be compatible.) Perhaps the Spack packages have gotten out of sync (@bvanessen)?
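
Whether a request like find_package(Aluminum 0.2.0 ...) accepts a newer install is decided by the package's version file, not by the caller. The following is a sketch of how a package typically generates that file. It assumes, without confirmation from this thread, that Aluminum uses CMake's write_basic_package_version_file. The output file name and the VERSION value are placeholders.

```cmake
include(CMakePackageConfigHelpers)

# Hypothetical: generate the version file installed next to AluminumConfig.cmake.
# COMPATIBILITY decides which requested versions this install satisfies:
#   SameMajorVersion: same major number and installed >= requested,
#                     so a request for 0.2.0 accepts both 0.2.1 and 0.3.1.
#   SameMinorVersion: same major and minor number and installed >= requested,
#                     so a request for 0.2.0 accepts 0.2.1 but rejects 0.3.1.
write_basic_package_version_file(
  "${CMAKE_CURRENT_BINARY_DIR}/AluminumConfigVersion.cmake"
  VERSION 0.3.1
  COMPATIBILITY SameMinorVersion)
```

Under a policy like SameMinorVersion, a 0.2.0 request in Hydrogen or LBANN would not match a 0.3.x install. That would be consistent with bumping the requested version, but it is only one possible explanation.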

bvanessen commented 5 years ago

Let me take a look at this.

(Sent by email in reply to Nikoli Dryden's comment of Aug 27, 2019, 11:38 AM.)