Error in Swift-T spack package installation #174

Open mtitov opened 1 year ago

mtitov commented 1 year ago

It looks like the stc package has kept failing since the ant package was updated:

 -   3doajvy      ^stc@0.9.0%gcc@8.1.0 build_system=autotools arch=linux-rhel7-x86_64
 -   3yim7ff          ^ant@1.10.13%gcc@8.1.0 build_system=generic arch=linux-rhel7-x86_64
==> Installing ant-1.10.13-3yim7ffbntseialqfppdenobmusx5qcg
==> No binary for ant-1.10.13-3yim7ffbntseialqfppdenobmusx5qcg found: installing from source
==> Using cached archive: /usr/WS1/exaworks/sdk/spack/var/spack/cache/_source-cache/archive/da/da006f4c888d41d0f3f213565e48aeff73e4d8a6196e494121d8da1e567a8406.tar.gz
==> No patches needed for ant
==> ant: Executing phase: 'install'
==> ant: Successfully installed ant-1.10.13-3yim7ffbntseialqfppdenobmusx5qcg
  Stage: 0.33s.  Install: 28.49s.  Total: 28.91s
[+] /usr/WS1/exaworks/sdk/spack/opt/spack/linux-rhel7-x86_64/gcc-8.1.0/ant-1.10.13-3yim7ffbntseialqfppdenobmusx5qcg
==> Installing stc-0.9.0-3doajvymb5xrdot4jjfvtphbdvoo3o6v
==> No binary for stc-0.9.0-3doajvymb5xrdot4jjfvtphbdvoo3o6v found: installing from source
==> Using cached archive: /usr/WS1/exaworks/sdk/spack/var/spack/cache/_source-cache/archive/ed/edf187344ce860476473ab6599f042cd22ed029aa186d512135990accb9d260f.tar.gz
==> No patches needed for stc
==> stc: Executing phase: 'autoreconf'
==> stc: Executing phase: 'configure'
==> stc: Executing phase: 'build'
==> stc: Executing phase: 'install'
==> Error: ProcessError: Command exited with status 2:
    'make' '-j16' 'install'

2 errors found in build log:
     17    /usr/WS1/exaworks/sdk/spack/opt/spack/linux-rhel7-x86_64/gcc-8.1.0/a
           nt-1.10.13-3yim7ffbntseialqfppdenobmusx5qcg/bin/ant
     18    ant -Ddist.dir=/usr/WS1/exaworks/sdk/spack/opt/spack/linux-rhel7-x86
           _64/gcc-8.1.0/stc-0.9.0-3doajvymb5xrdot4jjfvtphbdvoo3o6v  \
     19        -Dturbine.home=/usr/WS1/exaworks/sdk/spack/opt/spack/linux-rhel7
           -x86_64/gcc-8.1.0/turbine-1.3.0-on5jonxqucz3433fkp23suqcobswunmo \
     20        -Duse.java=/usr/WS1/exaworks/sdk/spack/opt/spack/linux-rhel7-x86
           _64/gcc-8.1.0/openjdk-11.0.17_8-l2n5t3z7clzpd45vclm4malpssw4jo4e/bin
           /java        \
     21                             \
     22        install
  >> 23    Error occurred during initialization of VM
     24    java.lang.Error: Could not create SecurityManager
     25     at java.lang.System.initPhase3(java.base@11.0.17/System.java:2065)
     26    Caused by: java.lang.ClassNotFoundException: allow
     27     at jdk.internal.loader.BuiltinClassLoader.loadClass(java.base@11.0.
           17/BuiltinClassLoader.java:581)
     28     at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(java.b
           ase@11.0.17/ClassLoaders.java:178)
     29     at java.lang.ClassLoader.loadClass(java.base@11.0.17/ClassLoader.ja
           va:522)
     30     at java.lang.Class.forName0(java.base@11.0.17/Native Method)
     31     at java.lang.Class.forName(java.base@11.0.17/Class.java:398)
     32     at java.lang.System.initPhase3(java.base@11.0.17/System.java:2050)
     33    
  >> 34    make: *** [install] Error 1

P.S. The error appears on Ruby, Quartz, and Lassen (LLNL).

hategan commented 1 year ago

I'm going to volunteer my vast java experience here :)

It looks like a problem with the ant build (btw, if you come upon http://www.bigsoft.co.uk/blog/2022/04/13/groovy-could-not-create-securitymanager, ignore it; it works because java.security.SecurityManager is not a system property recognized by the JVM, so it gets quietly ignored).

Ant has a class called "allow", which is in src/main/allow.java and should be on the classpath if ant loads correctly. It's possible that it is not detecting its own home, but I doubt it, since we'd see that more often.

Chances are it is using a bad cache in which the build failed and it doesn't really have the proper jars. Can you look in /usr/WS1/exaworks/sdk/spack/opt/spack/linux-rhel7-x86_64/gcc-8.1.0/ant-1.10.13-3yim7ffbntseialqfppdenobmusx5qcg/lib and see if there is an ant-launcher.jar? If so, does strings ant-launcher.jar | grep allow.class print anything?
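A minimal sketch of the jar check described above, written as a small shell function. The name jar_mentions_entry is hypothetical. Like the strings/grep pipeline, it only tests whether the entry name appears in the archive's bytes (zip archives store entry names uncompressed); it does not prove that the class itself is intact.

```shell
# Hypothetical helper: succeed if the file name ENTRY appears in JAR_FILE.
# Zip/jar archives store entry names uncompressed, so a fixed-string search
# over the raw bytes is enough for a quick presence check.
jar_mentions_entry() {
    jar_file=$1
    entry=$2
    # Status 2 signals "no such jar", distinct from "entry not found" (1).
    [ -f "$jar_file" ] || return 2
    LC_ALL=C grep -q -F -- "$entry" "$jar_file"
}
```

It would be called as jar_mentions_entry "$ANT_PREFIX/lib/ant-launcher.jar" allow.class, where ANT_PREFIX is a stand-in for the long spack install path of ant.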

A second possibility is that it is detecting things correctly, but, because of the long prefix, the environment variable used to store the classpath is too large and gets truncated silently. To check, edit /usr/WS1/exaworks/sdk/spack/opt/spack/linux-rhel7-x86_64/gcc-8.1.0/ant-1.10.13-3yim7ffbntseialqfppdenobmusx5qcg/bin/ant and right before the last line (eval "$ant_exec_command $ant_exec_args") add echo "$ant_exec_command $ant_exec_args". Then run /usr/WS1/exaworks/sdk/spack/opt/spack/linux-rhel7-x86_64/gcc-8.1.0/ant-1.10.13-3yim7ffbntseialqfppdenobmusx5qcg/bin/ant and see if the -classpath argument looks suspiciously trimmed.
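One way to make "suspiciously trimmed" concrete, as a sketch: split the printed -classpath value on ':' and report every entry that does not exist on disk. A value that was cut off mid-path would typically end in an entry that is not a real file. The function name is hypothetical, and the sketch assumes that no entry contains a colon or a newline.

```shell
# Hypothetical helper: print each classpath entry that does not exist.
# A truncated classpath usually shows up as a final, incomplete entry.
missing_classpath_entries() {
    cp_value=$1
    old_ifs=$IFS
    IFS=:
    set -f                      # no globbing while the value is split
    for entry in $cp_value; do
        [ -n "$entry" ] || continue
        [ -e "$entry" ] || printf '%s\n' "$entry"
    done
    set +f
    IFS=$old_ifs
}
```

Empty output means every entry resolved. Any printed line is worth a closer look, especially if it is the last entry of the classpath.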

mtitov commented 1 year ago

Just checked this on Quartz (LLNL)

[titov1@quartz1532:sdk]$ strings /usr/WS1/exaworks/sdk/spack/opt/spack/linux-rhel7-x86_64/gcc-8.1.0/ant-1.10.13-3yim7ffbntseialqfppdenobmusx5qcg/lib/ant-launcher.jar  | grep allow.class
allow.class
allow.classPK

[titov1@quartz1532:sdk]$ /usr/WS1/exaworks/sdk/spack/opt/spack/linux-rhel7-x86_64/gcc-8.1.0/ant-1.10.13-3yim7ffbntseialqfppdenobmusx5qcg/bin/ant
exec "$JAVACMD"  -classpath "$LOCALCLASSPATH" -Dant.home="$ANT_HOME" -Dant.library.dir="$ANT_LIB"  -Djava.security.manager=allow org.apache.tools.ant.launch.Launcher  -cp "$CLASSPATH" 
Error occurred during initialization of VM
java.lang.Error: Could not create SecurityManager
    at java.lang.System.initPhase3(java.base@11.0.17/System.java:2065)
Caused by: java.lang.ClassNotFoundException: allow
    at jdk.internal.loader.BuiltinClassLoader.loadClass(java.base@11.0.17/BuiltinClassLoader.java:581)
    at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(java.base@11.0.17/ClassLoaders.java:178)
    at java.lang.ClassLoader.loadClass(java.base@11.0.17/ClassLoader.java:522)
    at java.lang.Class.forName0(java.base@11.0.17/Native Method)
    at java.lang.Class.forName(java.base@11.0.17/Class.java:398)
    at java.lang.System.initPhase3(java.base@11.0.17/System.java:2050)

hategan commented 1 year ago

OK, so the build is fine and it should load the class, which leaves the command itself as broken somehow. Unfortunately, I missed that it quotes the variables for eval, so maybe change "echo ..." to "eval echo ..." so we can see what the command is after expanding $LOCALCLASSPATH and $CLASSPATH.

Alternatively, copy the ant directory to something that has a shorter path (e.g., /usr/WS1/exaworks/ant) and see if a basic invocation of ant works from there (like ant --help). If the path length is the problem, it should work there but still fail in the original directory.
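To illustrate why "eval echo" is needed instead of plain "echo", here is a small self-contained sketch. The values are stand-ins, not the ones the real launcher script builds. The point is only that the command string holds a quoted variable reference, which plain echo prints verbatim and eval expands.

```shell
# Stand-in values; the real script builds these itself.
LOCALCLASSPATH=/opt/example/lib/ant-launcher.jar
ant_exec_command='exec java -classpath "$LOCALCLASSPATH"'

# Plain echo prints the string as stored: the variable reference stays literal.
show_literal() { echo "$ant_exec_command"; }

# eval re-parses the string as shell input, so "$LOCALCLASSPATH" is expanded
# (and the inner double quotes are consumed) before echo prints it.
show_expanded() { eval echo "$ant_exec_command"; }
```

Because eval also runs any command substitution found in the string, this is only suitable for strings built by a trusted script, as is the case here.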

hategan commented 1 year ago

By the way, ant 1.5 had only 4 jars in lib, so the classpath was a few times shorter. Looks like they split the jars up starting with 1.6, so if this was working with 1.5, it's further indication that this is the issue.

mtitov commented 1 year ago

@hategan right

exec /usr/WS1/exaworks/sdk/spack/opt/spack/linux-rhel7-x86_64/gcc-8.1.0/openjdk-11.0.17_8-l2n5t3z7clzpd45vclm4malpssw4jo4e/bin/java 
-classpath /usr/share/java/ant.jar:/usr/share/java/ant-launcher.jar:/usr/share/java/jaxp_parser_impl.jar:/usr/share/java/xml-commons-apis.jar:/usr/share/java/antlr.jar:/usr/share/java/ant/ant-antlr.jar:/usr/share/java/bcel.jar:/usr/share/java/ant/ant-apache-bcel.jar:/usr/share/java/bsf.jar:/usr/share/java/ant/ant-apache-bsf.jar:/usr/share/java/log4j.jar:/usr/share/java/ant/ant-apache-log4j.jar:/usr/share/java/oro.jar:/usr/share/java/ant/ant-apache-oro.jar:/usr/share/java/regexp.jar:/usr/share/java/ant/ant-apache-regexp.jar:/usr/share/java/xml-commons-resolver.jar:/usr/share/java/ant/ant-apache-resolver.jar:/usr/share/java/apache-commons-logging.jar:/usr/share/java/ant/ant-commons-logging.jar:/usr/share/java/apache-commons-net.jar:/usr/share/java/ant/ant-commons-net.jar:/usr/share/java/javamail/mail.jar:/usr/share/java/javamail/javax.mail.jar:/usr/share/java/javamail/gimap.jar:/usr/share/java/javamail/javax.mail-api.jar:/usr/share/java/javamail/imap.jar:/usr/share/java/javamail/mailapi.jar:/usr/share/java/javamail/smtp.jar:/usr/share/java/javamail/pop3.jar:/usr/share/java/javamail/dsn.jar:/usr/share/java/ant/ant-javamail.jar:/usr/share/java/jdepend.jar:/usr/share/java/ant/ant-jdepend.jar:/usr/share/java/jsch.jar:/usr/share/java/ant/ant-jsch.jar:/usr/share/java/junit.jar:/usr/share/java/ant/ant-junit.jar:/usr/share/java/junit.jar:/usr/share/java/ant/ant-junit4.jar:/usr/share/java/ant/ant-swing.jar 
-Dant.home=/usr/share/ant 
-Dant.library.dir=/usr/share/ant/lib 
-Djava.security.manager=allow org.apache.tools.ant.launch.Launcher -cp 

hategan commented 1 year ago

Ha! Looks fine.

Wait, I'm confused. That looks like a different installation of ant.
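The mismatch can be checked mechanically. As a sketch, the function below pulls the -Dant.home= value out of an expanded command line so it can be compared with the prefix spack installed ant into. The function name is hypothetical, and the sketch assumes that no argument contains spaces.

```shell
# Hypothetical helper: print the value of -Dant.home= from a command line.
# Splits on spaces, so it assumes no argument contains a space.
ant_home_from_command() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^-Dant\.home=//p'
}
```

For the expanded command shown above it would print /usr/share/ant, which is not the spack install prefix under /usr/WS1/exaworks/sdk/spack/opt/spack.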

mtitov commented 1 year ago

> Wait, I'm confused. That looks like a different installation of ant.

That is the output after ant is loaded:

[titov1@quartz1532:sdk]$ . spack/share/spack/setup-env.sh
[titov1@quartz1532:sdk]$ spack env activate rhel7-broadwell
[titov1@quartz1532:sdk]$ spack load ant
[titov1@quartz1532:sdk]$ ant
Error occurred during initialization of VM
java.lang.Error: Could not create SecurityManager
    at java.lang.System.initPhase3(java.base@11.0.17/System.java:2065)
Caused by: java.lang.ClassNotFoundException: allow
    at jdk.internal.loader.BuiltinClassLoader.loadClass(java.base@11.0.17/BuiltinClassLoader.java:581)
    at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(java.base@11.0.17/ClassLoaders.java:178)
    at java.lang.ClassLoader.loadClass(java.base@11.0.17/ClassLoader.java:522)
    at java.lang.Class.forName0(java.base@11.0.17/Native Method)
    at java.lang.Class.forName(java.base@11.0.17/Class.java:398)
    at java.lang.System.initPhase3(java.base@11.0.17/System.java:2050)

I'll start the setup from scratch: I will remove the corresponding spack directory and restart CI.

hategan commented 1 year ago

Ah, I see. I'm still curious why the spack package is doing that; it doesn't look like my theory above explains it.

hategan commented 1 year ago

spack install stc "works on my laptop"

mtitov commented 1 year ago

> I'll start setup from scratch - will remove a corresponding spack directory and will restart CI

This didn't help. (I removed the spack and ~/.spack dirs and let LLNL CI create spack and the envs per machine.) I've submitted a ticket to LLNL support.