LBL-EESA / TECA

TECA, the Toolkit for Extreme Climate Analysis, contains a collection of climate analysis algorithms targeted at extreme event detection and analysis.

Set up Travis CI for the superbuild #328

Open burlen opened 4 years ago

burlen commented 4 years ago

The goal is to have testing for the superbuild similar to what we do for TECA itself. This would be accomplished in a similar way, e.g., a minimal Docker image that runs the superbuild and sends the results to a CDash site. We should also include Apple macOS runs.

It is important because the superbuild continues to be a convenient way to install TECA on specialized systems such as Cori.

elbashandy commented 4 years ago

@burlen, should we have this as an independent Travis-CI test for the TECA_superbuild repo, or add it as an additional test in the TECA repo?

burlen commented 4 years ago

TECA_superbuild repo

elbashandy commented 4 years ago

@taobrienlbl can you authorize Travis-CI to integrate TECA_superbuild?

burlen commented 4 years ago

Please use a Docker image based on the latest available Ubuntu release for this.

elbashandy commented 4 years ago

These are the extra libraries I needed to install for the superbuild to build (on Fedora):

- expat-devel (for UDUNITS)
- libffi-devel (for Python)
- pcre-devel and zlib-devel (for SWIG; the zlib installed by the superbuild was not enough for SWIG, so zlib-devel was needed)
- libtool (for MPI)

@burlen Should I add these libs to the superbuild?

burlen commented 4 years ago

I think not for libtool: it is part of the GNU OS and is better installed via package managers for compatibility with other critical OS-level dependencies (i.e., compilers).

However, expat, libffi, zlib, pcre all seem reasonable additions to me.

burlen commented 4 years ago

see also #256

elbashandy commented 4 years ago

I had to install:

There's a problem with SWIG not populating LDFLAGS in TECA_superbuild/build/SWIG-prefix/src/SWIG-build/CCache/Makefile:

Makefile:

CC=/usr/bin/cc
CFLAGS=-I/app/TECA_superbuild/build/include  -O3 -march=native -mtune=native -DNDEBUG -Wall -W -I.
SWIG=swig
SWIG_LIB=../$(srcdir)/../Lib
EXEEXT=

LIBS= -lz
OBJS= ccache.o mdfour.o hash.o execute.o util.o args.o stats.o \
        cleanup.o snprintf.o unify.o
HEADERS = ccache.h mdfour.h config.h config_win32.h

...

$(PACKAGE_NAME)$(EXEEXT): $(OBJS) $(HEADERS)
        $(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(OBJS) $(LIBS)

SWIG-build-err:

/usr/bin/ld: cannot find -lz

Installing zlib1g-dev (Ubuntu) or zlib-devel (Fedora) via the package manager fixed the problem.
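The Makefile excerpt shows the superbuild's include directory reaching CFLAGS while LDFLAGS stays empty, so `-lz` is only resolved against the system library paths. A minimal sketch of the missing piece, assuming the superbuild installs its zlib under `<prefix>/lib` and that SWIG's configure step honors an `LDFLAGS` environment variable (both assumptions, not confirmed in this thread); `ldflags_for_prefix` is a hypothetical helper:

```python
import os
import shlex


def ldflags_for_prefix(prefix: str) -> str:
    """Hypothetical helper: linker flags for libraries installed under <prefix>/lib.

    Assumes the superbuild installs its libraries (e.g. libz) in <prefix>/lib.
    """
    lib_dir = os.path.join(prefix, "lib")
    # -L adds lib_dir to the link-time search path so -lz can be resolved there;
    # -Wl,-rpath records lib_dir so the built binary can find libz at run time.
    return f"-L{lib_dir} -Wl,-rpath,{lib_dir}"


if __name__ == "__main__":
    # Prefix taken from the CFLAGS line in the Makefile excerpt.
    flags = ldflags_for_prefix("/app/TECA_superbuild/build")
    # Emit a shell assignment that could prefix a ./configure invocation.
    print(f"LDFLAGS={shlex.quote(flags)}")
```

Whether the rpath part is wanted depends on how the superbuild handles run-time library paths; dropping it leaves only the link-time fix.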

burlen commented 4 years ago

Please install m4 using the package manager; it is part of the build system and should not be included in the superbuild.

burlen commented 4 years ago

Please ping me with a branch name when you have this pushed.

burlen commented 4 years ago

zlib is already being installed by the superbuild and has been for some time. Perhaps SWIG needs a newer version of zlib. Either way we'll need to update all of the dependencies to the latest versions. Would you please do this?

elbashandy commented 4 years ago

I think we are already using the latest version of zlib, 1.2.11, released on January 15, 2017.

burlen commented 4 years ago

Please update all the dependencies, not just zlib, to the newest version.

elbashandy commented 4 years ago

Sounds good

burlen commented 4 years ago

SWIG depends on pcre, not pcre2.

elbashandy commented 4 years ago

Okay will fix it

burlen commented 4 years ago

When you add a new package to the build, make sure you print a status message both when it is enabled and when it is disabled.

elbashandy commented 4 years ago

Oh, I only added the enabled message. I will add the disabled one as well.

elbashandy commented 4 years ago

test_binary_stream_mpi is failing because it's the only test that has ${MPIEXEC} -n 2 hard-coded

After investigating the available resources on Travis-CI, I found that it has only one core with 2 hyperthreads. That's why the test fails: by default, Open MPI assigns one slot per core (1 slot here). To allow hyperthreading we can use mpirun --use-hwthread-cpus ...

lscpu output

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          2
On-line CPU(s) list:             0,1
Thread(s) per core:              2
Core(s) per socket:              1
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) CPU
Stepping:                        7
CPU MHz:                         2800.184
BogoMIPS:                        5600.36
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       32 KiB
L1i cache:                       32 KiB
L2 cache:                        1 MiB
L3 cache:                        33 MiB
NUMA node0 CPU(s):               0,1
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT Host state unknown
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512_vnni md_clear arch_capabilities
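The lscpu output can be reduced to the two numbers Open MPI chooses between. Per the error message below, without a hostfile, `--host`, or a resource manager Open MPI defaults to the number of processor cores, and `--use-hwthread-cpus` switches it to hardware threads. A minimal sketch, assuming the field names match this lscpu output (they may differ across util-linux versions and locales); `parse_lscpu` and `default_slots` are hypothetical helpers:

```python
# Excerpt of the lscpu fields reported on the Travis-CI VM.
SAMPLE = """\
CPU(s):                          2
Thread(s) per core:              2
Core(s) per socket:              1
Socket(s):                       1
"""


def parse_lscpu(text: str) -> dict[str, str]:
    """Parse 'Key:   value' lines from lscpu output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as "0,1" stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def default_slots(fields: dict[str, str]) -> tuple[int, int]:
    """Return (physical cores, hardware threads) for a single host."""
    cores = int(fields["Socket(s)"]) * int(fields["Core(s) per socket"])
    hwthreads = int(fields["CPU(s)"])
    return cores, hwthreads


if __name__ == "__main__":
    cores, hwthreads = default_slots(parse_lscpu(SAMPLE))
    # prints: slots by core: 1, slots with --use-hwthread-cpus: 2
    print(f"slots by core: {cores}, slots with --use-hwthread-cpus: {hwthreads}")
```

This is a simplified single-host model; `slots=N` clauses in a hostfile or a resource manager allocation override it.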

Error:

44/144 Test  #44: test_binary_stream_mpi ...........................***Failed    0.02 sec
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:
  test_binary_stream
Either request fewer slots for your application, or make more slots
available for use.
A "slot" is the Open MPI term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which Open MPI processes are run:
  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, Open MPI defaults to the number of processor cores
In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------

I think the fix that makes sense is to change ${MPIEXEC} -n 2 to ${MPIEXEC} -n ${TEST_CORES}
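The proposed change, along with the `--use-hwthread-cpus` alternative, can be sketched as a small command builder. The helper name and signature are hypothetical; in TECA itself this logic would presumably live in the CMake test definitions that use `${MPIEXEC}` and `${TEST_CORES}`. `--use-hwthread-cpus` is the Open MPI option named in the error message and may not be accepted by other MPI launchers.

```python
from typing import Sequence


def mpi_test_command(
    mpiexec: str,
    test_cores: int,
    exe: str,
    args: Sequence[str] = (),
    use_hwthreads: bool = False,
) -> list[str]:
    """Build an MPI test launch line with the rank count taken from TEST_CORES."""
    if test_cores < 1:
        raise ValueError("test_cores must be at least 1")
    cmd = [mpiexec]
    if use_hwthreads:
        # Open MPI only: count hardware threads instead of cores as slots.
        cmd.append("--use-hwthread-cpus")
    # Replaces the hard-coded "-n 2" with the configured core count.
    cmd += ["-n", str(test_cores), exe, *args]
    return cmd


if __name__ == "__main__":
    # Hypothetical: TEST_CORES=1 on the single-core Travis-CI VM.
    print(" ".join(mpi_test_command("mpiexec", 1, "test_binary_stream")))
```

Using TEST_CORES keeps the test runnable on machines that report a single core; the trade-off is that with TEST_CORES=1 the test no longer exercises communication between two ranks, which `--use-hwthread-cpus` (or `--oversubscribe`) would preserve.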

burlen commented 3 years ago

I've set up GitHub Actions on the superbuild repo on Ubuntu 20.04. This took about 4 hours of work. GitHub Actions seems to have limited capabilities compared to Travis CI. Travis CI would still be useful, especially for testing on macOS.