barracuda156 closed this issue 1 year ago.
The very next line uses the configure_file command to write the configuration to ${CMAKE_CURRENT_BINARY_DIR}/starneig_config.h. That is, your_build_dir/src/starneig_config.h should contain either
#define ALIGNED_ALLOC_FOUND
or
/* #define ALIGNED_ALLOC_FOUND */
Since the linker is complaining about it, it must be the former. Could you confirm this?
It looks like you are using macOS. Is this correct?
Assuming CMake has indeed detected aligned_alloc, something must have gone wrong during the linking phase. Perhaps macOS and/or CMake handles linking differently compared to Linux. Unfortunately, I do not have access to a macOS machine, so I cannot test this myself.
@mirkomyl Thank you for responding!
I will check logs soon, but I can say that:
Could you check PR #2?
I will try building from master with that patch added in an hour or so, and update you.
@mirkomyl On a side note, on PowerPC (regardless of OS) -mtune=native should be used for optimizations, not -march=native or -mtune=generic.
https://www.rowleydownload.co.uk/arm/documentation/gnu/gcc/RS_002f6000-and-PowerPC-Options.html
https://github.com/mfem/mfem/issues/216
In that case a user may disable the STARNEIG_ENABLE_OPTIMIZATION option [1] as done in the binary packages [2].
Yes, I know; in fact, those flags are not enforced anyway. What I meant is rather to add a valid optimization option for PPC. I can make a PR for that if it is more convenient; it is an easy fix.
@mirkomyl Update on tests: I have rebuilt v0.1.8 with your patch from the referenced PR. The build succeeds. Tests still fail, but a bit differently:
Start testing: Jan 15 21:02 MYT
----------------------------------------------------------
1/10 Testing: simple-hessenberg
1/10 Test: simple-hessenberg
Command: "/opt/local/var/macports/build/_opt_PPCRosettaPorts_math_StarNEig/StarNEig/work/build/starneig-test" "--experiment" "hessenberg" "--n" "5000" "--solver" "starneig-simple" "--keep-going"
Directory: /opt/local/var/macports/build/_opt_PPCRosettaPorts_math_StarNEig/StarNEig/work/build/test
"simple-hessenberg" start time: Jan 15 21:02 MYT
Output:
----------------------------------------------------------
TEST: --seed 1673787767 --experiment hessenberg --test-workers default --blas-threads default --lapack-threads default --scalapack-threads default --data-format pencil-local --init default --n 5000 --solver starneig-simple --cores default --gpus default --hooks hessenberg:normal residual:normal --residual-fail-threshold 10000 --residual-warn-threshold 500 --repeat 1 --warmup 0 --keep-going
THREADS: Using 0 StarPU worker threads during initialization and validation.
THREADS: Using 0 BLAS threads during initialization and validation.
THREADS: Using 0 BLAS threads in LAPACK solvers.
THREADS: Using 1 BLAS threads in ScaLAPACK solvers.
INIT...
PREPARE...
[starneig][fatal error] Something unexpected happened.
<end of output>
Test time = 1.36 sec
----------------------------------------------------------
Test Failed.
Should I try running it via GDB?
P.S. Let me try to rebuild starpu as well; I initially built it with gcc-4.2, which may not be the optimal choice.
Just for the record: build_test_log.txt
It appears that the test fails before the actual solver routine is called (PREPARE...), so something goes wrong during initialization. It is very difficult to say what is happening without seeing a backtrace. What worries me is the fact that the test program reports it is using zero workers etc., so perhaps this is a hwloc issue. You can see what the output should look like in the manual [1].
Regarding PPC and macOS support in general, StarNEig is meant to be used in a Linux environment. I know it works in Windows (WSL) without CUDA support, but PPC Macs are not within the target group. It would be nice if StarNEig worked in such an environment, but I do not consider it a priority.
ADD: You may want to compile StarNEig with the STARNEIG_ENABLE_VERBOSE option enabled.
@mirkomyl Thank you, I will try enabling verbose.
For PPC, I generally don't expect anyone to make dedicated fixes, of course, since regardless of interest, the hardware is understandably scarce. However, it is rare that a needed fix is genuinely macOS PPC-specific (the ABI differs from ELF, but that usually matters only for assembler code or alignment). As long as we consider C, C++ and Fortran, whatever works on Linux and BSD usually also works on macOS, including PPC versions. The exceptions are graphics- and web-related code, where needed features are missing from the SDK (and that is macOS version-specific rather than arch-specific).
What worries me is the fact that the test program reports it is using zero workers etc., so perhaps this is a hwloc issue.
Looking at the code here, https://github.com/open-mpi/hwloc/blob/master/hwloc/topology-darwin.c, I would not be surprised if it were broken. The cache line sizes look wrong for the PPC case, etc.
@mirkomyl I tried building against vecLibFort (an interface to Accelerate) instead of OpenBLAS, but got a linking error:
Undefined symbols:
"_dgghd3_", referenced from:
_starneig_GEP_SM_HessenbergTriangular in lapack.c.o
_starneig_GEP_SM_HessenbergTriangular in lapack.c.o
ld: symbol(s) not found
collect2: error: ld returned 1 exit status
It looks like vecLibFort is some kind of lightweight wrapper for BLAS and LAPACK libraries. It is thus somewhat unclear which BLAS and LAPACK libraries you are using (note that OpenBLAS includes both BLAS and LAPACK). The DGGHD3 routine is a relatively new addition to LAPACK, so perhaps some older LAPACK versions do not have it.
@mirkomyl vecLibFort is an interface to Apple's native Accelerate framework. It just makes it usable from Fortran.
StarNEig uses hwloc as the ground truth when deciding how many CPU cores to use. Also, some tasks use it for memory allocation. A fully functional hwloc is thus a mandatory requirement.
Thank you, I will take a closer look at it.
For DGGHD3, is it possible to provide an internal fallback? Generally speaking, Apple's own BLAS/LAPACK is more reliable, at least on older macOS.
If this were a more common issue, then perhaps it could be included with the library in the same way the pdgghrd routine is included. However, StarNEig was developed as part of a research project that promised to develop state-of-the-art numerical software for modern multi-node, multi-core, multi-GPU systems. It was thus built using the latest tools, and supporting older hardware and software was never a priority. If Apple's LAPACK library is really missing the DGGHD3 routine, then I would simply conclude that it is too old.
Well, I guess we could live with OpenBLAS then. Allow me some time to dig into the hwloc issue; I will update you in a while.
@mirkomyl I have not had a chance to dig into the hwloc code yet, but it actually passes all its tests (10.6.8 Rosetta):
---> Testing hwloc
Executing: cd "/opt/local/var/macports/build/_opt_PPCRosettaPorts_devel_hwloc/hwloc/work/hwloc-2.8.0" && /usr/bin/make check
Making check in include
make[1]: Nothing to be done for `check'.
Making check in hwloc
/usr/bin/make
make[2]: Nothing to be done for `all'.
Making check in utils
Making check in hwloc
Making check in .
/usr/bin/make check-TESTS
PASS: test-hwloc-annotate.sh
PASS: test-hwloc-calc.sh
PASS: test-hwloc-compress-dir.sh
PASS: test-hwloc-diffpatch.sh
PASS: test-hwloc-distrib.sh
PASS: test-hwloc-info.sh
PASS: test-build-custom-topology.sh
PASS: test-parsing-flags.sh
============================================================================
Testsuite summary for hwloc 2.8.0
============================================================================
# TOTAL: 8
# PASS: 8
# SKIP: 0
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0
============================================================================
Making check in lstopo
/usr/bin/make check-TESTS
PASS: test-lstopo.sh
============================================================================
Testsuite summary for hwloc 2.8.0
============================================================================
# TOTAL: 1
# PASS: 1
# SKIP: 0
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0
============================================================================
make[2]: Nothing to be done for `check-am'.
Making check in tests
Making check in hwloc
Making check in .
/usr/bin/make hwloc_api_version hwloc_list_components hwloc_bitmap hwloc_bitmap_string hwloc_bitmap_compare_inclusion hwloc_get_closest_objs hwloc_get_obj_covering_cpuset hwloc_get_cache_covering_cpuset hwloc_get_largest_objs_inside_cpuset hwloc_get_next_obj_covering_cpuset hwloc_get_obj_inside_cpuset hwloc_get_shared_cache_covering_obj hwloc_get_obj_below_array_by_type hwloc_get_obj_with_same_locality hwloc_bitmap_first_last_weight hwloc_bitmap_singlify hwloc_type_depth hwloc_type_sscanf hwloc_bind hwloc_get_last_cpu_location hwloc_get_area_memlocation hwloc_object_userdata hwloc_synthetic hwloc_backends hwloc_pci_backend hwloc_is_thissystem hwloc_distances hwloc_groups hwloc_insert_misc hwloc_topology_allow hwloc_topology_restrict hwloc_topology_dup hwloc_topology_diff hwloc_topology_abi hwloc_obj_infos hwloc_iodevs cpuset_nodeset memattrs memtiers cpukinds xmlbuffer gl
CC hwloc_api_version.o
CCLD hwloc_api_version
CC hwloc_list_components.o
CCLD hwloc_list_components
CC hwloc_bitmap.o
CCLD hwloc_bitmap
CC hwloc_bitmap_string.o
CCLD hwloc_bitmap_string
CC hwloc_bitmap_compare_inclusion.o
CCLD hwloc_bitmap_compare_inclusion
CC hwloc_get_closest_objs.o
CCLD hwloc_get_closest_objs
CC hwloc_get_obj_covering_cpuset.o
CCLD hwloc_get_obj_covering_cpuset
CC hwloc_get_cache_covering_cpuset.o
CCLD hwloc_get_cache_covering_cpuset
CC hwloc_get_largest_objs_inside_cpuset.o
CCLD hwloc_get_largest_objs_inside_cpuset
CC hwloc_get_next_obj_covering_cpuset.o
CCLD hwloc_get_next_obj_covering_cpuset
CC hwloc_get_obj_inside_cpuset.o
CCLD hwloc_get_obj_inside_cpuset
CC hwloc_get_shared_cache_covering_obj.o
CCLD hwloc_get_shared_cache_covering_obj
CC hwloc_get_obj_below_array_by_type.o
CCLD hwloc_get_obj_below_array_by_type
CC hwloc_get_obj_with_same_locality.o
CCLD hwloc_get_obj_with_same_locality
CC hwloc_bitmap_first_last_weight.o
CCLD hwloc_bitmap_first_last_weight
CC hwloc_bitmap_singlify.o
CCLD hwloc_bitmap_singlify
CC hwloc_type_depth.o
CCLD hwloc_type_depth
CC hwloc_type_sscanf.o
CCLD hwloc_type_sscanf
CC hwloc_bind.o
CCLD hwloc_bind
CC hwloc_get_last_cpu_location.o
CCLD hwloc_get_last_cpu_location
CC hwloc_get_area_memlocation.o
CCLD hwloc_get_area_memlocation
CC hwloc_object_userdata.o
CCLD hwloc_object_userdata
CC hwloc_synthetic.o
CCLD hwloc_synthetic
CC hwloc_backends.o
CCLD hwloc_backends
CC hwloc_pci_backend.o
CCLD hwloc_pci_backend
CC hwloc_is_thissystem.o
CCLD hwloc_is_thissystem
CC hwloc_distances.o
CCLD hwloc_distances
CC hwloc_groups.o
CCLD hwloc_groups
CC hwloc_insert_misc.o
CCLD hwloc_insert_misc
CC hwloc_topology_allow.o
CCLD hwloc_topology_allow
CC hwloc_topology_restrict.o
CCLD hwloc_topology_restrict
CC hwloc_topology_dup.o
CCLD hwloc_topology_dup
CC hwloc_topology_diff.o
CCLD hwloc_topology_diff
CC hwloc_topology_abi.o
CCLD hwloc_topology_abi
CC hwloc_obj_infos.o
CCLD hwloc_obj_infos
CC hwloc_iodevs.o
CCLD hwloc_iodevs
CC cpuset_nodeset.o
CCLD cpuset_nodeset
CC memattrs.o
CCLD memattrs
CC memtiers.o
CCLD memtiers
CC cpukinds.o
CCLD cpukinds
CC xmlbuffer.o
CCLD xmlbuffer
CC gl.o
CCLD gl
/usr/bin/make check-TESTS
PASS: hwloc_api_version
PASS: hwloc_list_components
PASS: hwloc_bitmap
PASS: hwloc_bitmap_string
PASS: hwloc_bitmap_compare_inclusion
PASS: hwloc_get_closest_objs
PASS: hwloc_get_obj_covering_cpuset
PASS: hwloc_get_cache_covering_cpuset
PASS: hwloc_get_largest_objs_inside_cpuset
PASS: hwloc_get_next_obj_covering_cpuset
PASS: hwloc_get_obj_inside_cpuset
PASS: hwloc_get_shared_cache_covering_obj
PASS: hwloc_get_obj_below_array_by_type
PASS: hwloc_get_obj_with_same_locality
PASS: hwloc_bitmap_first_last_weight
PASS: hwloc_bitmap_singlify
PASS: hwloc_type_depth
PASS: hwloc_type_sscanf
PASS: hwloc_bind
PASS: hwloc_get_last_cpu_location
PASS: hwloc_get_area_memlocation
PASS: hwloc_object_userdata
PASS: hwloc_synthetic
PASS: hwloc_backends
PASS: hwloc_pci_backend
PASS: hwloc_is_thissystem
PASS: hwloc_distances
PASS: hwloc_groups
PASS: hwloc_insert_misc
PASS: hwloc_topology_allow
PASS: hwloc_topology_restrict
PASS: hwloc_topology_dup
PASS: hwloc_topology_diff
PASS: hwloc_topology_abi
PASS: hwloc_obj_infos
PASS: hwloc_iodevs
PASS: cpuset_nodeset
PASS: memattrs
PASS: memtiers
PASS: cpukinds
PASS: xmlbuffer
PASS: gl
============================================================================
Testsuite summary for hwloc 2.8.0
============================================================================
# TOTAL: 42
# PASS: 42
# SKIP: 0
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0
============================================================================
Making check in ports
/usr/bin/make \
make[4]: Nothing to be done for `all'.
Making check in xml
/usr/bin/make check-TESTS
PASS: 8intel64-4n2t-memattrs.xml
PASS: 16amd64-8n2c-cpusets.xml
PASS: 16amd64-4distances.xml
PASS: 16amd64-4distances.console.output
PASS: 16em64t-4s2c2t.xml
PASS: 16em64t-4s2c2t-offlines.xml
PASS: 16em64t-4s2c2t.console.output
PASS: 16-2gr2gr2n2c+misc.xml
PASS: 16-2gr2gr2n2c+misc.console.output
PASS: 16intel64-manyVFs.xml
PASS: 16intel64-manyVFs.console.output
PASS: 16intel64-manyVFs.console.nocollapse.output
PASS: 24em64t-2n6c2t-pci.xml
PASS: 32em64t-2n8c2t-pci-noio.xml
PASS: 32em64t-2n8c2t-pci-normalio.xml
PASS: 32em64t-2n8c2t-pci-wholeio.xml
PASS: 64intel64-3g2n+2n-irregulargroups+pci.xml
PASS: 64intel64-3g2n+2n-irregulargroups+pci.console.output
PASS: 8intel64-fakeKNL-A2A-hybrid.rootattachednumas.xml
PASS: 64intel64-fakeKNL-SNC4-hybrid.xml
PASS: 96em64t-4n4d3ca2co-pci.xml
PASS: 192em64t-12gr2n8c2t.xml
PASS: 192em64t-24n8c2t.xml
PASS: power8gpudistances.xml
PASS: fakeheterodistances.xml
PASS: fakecpukinds.xml
PASS: 8em64t-2p2ca2co-nonodesets.v1tov2.xml
PASS: 8ia64-2n2s2c+1n.v1tov2.xml
PASS: 16amd64-4distances.v1tov2.xml
PASS: 16amd64-4distances.v2tov1.xml
PASS: 2intel64-1n2c-numaroot.v1tov2.xml
PASS: 28intel64-2p2g7c-CoDgroups.v1tov2.xml
PASS: 28intel64-2p2g7c-CoD.nogroups.v1tov2.xml
PASS: 8intel64-fakeKNL-A2A-hybrid.rootattachednumas.v1tov2.xml
PASS: 8intel64-fakeKNL-A2A-hybrid.rootattachednumas.v2tov1.xml
PASS: 64intel64-fakeKNL-SNC4-hybrid.v1tov2.xml
PASS: 64intel64-fakeKNL-SNC4-hybrid.v2tov1.xml
============================================================================
Testsuite summary for hwloc 2.8.0
============================================================================
# TOTAL: 37
# PASS: 37
# SKIP: 0
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0
============================================================================
make[2]: Nothing to be done for `check-am'.
Making check in contrib/systemd
make[1]: Nothing to be done for `check'.
Making check in contrib/completion
make[1]: Nothing to be done for `check'.
Making check in contrib/misc
/usr/bin/make hwloc-tweak-osindex
CC hwloc-tweak-osindex.o
CCLD hwloc-tweak-osindex
Making check in contrib/hwloc-ps.www
make[1]: Nothing to be done for `check'.
Making check in doc
/usr/bin/make check-recursive
Making check in examples
/usr/bin/make hwloc-hello hwloc-hello-cpp cpuset+bitmap+cpubind nodeset+membind+policy get-knl-modes gpu sharedcaches
CC hwloc-hello.o
CCLD hwloc-hello
CXX hwloc-hello-cpp.o
CXXLD hwloc-hello-cpp
CC cpuset+bitmap+cpubind.o
CCLD cpuset+bitmap+cpubind
CC nodeset+membind+policy.o
CCLD nodeset+membind+policy
CC get-knl-modes.o
CCLD get-knl-modes
CC gpu.o
CCLD gpu
CC sharedcaches.o
CCLD sharedcaches
/usr/bin/make check-TESTS
PASS: hwloc-hello
PASS: hwloc-hello-cpp
============================================================================
Testsuite summary for hwloc 2.8.0
============================================================================
# TOTAL: 2
# PASS: 2
# SKIP: 0
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0
============================================================================
make[3]: Nothing to be done for `check-am'.
make[1]: Nothing to be done for `check-am'.
@mirkomyl By the way, there is one more related bug: src/mpi/distr_matrix.c includes malloc.h unconditionally, but it is a Linux-specific header. At minimum, it should not be included on macOS.
Unless this begins to cause issues on Linux, fixing this is not a priority.
Well, the wrong include is trivially fixed, but unless the tests are fixed, there is no real point.
CMakeLists checks for the presence of aligned_alloc, but then nothing is done if it is not present: https://github.com/NLAFET/StarNEig/blob/d47ed4dfbcdaec52e44f0b02d14a6e0cde64d286/src/CMakeLists.txt#L544
As expected, the build then fails with:
The compiler recommends including stdlib.h, but it does not help: