Closed carlodefalco closed 2 years ago
Could you run the test by hand and look into the terminal output? Try different -np settings.
Hi,
first of all I found that, unlike what the output of
make check
says, the output of the test is not in "./test-suite.log" but rather in "sc/test/sc_test_sort.log" the error seems to be a segfault :
[libsc] This is libsc 2.8.106-dc9e
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 0 on node guglielmo exited on signal 11 (Segmentation fault: 11).
--------------------------------------------------------------------------
FAIL test/sc_test_sort (exit status: 139)
If I manually run the test I get the same segfault with 1 or 2 ranks I also wonder why running
make check
from the root directory only runs tests for sc and not for p4est ... I can force make to run p4est tests by typing them manually, e.g. :
$ make test/p8est_test_nodes.log
PASS: test/p8est_test_nodes
but
make check
does not run them automatically
make check in the p4est build directory runs them for p4est and libsc. make check in the libsc build directory runs them for libsc only.
If you could compile --disable-shared and run the test through mpirun -np xx valgrind test/sc_test_sort... that would be helpful. Maybe you find an error or I'll look at it myself.
make check in the p4est build directory runs them for p4est and libsc. make check in the libsc build directory runs them for libsc only.
that is the behaviour that I expected (and what I remember happened with previous versions) but it is not what I observe now.
If I type
make check
from the p4est build directory, only sc tests are executed. I can force p4est tests to run if I use a specific test as the make target, though.
E.g., typing
make test/p8est_test_nodes
in the p4est bulid directory will build the test/p8est_test_nodes executable and typing
make test/p8est_test_nodes.log
will run the test and produce the logfile. Maybe the tests are not run because of the failing test in sc? Here is the full output I get from "make check" in the p4est build dir :
$ make check
Making check in sc
/Applications/Xcode.app/Contents/Developer/usr/bin/make test/sc_test_allgather test/sc_test_arrays test/sc_test_builtin test/sc_test_darray_work test/sc_test_dmatrix test/sc_test_dmatrix_pool test/sc_test_io_sink test/sc_test_keyvalue test/sc_test_node_comm test/sc_test_notify test/sc_test_reduce test/sc_test_search test/sc_test_sort test/sc_test_sortb test/sc_test_version test/sc_test_helpers
make[2]: `test/sc_test_allgather' is up to date.
make[2]: `test/sc_test_arrays' is up to date.
make[2]: `test/sc_test_builtin' is up to date.
make[2]: `test/sc_test_darray_work' is up to date.
make[2]: `test/sc_test_dmatrix' is up to date.
make[2]: `test/sc_test_dmatrix_pool' is up to date.
make[2]: `test/sc_test_io_sink' is up to date.
make[2]: `test/sc_test_keyvalue' is up to date.
make[2]: `test/sc_test_node_comm' is up to date.
make[2]: `test/sc_test_notify' is up to date.
make[2]: `test/sc_test_reduce' is up to date.
make[2]: `test/sc_test_search' is up to date.
make[2]: `test/sc_test_sort' is up to date.
make[2]: `test/sc_test_sortb' is up to date.
make[2]: `test/sc_test_version' is up to date.
make[2]: `test/sc_test_helpers' is up to date.
/Applications/Xcode.app/Contents/Developer/usr/bin/make check-TESTS
PASS: test/sc_test_allgather
PASS: test/sc_test_arrays
PASS: test/sc_test_builtin
PASS: test/sc_test_darray_work
PASS: test/sc_test_dmatrix
PASS: test/sc_test_dmatrix_pool
PASS: test/sc_test_io_sink
PASS: test/sc_test_keyvalue
PASS: test/sc_test_node_comm
PASS: test/sc_test_notify
PASS: test/sc_test_reduce
PASS: test/sc_test_search
FAIL: test/sc_test_sort
PASS: test/sc_test_sortb
PASS: test/sc_test_version
PASS: test/sc_test_helpers
============================================================================
Testsuite summary for libsc 2.8.106-dc9e
============================================================================
# TOTAL: 16
# PASS: 15
# SKIP: 0
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0
============================================================================
See ./test-suite.log
Please report to info@p4est.org
============================================================================
make[3]: *** [test-suite.log] Error 1
make[2]: *** [check-TESTS] Error 2
make[1]: *** [check-am] Error 2
make: *** [check-recursive] Error 1
If you could compile --disable-shared and run the test through mpirun -np xx valgrind test/sc_test_sort... that would be helpful. Maybe you find an error or I'll look at it myself.
valgrind does not work well on macos AFAIK, but I'll try to see if I can find out soemthing interesting with lldb
thanks, Carlo
Maybe the tests are not run because of the failing test in sc?
I confirm that this is actually what is happening : indeed, if any test fails in sc, make check
exits with an error
and does not run tests from p4est, if I disable the failing test in Makefile.am
the tests run and I get 62 PASS and no FAIL.
I'll comment again if I can find put more about the failing test in SC
Thanks for investigating, it is not unreasonable to stop testing if a subdirectory fails.
Looking forward to the actual test output by this program.
I compiled with --disable-shared --enable-static -O0 -g
and tried the same test again in the lldb debugger.
Here is the stacktrace returned by lldb
$ glibtool --mode=execute lldb ./sc_test_sort
(lldb) target create "./sc_test_sort"
Current executable set to '/Users/carlo/Documents/_Programmi/Altrui/p4est-github/BUILD/sc/test/sc_test_sort' (x86_64).
(lldb) run
Process 81351 launched: '/Users/carlo/Documents/_Programmi/Altrui/p4est-github/BUILD/sc/test/sc_test_sort' (x86_64)
2021-01-08 19:04:39.576583+0100 orted[81356:4257279] [CL_INVALID_OPERATION] : OpenCL Error : Failed to retrieve device information! Invalid enumerated value!
2021-01-08 19:04:39.576878+0100 orted[81356:4257279] [CL_INVALID_OPERATION] : OpenCL Error : Failed to retrieve device information! Invalid enumerated value!
2021-01-08 19:04:39.576890+0100 orted[81356:4257279] [CL_INVALID_OPERATION] : OpenCL Error : Failed to retrieve device information! Invalid enumerated value!
[libsc] This is libsc 2.8.106-dc9e
Process 81351 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x7ffeefbff330)
frame #0: 0x00007ffeefbff330
-> 0x7ffeefbff330: xorb %ah, %ch
0x7ffeefbff332: adcb %al, (%rcx)
0x7ffeefbff334: addl %eax, (%rax)
0x7ffeefbff336: addb %al, (%rax)
Target 0: (sc_test_sort) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x7ffeefbff330)
* frame #0: 0x00007ffeefbff330
frame #1: 0x00007fff202b7b32 libsystem_c.dylib`_qsort + 686
frame #2: 0x000000010000de91 sc_test_sort`sc_psort_bitonic(pst=0x00007ffeefbff330, lo=0, hi=11, dir=1) at sc_sort.c:423:7
frame #3: 0x000000010000dd9f sc_test_sort`sc_psort(mpicomm=0x000000010110e530, base=0x000000010371c810, nmemb=0x000000010371c180, size=8, compar=(sc_test_sort`sc_double_compare at sc.c:750)) at sc_sort.c:483:3
frame #4: 0x00000001000048dd sc_test_sort`main(argc=1, argv=0x00007ffeefbff8e8) at test_sort.c:107:3
frame #5: 0x00007fff203a2621 libdyld.dylib`start + 1
frame #6: 0x00007fff203a2621 libdyld.dylib`start + 1
It looks like the crash is occurring here I can inspect values of variables at the time of the crash if you tell what you think may be useful
Hope this helps, c.
P.S.: I have no idea what the OpenCL errors are about ...
It runs fine with me. Good catch though: Maybe your qsort_r is dead, or we're using it incorrectly. Can you comment the check for it and test with the regular qsort function?
it may be that qsort_r has a different signature on the mac, indeed man qsort_r
on my system says
QSORT(3) BSD Library Functions Manual QSORT(3)
NAME
heapsort, heapsort_b, mergesort, mergesort_b, qsort, qsort_b, qsort_r -- sort functions
SYNOPSIS
#include <stdlib.h>
int
heapsort(void *base, size_t nel, size_t width, int (*compar)(const void *, const void *));
int
heapsort_b(void *base, size_t nel, size_t width, int (^compar)(const void *, const void *));
int
mergesort(void *base, size_t nel, size_t width, int (*compar)(const void *, const void *));
int
mergesort_b(void *base, size_t nel, size_t width, int (^compar)(const void *, const void *));
void
qsort(void *base, size_t nel, size_t width, int (*compar)(const void *, const void *));
void
qsort_b(void *base, size_t nel, size_t width, int (^compar)(const void *, const void *));
void
qsort_r(void *base, size_t nel, size_t width, void *thunk, int (*compar)(void *, const void *, const void *));
the last argument seems to have a type that is different from that passed here
indeed the patch below fixes the problem for me
diff --git a/src/sc_sort.c b/src/sc_sort.c
index ed2b527c..e9b7050c 100644
--- a/src/sc_sort.c
+++ b/src/sc_sort.c
@@ -64,7 +64,7 @@ sc_icompare (const void *v1, const void *v2)
/** Pass qsort-style comparison function as third argument */
static int
-sc_compare_r (const void *v1, const void *v2, void *arg)
+sc_compare_r (void *arg, const void *v1, const void *v2)
{
sc_psort_t *pst = (sc_psort_t *) arg;
return pst->compar (v1, v2);
@@ -72,7 +72,7 @@ sc_compare_r (const void *v1, const void *v2, void *arg)
/** Pass qsort-style comparison function as third argument */
static int
-sc_icompare_r (const void *v1, const void *v2, void *arg)
+sc_icompare_r (void *arg, const void *v1, const void *v2)
{
sc_psort_t *pst = (sc_psort_t *) arg;
return pst->compar (v2, v1);
@@ -421,7 +421,7 @@ sc_psort_bitonic (sc_psort_t * pst, size_t lo, size_t hi, int dir)
n, pst->size, dir ? sc_compare : sc_icompare);
#else
qsort_r (pst->my_base + (lo - pst->my_lo) * pst->size,
- n, pst->size, dir ? sc_compare_r : sc_icompare_r, pst);
+ n, pst->size, pst, dir ? sc_compare_r : sc_icompare_r);
#endif
}
of course it cannot be applied unconditionally though, one would need to change the configure test in order to detect whether GNU or BSD version of qsort_r is available ...
Indeed, this is unfortunate. We can (a) go back to standard qsort for macs or (b) you might like to write a configure test that distinguishes the two behaviours. I'd be happy to integrate if you go for (b).
or maybe (c) use a portable version of qsort_r as provided in the GNU Portability Library ?
It seems it is provided there in order to fix exactly the same problem as occurred here
this project has a configure test that does what is needed manually.
That is an old archived version of the project, the new version just relies on gnulib for portability instead
The patch below works for me, could you please test it?
From 639ceb688cfbfbf00a46c49bd3504bd346fed048 Mon Sep 17 00:00:00 2001
From: Carlo de Falco <carlo.defalco@polimi.it>
Date: Sat, 9 Jan 2021 07:24:39 +0100
Subject: [PATCH] Check whether qsort_r is GNU or BSD style
---
configure.ac | 25 +++++++++++++++++++++++++
src/sc_sort.c | 18 +++++++++++++-----
2 files changed, 38 insertions(+), 5 deletions(-)
diff --git a/configure.ac b/configure.ac
index cd3bc34e..40ef3db1 100644
--- a/configure.ac
+++ b/configure.ac
@@ -85,6 +85,31 @@ AC_CHECK_FUNCS([strtol strtoll])
AC_CHECK_FUNCS([fsync])
AC_CHECK_FUNCS([qsort_r])
+# the order of qsort_r input parameters is reversed in BSD / macos
+if test x$ac_cv_func_qsort_r = xyes ; then
+ AC_MSG_CHECKING([whether qsort_r is BSD or GNU])
+ AC_RUN_IFELSE([AC_LANG_SOURCE([[
+ #define _GNU_SOURCE
+ #include <stdlib.h>
+ static int comparefun (const void *x, const void *x, void *z) {
+ return *(const double *)a - *(const double *)b;
+ }
+ int main (void) {
+ double v[3] = { 1.0, 2.0, 0.0 };
+ qsort_r (v, 3, sizeof (double), comparefun, NULL);
+ return !(v[0] == 0.0 && v[1] == 1.0 && arr[2] == 2.0);
+ }]])],
+ [
+ AC_MSG_RESULT([GNU])
+ ],[
+ AC_DEFINE(HAVE_QSORT_R_BSD, 1, [Define to 1 if 'qsort_r' uses BSD style arguments])
+ AC_MSG_RESULT([BSD])
+ ],[
+ AC_MSG_RESULT([cannot test on a cross-compiler, assuming GNU])
+ ])
+fi
+
+
echo "o---------------------------------------"
echo "| Checking libraries"
echo "o---------------------------------------"
diff --git a/src/sc_sort.c b/src/sc_sort.c
index ed2b527c..299b60c2 100644
--- a/src/sc_sort.c
+++ b/src/sc_sort.c
@@ -62,17 +62,25 @@ sc_icompare (const void *v1, const void *v2)
#else
-/** Pass qsort-style comparison function as third argument */
+#ifdef SC_HAVE_QSORT_R_BSD
+# define QSORT_R_PARAMS(P1, P2, P3, P4, P5) (P1, P2, P3, P5, P4)
+# define QSORT_R_CMP_PARAMS(P1, P2, P3) (P3, P1, P2)
+#else
+# define QSORT_R_PARAMS(P1, P2, P3, P4, P5) (P1, P2, P3, P4, P5)
+# define QSORT_R_CMP_PARAMS(P1, P2, P3) (P1, P2, P3)
+#endif
+
+/** Pass qsort-style comparison function as additional argument */
static int
-sc_compare_r (const void *v1, const void *v2, void *arg)
+sc_compare_r QSORT_R_CMP_PARAMS(const void *v1, const void *v2, void *arg)
{
sc_psort_t *pst = (sc_psort_t *) arg;
return pst->compar (v1, v2);
}
-/** Pass qsort-style comparison function as third argument */
+/** Pass qsort-style comparison function as additional argument */
static int
-sc_icompare_r (const void *v1, const void *v2, void *arg)
+sc_icompare_r QSORT_R_CMP_PARAMS(const void *v1, const void *v2, void *arg)
{
sc_psort_t *pst = (sc_psort_t *) arg;
return pst->compar (v2, v1);
@@ -420,7 +428,7 @@ sc_psort_bitonic (sc_psort_t * pst, size_t lo, size_t hi, int dir)
qsort (pst->my_base + (lo - pst->my_lo) * pst->size,
n, pst->size, dir ? sc_compare : sc_icompare);
#else
- qsort_r (pst->my_base + (lo - pst->my_lo) * pst->size,
+ qsort_r QSORT_R_PARAMS(pst->my_base + (lo - pst->my_lo) * pst->size,
n, pst->size, dir ? sc_compare_r : sc_icompare_r, pst);
#endif
--
2.29.2
Thanks! if you turn this into a pull request to the libsc repo, you'll be credited for contributing. I'd ask you to
config/sc_functions.m4
doc/author_yourlastname.txt
releasing your code under BSD.It would also be nice to add a function sc_qsort_r_variant to sc_sort.h that checks the variant at run time without generating invalid accesses in either situation. This way we would be able to support cross-compile environments.
It would also be nice to add a function sc_qsort_r_variant to sc_sort.h that checks the variant at run time without generating invalid accesses in either situation. This way we would be able to support cross-compile environments.
I am not aware of a way of checking the signature of a C function at runtime, how would you do this?
Alternatively one could include a portable implementation of qsort_r in the source code directly, then have an autoconf test to check whether qsort_r is available and works, otherwise build and used the portable version.
If cross-compiling, the portable qsort_r would be used.
That is what other projects do, relying on gnulib for the portable implementation.
If licensing issues prevent from using gnulib in sc/p4est because you need the code to be under BSD license, you could instead get an implementation of qsort_r from ... well, freeBSD
It would also be nice to add a function sc_qsort_r_variant to sc_sort.h that checks the variant at run time without generating invalid accesses in either situation. This way we would be able to support cross-compile environments.
I am not aware of a way of checking the signature of a C function at runtime, how would you do this?
I was just wondering if there is a form that compiles and runs for both the right and the wrong signature, such that the code can check the results.
Alternatively one could include a portable implementation of qsort_r in the source code directly, then have an autoconf test to check whether qsort_r is available and works, otherwise build and used the portable version.
If cross-compiling, the portable qsort_r would be used.
That is what other projects do, relying on gnulib for the portable implementation.
If licensing issues prevent from using gnulib in sc/p4est because you need the code to be under BSD license, you could instead get an implementation of qsort_r from ... well, freeBSD
I'd rather not go into the maintenance for it. Having a working test is fine.
If the qsort_r function may swap the callback and the data pointer, we can provide a data pointer that is such a callback. If the callback swaps the order of its arguments, we can use it to sort function pointers. This way we'd have a check that is well defined and compiles and runs for both implementations.
This is in addition to the configure check, which I still think should be first priority.
The issue is solved by a configure/cmake build check. Please check out the latest prev3-develop branches.
Does the test involving qsort_r now work ok? It should by what we have done. Please let us know.
Hi, running
for the develop branch in macos 11.1 I get
I'd like to check what went wrong, but there is no "./test-suite.log" in the build directory