Closed d-torrance closed 3 years ago
Perhaps if you use --enable-debug you'll get something meaningful out of the backtrace, and you'll be able to use gdb.
I think the reason the backtrace is like that is that autotools doesn't link with the right boost_stacktrace library. Try linking with libboost_stacktrace_backtrace. On macOS, that variation is not available, so we'd need to link with libboost_stacktrace_addr2line instead. See: https://github.com/Macaulay2/M2/blob/4d15032cf1f09ea995df3e861fa29d6aac21db8f/M2/cmake/configure.cmake#L209-L216 https://github.com/Macaulay2/M2/blob/4d15032cf1f09ea995df3e861fa29d6aac21db8f/M2/cmake/check-libraries.cmake#L42 I wish there was an ax_boost_stacktrace as well :/
Why does boost have two stacktrace libraries, one of which is right and the other of which is wrong?
There are 4 actually:
- libboost_stacktrace_addr2line uses the addr2line program, which needs to know the offset to work with position-independent programs, so it doesn't work on Ubuntu;
- libboost_stacktrace_backtrace uses libbacktrace or the GCC extension for it, but is not available on macOS;
- libboost_stacktrace_basic prints what you see above;
- libboost_stacktrace_noop prints nothing.
Each has a variation ending with and without -mt, indicating whether it is multithreaded, and a static and dynamic version, for a total of 16 libraries.
Note that boost_stacktrace also works without linking, which essentially compiles the basic variation from source.
Back to "autotools doesn't link with the right boost_stacktrace library."
As far as I can tell, our autotools build links with no boost_stacktrace library. Are you saying we have to start linking?
Ubuntu offers these:
ubuntu1804$ grep 'libboost.*so$' libboost-stacktrace*.list
libboost-stacktrace1.65-dev:amd64.list:/usr/lib/x86_64-linux-gnu/libboost_stacktrace_addr2line.so
libboost-stacktrace1.65-dev:amd64.list:/usr/lib/x86_64-linux-gnu/libboost_stacktrace_backtrace.so
libboost-stacktrace1.65-dev:amd64.list:/usr/lib/x86_64-linux-gnu/libboost_stacktrace_basic.so
libboost-stacktrace1.65-dev:amd64.list:/usr/lib/x86_64-linux-gnu/libboost_stacktrace_noop.so
How do we choose?
How do we arrange so the source is not included?
Like I said:
Note that boost_stacktrace also works without linking, which essentially compiles the basic variation from source.
How do we arrange so the source is not included?
You still need the sources for the headers. Just link it with the right library and add -DBOOST_STACKTRACE_LINK to the compile options.
After linking against boost.stacktrace, this is the output:
i28 : associatedPrimes annihilator HH_2 C
-- SIGSEGV
-* stack trace, pid: 28644
0# stack_trace(std::ostream&, bool) at ./M2/Macaulay2/d/main.cpp:124
1# segv_handler at ./M2/Macaulay2/d/main.cpp:240
2# 0xF7F43B70 in linux-gate.so.1
3# 0xF603BE1C in /lib/i386-linux-gnu/libc.so.6
4# RingZZ::add(ring_elem, ring_elem) const at ./M2/Macaulay2/e/ZZ.cpp:245
5# Ring::add_to(ring_elem&, ring_elem&) const at ./M2/Macaulay2/e/ring.cpp:186
6# GBRing::gbvector_add_to(FreeModule const*, gbvector*&, gbvector*&) [clone .part.0] at ./M2/Macaulay2/e/gbring.cpp:715
7# GBRing::gbvector_reduce_lead_term(FreeModule const*, FreeModule const*, gbvector*, gbvector*&, gbvector*&, gbvector const*, gbvector const*, bool, ring_elem&) at ./M2/Macaulay2/e/gbring.cpp:963
8# GBRing::gbvector_reduce_lead_term(FreeModule const*, FreeModule const*, gbvector*, gbvector*&, gbvector*&, gbvector const*, gbvector const*) at ./M2/Macaulay2/e/gbring.cpp:1016
9# gbA::reduce_kk(gbA::spair*) at ./M2/Macaulay2/e/gb-default.cpp:1387
10# gbA::process_spair(gbA::spair*) at ./M2/Macaulay2/e/gb-default.cpp:2281
11# gbA::do_computation() at ./M2/Macaulay2/e/gb-default.cpp:2501
12# gbA::start_computation() at ./M2/Macaulay2/e/gb-default.cpp:2575
13# GBProxy::start_computation() at ./M2/Macaulay2/e/comp-gb-proxy.hpp:55
14# rawStartComputation at ./M2/Macaulay2/e/x-gb.cpp:609
15# interface_rawStartComputation at ./M2/Macaulay2/d/interface.dd:3428
16# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1297
17# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1253
18# evaluate_applyFCE.part.0.isra.0 at ./M2/Macaulay2/d/evaluate.d:738
19# method1234o at ./M2/Macaulay2/d/actors5.d:740
20# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1297
21# evaluate_applyFCE.part.0.isra.0 at ./M2/Macaulay2/d/evaluate.d:738
22# iteratedApply at ./M2/Macaulay2/d/actors3.d:2086
23# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1297
24# evaluate_applyFCS at ./M2/Macaulay2/d/evaluate.d:461
25# evaluate_applyFCC.part.0 at ./M2/Macaulay2/d/evaluate.d:558
26# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1300
27# evaluate_applyFCC.part.0 at ./M2/Macaulay2/d/evaluate.d:562
28# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1300
29# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1253
30# evaluate_applyFCC.part.0 at ./M2/Macaulay2/d/evaluate.d:562
31# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1300
32# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1424
33# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1253
34# evaluate_applyFCE.part.0.isra.0 at ./M2/Macaulay2/d/evaluate.d:738
35# method1234o at ./M2/Macaulay2/d/actors5.d:740
36# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1297
37# evaluate_applyFCE.part.0.isra.0 at ./M2/Macaulay2/d/evaluate.d:738
38# iteratedApply at ./M2/Macaulay2/d/actors3.d:2086
39# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1297
40# evaluate_applyFCS at ./M2/Macaulay2/d/evaluate.d:461
41# evaluate_applyFCC.part.0 at ./M2/Macaulay2/d/evaluate.d:558
42# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1300
43# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1253
44# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1253
45# evaluate_applyFCS at ./M2/Macaulay2/d/evaluate.d:519
46# method1234o at ./M2/Macaulay2/d/actors5.d:740
47# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1297
48# evaluate_applyFCS at ./M2/Macaulay2/d/evaluate.d:461
49# iteratedApply at ./M2/Macaulay2/d/actors3.d:2086
50# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1297
51# evaluate_applyFCS at ./M2/Macaulay2/d/evaluate.d:461
52# evaluate_applyFCC.part.0 at ./M2/Macaulay2/d/evaluate.d:558
53# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1300
54# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1253
55# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1253
56# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1424
57# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1424
58# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1253
59# evaluate_applyFCS at ./M2/Macaulay2/d/evaluate.d:461
60# method1 at ./M2/Macaulay2/d/actors5.d:668
61# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1302
62# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1253
63# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1424
64# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1424
65# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1424
66# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1253
67# evaluate_applyFCS at ./M2/Macaulay2/d/evaluate.d:461
68# method1 at ./M2/Macaulay2/d/actors5.d:668
69# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1302
70# evaluate_applyFCE.part.0.isra.0 at ./M2/Macaulay2/d/evaluate.d:738
71# method1 at ./M2/Macaulay2/d/actors5.d:668
72# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1302
73# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1424
74# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1253
75# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1424
76# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1253
77# evaluate_applyFCE.part.0.isra.0 at ./M2/Macaulay2/d/evaluate.d:738
78# method1234o at ./M2/Macaulay2/d/actors5.d:740
79# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1297
80# evaluate_applyFCE.part.0.isra.0 at ./M2/Macaulay2/d/evaluate.d:738
81# iteratedApply at ./M2/Macaulay2/d/actors3.d:2086
82# evaluate_applyFCC.part.0 at ./M2/Macaulay2/d/evaluate.d:658
83# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1300
84# evaluate_applyFCC.part.0 at ./M2/Macaulay2/d/evaluate.d:562
85# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1300
86# evaluate_evalexcept at ./M2/Macaulay2/d/evaluate.d:1428
87# readeval3(parse_TokenFile_struct*, char, parse_DictionaryClosure_struct*, char, char, char) at ./M2/Macaulay2/d/interp.dd:272
88# loadprint(M2_string_struct*, parse_DictionaryClosure_struct*, char) at ./M2/Macaulay2/d/interp.dd:345
89# commandInterpreter_2(tagged_union*) at ./M2/Macaulay2/d/interp.dd:460
90# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1297
91# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1253
92# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1253
93# evaluate_evalexcept at ./M2/Macaulay2/d/evaluate.d:1428
94# readeval3(parse_TokenFile_struct*, char, parse_DictionaryClosure_struct*, char, char, char) at ./M2/Macaulay2/d/interp.dd:272
95# readeval(parse_TokenFile_struct*, char, char) at ./M2/Macaulay2/d/interp.dd:284
96# interp_process at ./M2/Macaulay2/d/interp.dd:600
97# interpFunc(ArgCell*) at ./M2/Macaulay2/d/main.cpp:193
98# ThreadTask::run(SupervisorThread*) at ./M2/Macaulay2/system/supervisor.cpp:377
99# SupervisorThread::threadEntryPoint() at ./M2/Macaulay2/system/supervisor.cpp:436
100# SupervisorThread::threadEntryPoint(void*) at ./M2/Macaulay2/system/supervisor.hpp:100
101# GC_inner_start_routine in /usr/lib/i386-linux-gnu/libgc.so.1
102# GC_call_with_stack_base in /usr/lib/i386-linux-gnu/libgc.so.1
103# GC_start_routine in /usr/lib/i386-linux-gnu/libgc.so.1
104# start_thread in /lib/i386-linux-gnu/libpthread.so.0
105# __clone in /lib/i386-linux-gnu/libc.so.6
-- end stack trace *-
I like the stack trace! Perhaps when we modified our way of using gmp integers, I screwed up something 32 bit vs 64 bit related, as I don't think we tested it on a single 32 bit machine. I'll take a look.
This is the relevant code:
ring_elem RingZZ::add(const ring_elem f, const ring_elem g) const
{
  mpz_ptr result = new_elem();
  mpz_add(result, f.get_mpz(), g.get_mpz());
  mpz_reallocate_limbs(result);
  return ring_elem(result);
}
I don't see a problem. If memory allocation had failed, the program would have terminated with an error message.
In investigating this issue a bit further, I noticed the following:
In the call to mpz_clear in line 13, gmp frees the memory for _z. (Looking at the gmp source, it's basically just a wrapper around free.) So I think writing to _z afterwards is undefined. Maybe that's why the SIGSEGV is happening?
I removed the call to mpz_clear, recompiled, and still got the SIGSEGV, so that's not the problem.
I'm fixing this temporarily in the Debian package by using a canned example: https://salsa.debian.org/science-team/macaulay2/-/blob/master/debian/patches/skip-algebraic-splines-example.patch
Would this be worth submitting upstream?
No, mpz_clear(_z) does not free the memory for _z itself, which might, after all, be static or on the stack. Rather, it frees the array of limbs and zeroes the various fields in _z.
I'm fixing this temporarily in the Debian package by using a canned example: https://salsa.debian.org/science-team/macaulay2/-/blob/master/debian/patches/skip-algebraic-splines-example.patch
Would this be worth submitting upstream?
It would be much better to fix the bug, and having something fail is a good way to remind us that there is a bug.
Fixed in #2016 -- closing
Is there a reason that switching to GC_MALLOC_ATOMIC in a few places would prevent a segmentation fault?
@jkyang92 may be able to provide more details, but that appears to be the case. See the discussion in #1938.
IIRC, I was getting these segfaults on pretty much every i386 build before switching to canned examples/skipping tests. After #2016 (and going back to generating the examples and running the tests), I've had no issues.
I believe that the proximate cause of the segfault is actually GC_MALLOC failing to allocate memory and returning NULL (in particular, the segfault tends to be in the memcpy after the GC_MALLOC).
The main cause is that for a large integer with no particular structure, the data of a gmp integer will contain random-looking bytes. For blocks allocated with GC_MALLOC this causes a lot of false pointers that the GC adds to a blacklist. On a 64-bit system this is of no real consequence due to the large address space. But on a 32-bit system, where there is only 3G of user-addressable address space, this has a bad habit of polluting the entire address space with unusable blocks.
It's a convenient coincidence that in these test failures, the main source of large allocations is mpz_reallocate_limbs, and so both the cause (the use of GC_MALLOC) and the symptom (segfaults) are in the same function.
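To get a feel for the sizes involved, here is a back-of-the-envelope model. It is my own simplification, not how libgc actually scans memory: treat every word of the limb data as uniformly random and count how many would be expected to land inside a given address range. The function name and parameters are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Expected number of words in a block of random bytes whose value happens to
// fall inside an address range of `range_bytes` bytes.
// Simplified model (an assumption): each word is uniformly random and
// independent, and any value inside the range counts as a false pointer.
double expected_false_pointers(double block_bytes, double range_bytes, int word_bits)
{
  assert(word_bits == 32 || word_bits == 64);
  const double words = block_bytes / (word_bits / 8.0);
  const double address_space = std::ldexp(1.0, word_bits);  // 2^word_bits
  return words * (range_bytes / address_space);
}
```

For a hypothetical 1 MiB block and a 1 GiB address range, this model gives 65536 expected hits with 32-bit words and about 7.6e-6 with 64-bit words, which fits the observation that the problem only shows up on 32-bit systems.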
Jay, thanks for the cogent explanation. There should not be any places in the code where a pointer obtained from libgc is used without first testing for NULL, so an appropriate out-of-memory error message can be issued by the routine outofmem or outofmem2. A pull request fixing that would be welcome.
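One possible shape for such a fix, as a sketch only: route every allocation through a helper that checks for NULL before the pointer is used. The names checked_alloc, checked_copy, plain_malloc and always_fail are hypothetical. std::malloc stands in for GC_MALLOC so the snippet compiles without libgc, and throwing std::bad_alloc is a placeholder for calling outofmem or outofmem2, whose signatures are not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Allocator signature shared by the stand-ins below.
using alloc_fn = void* (*)(std::size_t);

// Stand-in for GC_MALLOC / GC_MALLOC_ATOMIC (assumption: same basic shape).
inline void* plain_malloc(std::size_t bytes) { return std::malloc(bytes); }

// Allocator that always fails, to exercise the error path.
inline void* always_fail(std::size_t) { return nullptr; }

// Hypothetical helper: call the allocator and never hand back NULL.
// The throw is a placeholder for an out-of-memory report.
inline void* checked_alloc(alloc_fn alloc, std::size_t bytes)
{
  void* p = alloc(bytes);
  if (p == nullptr) throw std::bad_alloc();
  return p;
}

// Copy `bytes` bytes into freshly allocated storage. The memcpy is only
// reached when the allocation succeeded.
inline void* checked_copy(alloc_fn alloc, const void* src, std::size_t bytes)
{
  void* dst = checked_alloc(alloc, bytes);
  std::memcpy(dst, src, bytes);
  return dst;
}
```

With this shape, a failed allocation surfaces as an explicit error at the allocation site instead of as a SIGSEGV in the memcpy that follows it.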
While trying to build the Debian package on an i386 chroot: