Closed c3d closed 1 year ago
This seems to have disappeared after various fixes on the processing of bignums, but keeping it open because there was no fix specifically for the generation of a malformed object or a crash in the garbage collector.
Got the crash again after implementing a stress test for big fractions that looks like this:
step("Manual computation of 997/100!");
test(CLEAR, 997, ENTER);
for(uint i = 1; i <= 100; i++)
test(i * 997 % 101, DIV);
expect("997/"
"9332621544394415268169923885626670049071596826438162146859296389521"
"7599993229915608941463976156518286253697920827223758251185210916864"
"0000000000000000000000");
The crash had the following recorder dump, which seems to indicate that a garbage collection ran into trouble during the creation of a bignum of size 486 bytes, with 363 bytes available.
[31086174 91.659522] gc_details: Scanning object 0x10deca4b6 (ends at 0x10deca4c8)
[31086175 91.659523] gc_details: Recycling 0x10deca4b6 size 18 total 1089
[31086176 91.659523] gc_details: Scanning object 0x10deca4c8 (ends at 0x10deca4cb)
[31086177 91.659523] gc_details: Recycling 0x10deca4c8 size 3 total 1092
[31086178 91.659523] gc_details: Scanning object 0x10deca4cb (ends at 0x10deca4ea)
[31086179 91.659524] gc_details: Recycling 0x10deca4cb size 31 total 1123
[31086180 91.659524] gc_details: Scanning object 0x10deca4ea (ends at 0x10deca4ed)
[31086181 91.659525] gc_details: Recycling 0x10deca4ea size 3 total 1126
[31086182 91.659525] gc_details: Scanning object 0x10deca4ed (ends at 0x10deca51e)
[31086183 91.659525] gc_details: Recycling 0x10deca4ed size 49 total 1175
[31086184 91.659526] gc_details: Scanning object 0x10deca51e (ends at 0x10deca521)
[31086185 91.659526] gc_details: Found 0x10deca51e in GC-safe pointer 0x10deca521 (0x16b8ca5b0)
[31086186 91.659526] gc_details: Moving 0x10deca51e-0x10deca521 to 0x10deca087
[31086187 91.659526] gc_details: Adjustment range is 0x10deca51e-0x10deca521, no scratch
[31086188 91.659526] gc_details: Scanning object 0x10deca521 (ends at 0x10deca576)
[31086189 91.659526] gc_details: Found 0x10deca521 in GC-safe pointer 0x10deca576 (0x16b8ca590)
[31086190 91.659526] gc_details: Moving 0x10deca521-0x10deca576 to 0x10deca08a
[31086191 91.659526] gc_details: Adjustment range is 0x10deca521-0x10deca576, no scratch
[31086192 91.659527] gc_details: Adjusting GC-safe 0x16b8ca5b0 from 0x10deca521 to 0x10deca08a
[31086193 91.659527] gc_details: Adjusting GC-safe 0x16b8ca618 from 0x10deca521 to 0x10deca08a
[31086194 91.659527] gc_details: Adjusting GC-safe 0x16b8ca7b8 from 0x10deca521 to 0x10deca08a
[31086195 91.659527] gc_details: Scanning object 0x10deca576 (ends at 0x10deca579)
[31086196 91.659527] gc_details: Found 0x10deca576 in GC-safe pointer 0x10deca579 (0x16b8ca498)
[31086197 91.659527] gc_details: Moving 0x10deca576-0x10deca579 to 0x10deca0df
[31086198 91.659527] gc_details: Adjustment range is 0x10deca576-0x10deca579, no scratch
[31086199 91.659527] gc_details: Adjusting GC-safe 0x16b8ca590 from 0x10deca576 to 0x10deca0df
[31086200 91.659527] gc_details: Adjusting GC-safe 0x16b8ca5f8 from 0x10deca576 to 0x10deca0df
[31086201 91.659527] gc_details: Adjusting GC-safe 0x16b8ca638 from 0x10deca576 to 0x10deca0df
[31086202 91.659528] gc_details: Adjusting GC-safe 0x16b8ca7a8 from 0x10deca576 to 0x10deca0df
[31086203 91.659528] gc_details: Adjustment range is 0x10deca579-0x10deca668, scratch
[31086204 91.659528] gc_details: Adjusting GC-safe 0x16b8ca488 from 0x10deca5cc to 0x10deca135
[31086205 91.659528] gc_details: Adjusting GC-safe 0x16b8ca498 from 0x10deca579 to 0x10deca0e2
[31086206 91.659529] gc: Garbage collection done, purged 1175, available 1314
[31086207 91.659529] gc_details: Adjustment range is 0x10deca0e2-0x10deca668, scratch
[31086208 91.659529] gc_details: Adjusting GC-safe 0x16b8ca488 from 0x10deca135 to 0x10deca1c8
[31086209 91.659529] gc_details: Adjusting GC-safe 0x16b8ca498 from 0x10deca0e2 to 0x10deca175
[31086210 91.659530] runtime: Initializing object 0x10deca175 type 19 bignum size 3
[31086211 91.659531] gc_details: Adjustment range is 0x10deca175-0x10deca668, scratch
[31086212 91.659531] gc_details: Adjusting GC-safe 0x16b8ca488 from 0x10deca176 to 0x10deca179
[31086213 91.659531] gc_details: Adjusting GC-safe 0x16b8ca498 from 0x10deca175 to 0x10deca178
[31086214 91.660294] keys: Push key 0 (wr 5289 rd 5289)
[31086215 91.660295] sim_keys: Key 0 coords (0, 0, 0, 0)
[31086216 91.660777] runtime: Initializing object 0x10deca178 type 19 bignum size 205
[31086217 91.660777] gc_details: Adjustment range is 0x10deca178-0x10deca668, scratch
[31086218 91.660778] gc_details: Adjusting GC-safe 0x16b8ca488 from 0x10deca208 to 0x10deca2d5
[31086219 91.660778] gc_details: Adjusting GC-safe 0x16b8ca498 from 0x10deca178 to 0x10deca245
[31086220 91.660779] runtime: Initializing object 0x10deca245 type 19 bignum size 3
[31086221 91.660779] gc_details: Adjustment range is 0x10deca245-0x10deca668, scratch
[31086222 91.660779] gc_details: Adjusting GC-safe 0x16b8ca488 from 0x10deca246 to 0x10deca249
[31086223 91.660779] gc_details: Adjusting GC-safe 0x16b8ca498 from 0x10deca245 to 0x10deca248
[31086224 91.663164] runtime: Initializing object 0x10deca248 type 19 bignum size 338
[31086225 91.663164] gc_details: Adjustment range is 0x10deca248-0x10deca668, scratch
[31086226 91.663164] gc_details: Adjusting GC-safe 0x16b8ca488 from 0x10deca312 to 0x10deca464
[31086227 91.663164] gc_details: Adjusting GC-safe 0x16b8ca498 from 0x10deca248 to 0x10deca39a
[31086228 91.663165] runtime: Initializing object 0x10deca39a type 19 bignum size 3
[31086229 91.663165] gc_details: Adjustment range is 0x10deca39a-0x10deca668, scratch
[31086230 91.663166] gc_details: Adjusting GC-safe 0x16b8ca488 from 0x10deca39b to 0x10deca39e
[31086231 91.663166] gc_details: Adjusting GC-safe 0x16b8ca498 from 0x10deca39a to 0x10deca39d
[31086232 91.669417] runtime: Initializing object 0x10deca39d type 19 bignum size 486
[31086233 91.669417] gc: Garbage collection, available 363, range 0x10dec9f3f-0x10deca39d
[31086234 91.669459] recorder_signals: Received signal 0x134ecee80, dumping recorder
[31086235 91.669459] recorder_signals: Received signal Segmentation fault: 11 (11) si_addr=0x40942d4da65ebc9, ucontext 0x11621bc18, dumping recorder
[31086236 91.669461] recorder: Recorder dump
Note that I may be hitting the limits of the kind of test I can do with 2K of memory. Still, I should get an Out of memory
error, not a crash.
The stack trace looks like this:
==43800==The signal is caused by a READ memory access.
#0 0x1048062f4 in unsigned short leb128<unsigned short, unsigned char>(unsigned char*&) leb128.h:64
#1 0x104842bd8 in runtime::integrity_test(object const*, object const*, object const**, object const**) runtime.cc:185
#2 0x104842110 in runtime::gc() runtime.cc:303
#3 0x104841fe4 in runtime::available(unsigned long) runtime.cc:146
#4 0x104838d20 in bignum* runtime::make<bignum, runtime::gcp<unsigned char const>, unsigned long>(bignum::id, runtime::gcp<unsigned char const> const&, unsigned long const&) runtime.h:773
#5 0x1048395e4 in bignum::quorem(runtime::gcp<bignum const>, runtime::gcp<bignum const>, object::id, runtime::gcp<bignum const>*, runtime::gcp<bignum const>*) bignum.cc:620
#6 0x104839888 in operator%(runtime::gcp<bignum const>, runtime::gcp<bignum const>) bignum.cc:650
#7 0x10483bdb4 in gcd(runtime::gcp<bignum const>, runtime::gcp<bignum const>) fraction.cc:197
#8 0x10483b888 in big_fraction::make(runtime::gcp<bignum const>, runtime::gcp<bignum const>) fraction.cc:209
#9 0x10483cc04 in operator/(runtime::gcp<fraction const>, runtime::gcp<fraction const>) fraction.cc:282
#10 0x10484d978 in div::fraction_ok(runtime::gcp<fraction const>&, runtime::gcp<fraction const>&) arithmetic.cc:408
#11 0x10484b57c in arithmetic::evaluate(object::id, runtime::gcp<algebraic const>&, runtime::gcp<algebraic const>&, void (*)(BID_UINT128*, BID_UINT128*, BID_UINT128*), void (*)(unsigned long long*, unsigned long long*, unsigned long long*), void (*)(unsigned int*, unsigned int*, unsigned int*), bool (*)(object::id&, object::id&, unsigned long long&, unsigned long long&), bool (*)(runtime::gcp<bignum const>&, runtime::gcp<bignum const>&), bool (*)(runtime::gcp<fraction const>&, runtime::gcp<fraction const>&), bool (*)(runtime::gcp<algebraic const>&, runtime::gcp<algebraic const>&, object::id&, object::id&)) arithmetic.cc:691
#12 0x10484c078 in arithmetic::evaluate(object::id, void (*)(BID_UINT128*, BID_UINT128*, BID_UINT128*), void (*)(unsigned long long*, unsigned long long*, unsigned long long*), void (*)(unsigned int*, unsigned int*, unsigned int*), bool (*)(object::id&, object::id&, unsigned long long&, unsigned long long&), bool (*)(runtime::gcp<bignum const>&, runtime::gcp<bignum const>&), bool (*)(runtime::gcp<fraction const>&, runtime::gcp<fraction const>&), bool (*)(runtime::gcp<algebraic const>&, runtime::gcp<algebraic const>&, object::id&, object::id&)) arithmetic.cc:794
#13 0x10484d428 in object::result arithmetic::evaluate<div>() arithmetic.cc:834
#14 0x104827a60 in div::do_evaluate(div const*) arithmetic.h:151
#15 0x10482b134 in object::evaluate() const object.h:302
#16 0x1048271c4 in object::do_execute(object const*) object.cc:258
#17 0x1048141a4 in object::execute() const object.h:312
#18 0x10481b860 in user_interface::handle_functions(int) user_interface.cc:2552
#19 0x1048199ec in user_interface::key(int, bool) user_interface.cc:328
#20 0x104815210 in handle_key(int, bool) main.cc:131
#21 0x104814ff0 in program_main main.cc:313
#22 0x10480cb98 in RPLThread::run() sim-rpl.cpp:60
#23 0x107788b3c in QThreadPrivate::start(void*)+0x148 (QtCore:arm64+0x180b3c) (BuildId: f0f3e1d7cb2a38ccb5b0bc3691c722f332000000200000000100000000000d00)
#24 0x1ac483fa4 in _pthread_start+0x90 (libsystem_pthread.dylib:arm64e+0x6fa4) (BuildId: 46d35233a0513f4fbba4ba56dddc4d1a32000000200000000100000000040d00)
#25 0x1ac47ed9c in thread_start+0x4 (libsystem_pthread.dylib:arm64e+0x1d9c) (BuildId: 46d35233a0513f4fbba4ba56dddc4d1a32000000200000000100000000040d00)
The integrity test is failing and crashes trying to read what looks like a bad pointer 0x40942d4da65ebc9
.
Running in the debugger causes a similar crash, where the first level of the stack is actually corrupt:
5: 014: Manual computation of 997/100! Process 46332 stopped
* thread #6, name = 'QThread', stop reason = EXC_BAD_ACCESS (code=1, address=0xf7475546c088da7e)
frame #0: 0x000000010001a2f4 simulator`unsigned short leb128<unsigned short, unsigned char>(unsigned char*&) + 16 at /Users/ddd/Work/calc/db48x/src/leb128.h:64
61 // Return the leb128 value at pointer
62 // ----------------------------------------------------------------------------
63 {
-> 64 if (bp[0] < 0x80)
^
65 return *bp++;
66 uint16_t b1 = *bp++ & 0x7F;
67 uint16_t b2 = *bp++ << 7;
Target 0: (simulator) stopped.
(lldb) up
up
frame #1: 0x000000010001a2a4 simulator`object::type() const + 36 at /Users/ddd/Work/calc/db48x/src/object.h:227
224 // ------------------------------------------------------------------------
225 {
226 byte *ptr = (byte *) this;
-> 227 id ty = (id) leb128<uint16_t>(ptr);
^
228 if (ty > NUM_IDS)
229 {
230 object_error(ty, this);
(lldb)
frame #2: 0x0000000100056bdc simulator`runtime::integrity_test(object const*, object const*, object const**, object const**) + 260 at /Users/ddd/Work/calc/db48x/src/runtime.cc:185
182 return false;
183
184 for (object_p *s = stack; s < stackEnd; s++)
-> 185 if (!*s || (*s)->type() >= object::NUM_IDS)
^
186 return false;
187
188 return true;
(lldb) p s
p s
(object_p *) $0 = 0x00000001091d1e68
(lldb) p *s
p *s
(object_p) $1 = 0xf7475546c088da7e
(lldb) p stack
p stack
(object_p *) $2 = 0x00000001091d1e68
(lldb) p stackEnd
p stackEnd
(object_p *) $3 = 0x00000001091d1e80
In the quorem
function, rs
is 502. The rg pointer is 0x1091d1d90:
(lldb) p rs
(size_t) $5 = 502
(lldb) x rg.safe
0x1091d1d90: 8a aa 32 53 bd b3 b9 52 2b e0 ad 52 e1 53 d2 59 ..2S...R+..R.S.Y
0x1091d1da0: 04 f8 f9 f2 71 d2 53 4e 13 87 f9 b1 f8 c0 91 86 ....q.SN........
and the distance to the stack is only 216.
(lldb) p (byte *)$0-(byte *)rg.safe
(long) $9 = 216
Clearly, that does not fit. So my rg
pointer scribbled all over the stack.
It looks like the computation of needed
was wrong in that case:
(lldb) p needed
p needed
(size_t) $10 = 317
The size of x
and y
are 2
and 315
respectively:
(lldb) p xs
(size_t) $11 = 2
(lldb) p ys
(size_t) $12 = 315
So why is the remainder so big?
Given the test that we are running, i.e. 997 divided by successive integers, the expectation would be that the denominator (xg
) would be big, not the numerator. But here the denominator is 2 bytes long and the numerator is 315. That's unexpected.
The problem was in the loop computing the remainder of the long division
for (uint ri = 0; ri < rs; ri++)
{
c = remainder[ri] - x[ri] - c;
remainder[ri] = byte(c);
c >>= 8;
}
There were two problems with this loop:
When we enter the loop, it's possible that the remainder is one byte larger than x
, because of the carry. It's simpler to allocate one extra byte and consider x[ri]
as zero beyond xs
.
As written, the carry is negative, so we need to take the opposite.
Computing a factorial of 25, then dividing by its factors successively leads to
This was found testing the C++ interface for number processing, but was shown to exist on the stable branch, so not a regression.