demcanulty / Yokai_Holiday

VCV Rack modules
MIT License
2 stars 0 forks source link

Taps crashes Rack with segfault when added to a patch #3

Open dbonel opened 1 year ago

dbonel commented 1 year ago

Hello! I get a crash when the Taps module is clicked in the library to add it to a patch. I'm on an M1 Mac on 12.3, running the 2.3.0 x64 version of VCV Rack.

According to the Mac Console app, we have the following information about this crash:

Crashed Thread:        12  com.apple.audio.IOThread.client

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x00007f772b8d7efc
Exception Codes:       0x0000000000000001, 0x00007f772b8d7efc
Exception Note:        EXC_CORPSE_NOTIFY

VM Region Info: 0x7f772b8d7efc is not in any region.  Bytes after previous region: 34596155391741  Bytes before following region: 3722608900
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      MALLOC_NANO (reserved)   600018000000-600020000000 [128.0M] rw-/rwx SM=NUL  ...(unallocated)
--->  GAP OF 0x1f77e9700000 BYTES
      MALLOC_TINY              7f7809700000-7f7809800000 [ 1024K] rw-/rwx SM=PRV  

Thread 12:

Thread 12 Crashed:: com.apple.audio.IOThread.client
0   <translation info unavailable>         0x104db4248 ???
1   plugin.dylib                           0x10db7b6cc Taps::check_params(float) + 44
2   libsystem_platform.dylib            0x7ff81502ae13 _sigtramp + 51
3   libRack.dylib                          0x10de059df rack::engine::Module::doProcess(rack::engine::Module::ProcessArgs const&) + 143
4   libRack.dylib                          0x10ddfe593 rack::engine::Engine::stepBlock(int) + 1699
5   libRack.dylib                          0x10dd26be5 rack::audio::Device::processBuffer(float const*, int, float*, int, int) + 325
6   libRack.dylib                          0x10dd5ed5f rack::RtAudioDevice::rtAudioCallback(void*, void*, unsigned int, double, unsigned int, void*) + 143
7   libRack.dylib                          0x10e25193f RtApiCore::callbackEvent(unsigned int, AudioBufferList const*, AudioBufferList const*) + 335
8   libRack.dylib                          0x10e250935 callbackHandler(unsigned int, AudioTimeStamp const*, AudioBufferList const*, AudioTimeStamp const*, AudioBufferList*, AudioTimeStamp const*, void*) + 21
9   CoreAudio                           0x7ff816a0ec2f HALC_ProxyIOContext::IOWorkLoop() + 7441
10  CoreAudio                           0x7ff816a0c959 invocation function for block in HALC_ProxyIOContext::HALC_ProxyIOContext(unsigned int, unsigned int) + 63
11  CoreAudio                           0x7ff816bd87a8 HALB_IOThread::Entry(void*) + 72
12  libsystem_pthread.dylib             0x7ff8150154e1 _pthread_start + 125
13  libsystem_pthread.dylib             0x7ff815010f6b thread_start + 15

Here's what objdump -Cd -M intel on the dylib says about Taps::check_params. I believe +44 corresponds to offset 26cc.

00000000000026a0 <Taps::check_params(float)>:
    26a0: 55                            push    rbp
    26a1: 48 89 e5                      mov     rbp, rsp
    26a4: 53                            push    rbx
    26a5: 50                            push    rax
    26a6: f3 0f 11 45 f4                movss   dword ptr [rbp - 12], xmm0
    26ab: 48 89 fb                      mov     rbx, rdi
    26ae: 48 8b 47 20                   mov     rax, qword ptr [rdi + 32]
    26b2: 0f 10 00                      movups  xmm0, xmmword ptr [rax]
    26b5: 0f 11 87 ec e8 4c 00          movups  xmmword ptr [rdi + 5040364], xmm0
    26bc: 0f 10 40 10                   movups  xmm0, xmmword ptr [rax + 16]
    26c0: 0f 11 87 50 e9 4c 00          movups  xmmword ptr [rdi + 5040464], xmm0
    26c7: e8 d4 02 00 00                call    0x29a0 <Taps::handle_clock_input()>
    26cc: 48 8b 43 38                   mov     rax, qword ptr [rbx + 56]
    26d0: f3 0f 10 80 f0 00 00 00       movss   xmm0, dword ptr [rax + 240] ## xmm0 = mem[0],zero,zero,zero
    26d8: f3 0f 5a c0                   cvtss2sd        xmm0, xmm0
    26dc: f2 0f 59 05 3c 46 00 00       mulsd   xmm0, qword ptr [rip + 17980] ## 0x6d20 <dyld_stub_binder+0x6d20>
    26e4: f3 0f 10 8b 54 e9 4c 00       movss   xmm1, dword ptr [rbx + 5040468] ## xmm1 = mem[0],zero,zero,zero
    26ec: f3 0f 5a c9                   cvtss2sd        xmm1, xmm1
    26f0: f2 0f 59 c8                   mulsd   xmm1, xmm0
    26f4: 0f 57 c0                      xorps   xmm0, xmm0
    26f7: f2 0f 5a c1                   cvtsd2ss        xmm0, xmm1
    26fb: f3 0f 58 83 f0 e8 4c 00       addss   xmm0, dword ptr [rbx + 5040368]
    2703: f3 0f 11 83 f0 e8 4c 00       movss   dword ptr [rbx + 5040368], xmm0
    270b: f3 0f 10 0d e5 45 00 00       movss   xmm1, dword ptr [rip + 17893] ## xmm1 = mem[0],zero,zero,zero
                                                                        ## 0x6cf8 <dyld_stub_binder+0x6cf8>
    2713: 0f 2e c1                      ucomiss xmm0, xmm1
    2716: 77 08                         ja      0x2720 <Taps::check_params(float)+0x80>
(...)
demcanulty commented 1 year ago

Hi! Thanks, I’ll look into this. I don’t have an M1, to test with, so might take a little bit.

Does the crash happen every single time?

On Apr 23, 2023, at 9:35 PM, dbonel @.***> wrote:

Hello! I get a crash when the Taps module is clicked in the library to add it to a patch. I'm on an M1 Mac on 12.3, running the 2.3.0 x64 version of VCV Rack. According to the Mac Console app, we have the following information about this crash: Crashed Thread: 12 com.apple.audio.IOThread.client

Exception Type: EXC_BAD_ACCESS (SIGSEGV) Exception Codes: KERN_INVALID_ADDRESS at 0x00007f772b8d7efc Exception Codes: 0x0000000000000001, 0x00007f772b8d7efc Exception Note: EXC_CORPSE_NOTIFY

VM Region Info: 0x7f772b8d7efc is not in any region. Bytes after previous region: 34596155391741 Bytes before following region: 3722608900 REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL MALLOC_NANO (reserved) 600018000000-600020000000 [128.0M] rw-/rwx SM=NUL ...(unallocated) ---> GAP OF 0x1f77e9700000 BYTES MALLOC_TINY 7f7809700000-7f7809800000 [ 1024K] rw-/rwx SM=PRV

Thread 12: Thread 12 Crashed:: com.apple.audio.IOThread.client 0 0x104db4248 ??? 1 plugin.dylib 0x10db7b6cc Taps::check_params(float) + 44 2 libsystem_platform.dylib 0x7ff81502ae13 _sigtramp + 51 3 libRack.dylib 0x10de059df rack::engine::Module::doProcess(rack::engine::Module::ProcessArgs const&) + 143 4 libRack.dylib 0x10ddfe593 rack::engine::Engine::stepBlock(int) + 1699 5 libRack.dylib 0x10dd26be5 rack::audio::Device::processBuffer(float const, int, float, int, int) + 325 6 libRack.dylib 0x10dd5ed5f rack::RtAudioDevice::rtAudioCallback(void, void, unsigned int, double, unsigned int, void) + 143 7 libRack.dylib 0x10e25193f RtApiCore::callbackEvent(unsigned int, AudioBufferList const, AudioBufferList const) + 335 8 libRack.dylib 0x10e250935 callbackHandler(unsigned int, AudioTimeStamp const, AudioBufferList const, AudioTimeStamp const, AudioBufferList, AudioTimeStamp const, void) + 21 9 CoreAudio 0x7ff816a0ec2f HALC_ProxyIOContext::IOWorkLoop() + 7441 10 CoreAudio 0x7ff816a0c959 invocation function for block in HALC_ProxyIOContext::HALC_ProxyIOContext(unsigned int, unsigned int) + 63 11 CoreAudio 0x7ff816bd87a8 HALB_IOThread::Entry(void) + 72 12 libsystem_pthread.dylib 0x7ff8150154e1 _pthread_start + 125 13 libsystem_pthread.dylib 0x7ff815010f6b thread_start + 15

Here's what objdump -Cd -M intel on the dylib says about Taps::check_params. I believe +44 corresponds to offset 26cc. 00000000000026a0 <Taps::check_params(float)>: 26a0: 55 push rbp 26a1: 48 89 e5 mov rbp, rsp 26a4: 53 push rbx 26a5: 50 push rax 26a6: f3 0f 11 45 f4 movss dword ptr [rbp - 12], xmm0 26ab: 48 89 fb mov rbx, rdi 26ae: 48 8b 47 20 mov rax, qword ptr [rdi + 32] 26b2: 0f 10 00 movups xmm0, xmmword ptr [rax] 26b5: 0f 11 87 ec e8 4c 00 movups xmmword ptr [rdi + 5040364], xmm0 26bc: 0f 10 40 10 movups xmm0, xmmword ptr [rax + 16] 26c0: 0f 11 87 50 e9 4c 00 movups xmmword ptr [rdi + 5040464], xmm0 26c7: e8 d4 02 00 00 call 0x29a0 <Taps::handle_clock_input()> 26cc: 48 8b 43 38 mov rax, qword ptr [rbx + 56] 26d0: f3 0f 10 80 f0 00 00 00 movss xmm0, dword ptr [rax + 240] ## xmm0 = mem[0],zero,zero,zero 26d8: f3 0f 5a c0 cvtss2sd xmm0, xmm0 26dc: f2 0f 59 05 3c 46 00 00 mulsd xmm0, qword ptr [rip + 17980] ## 0x6d20 <dyld_stub_binder+0x6d20> 26e4: f3 0f 10 8b 54 e9 4c 00 movss xmm1, dword ptr [rbx + 5040468] ## xmm1 = mem[0],zero,zero,zero 26ec: f3 0f 5a c9 cvtss2sd xmm1, xmm1 26f0: f2 0f 59 c8 mulsd xmm1, xmm0 26f4: 0f 57 c0 xorps xmm0, xmm0 26f7: f2 0f 5a c1 cvtsd2ss xmm0, xmm1 26fb: f3 0f 58 83 f0 e8 4c 00 addss xmm0, dword ptr [rbx + 5040368] 2703: f3 0f 11 83 f0 e8 4c 00 movss dword ptr [rbx + 5040368], xmm0 270b: f3 0f 10 0d e5 45 00 00 movss xmm1, dword ptr [rip + 17893] ## xmm1 = mem[0],zero,zero,zero

0x6cf8 <dyld_stub_binder+0x6cf8>

2713: 0f 2e c1 ucomiss xmm0, xmm1 2716: 77 08 ja 0x2720 <Taps::check_params(float)+0x80> (...)

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you are subscribed to this thread.Message ID: @.***>

dbonel commented 1 year ago

Yes, it crashes every time.

demcanulty commented 1 year ago

Great! That bodes well for tracking this down :)

On Apr 23, 2023, at 10:01 PM, dbonel @.***> wrote:

Yes, it crashes every time. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>

demcanulty commented 1 year ago

Oh, one more question, have you tried adding it to a completely empty patch?

dbonel commented 1 year ago

Oh interesting, adding it to an empty patch does work. Let me see if I can narrow it down to a set of modules that it crashes when added to.

demcanulty commented 1 year ago

That happened to me, I was chasing down an issue with module custom default settings that just didn't seem to work on my system and it was because I was using MB as my library manager. I only realized when I tested it on a blank patch and discovered different behavior.

demcanulty commented 1 year ago

If it's a really big patch, I recommend a binary search. ie, delete half the patch and see if that fixes it -- if so, you've narrowed down the search space by half, and then repeat until the culprit is found. :)

dbonel commented 1 year ago

That was the strategy I was planning to try, but I'm now only sometimes getting crashes when adding an instance to the existing patch, which makes this a lot more complicated to try to debug. If there's a pattern to when it crashes, it's not apparent to me so far.

dbonel commented 1 year ago

I've been able to get it to crash just once in a patch that started out empty, and then had a Taps instance added and then duplicated with command-D a few times, but I can't seem to replicate that either now.

dbonel commented 1 year ago

I think I now have a process that eventually leads to a crash, which is to load a big existing patch, do command-N to start a new patch from my default template, delete all of those modules, and then add a Taps and duplicate it repeatedly until eventually it crashes.

I was able to get this to happen with Rack running inside of lldb, and that implicates something inside the Taps::process() function (which isn't narrowing it down much).

Process 24307 stopped
* thread #19, stop reason = EXC_BAD_ACCESS (code=1, address=0x20f988580)
    frame #0: 0x00000001208a3409 plugin.dylib`Taps::process(rack::engine::Module::ProcessArgs const&) + 441
plugin.dylib`Taps::process:
->  0x1208a3409 <+441>: movss  %xmm1, 0x148(%rbx,%r10,4)
    0x1208a3413 <+451>: movslq 0x4ce8d4(%rbx), %r9
    0x1208a341a <+458>: movss  0x148(%rbx,%r9,4), %xmm0  ; xmm0 = mem[0],zero,zero,zero
    0x1208a3424 <+468>: movl   0x4ce904(%rbx), %r14d
Target 0: (Rack) stopped.
(lldb) bt
* thread #19, stop reason = EXC_BAD_ACCESS (code=1, address=0x20f988580)
  * frame #0: 0x00000001208a3409 plugin.dylib`Taps::process(rack::engine::Module::ProcessArgs const&) + 441
    frame #1: 0x000000010917e9df libRack.dylib`rack::engine::Module::doProcess(rack::engine::Module::ProcessArgs const&) + 143
    frame #2: 0x0000000109177593 libRack.dylib`rack::engine::Engine::stepBlock(int) + 1699
    frame #3: 0x000000010917c4a4 libRack.dylib`rack::engine::Engine_fallbackRun(rack::engine::Engine*) + 340
    frame #4: 0x000000010917d09c libRack.dylib`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(rack::engine::Engine*), rack::engine::Engine*> >(void*) + 44
    frame #5: 0x00007ff8150154e1 libsystem_pthread.dylib`_pthread_start + 125
    frame #6: 0x00007ff815010f6b libsystem_pthread.dylib`thread_start + 15
(lldb) register read rbx
     rbx = 0x00000001153b8000
(lldb) register read r10
     r10 = 0x000000003e97410e

I think if we can get a version of the plugin.dylib file with debug symbols still included, it will be a lot easier to match this up to a line in the file.

However, here's the instruction we're crashing on in the disassembled function: https://gist.github.com/dbonel/6ffa16ef44c63349af1c9b0b0addffa2#file-gistfile1-txt-L122

dbonel commented 1 year ago

Can inptr be uninitialized here, especially on the first iteration? https://github.com/demcanulty/Yokai_Holiday/blob/main/src/Taps.cpp#L551

demcanulty commented 1 year ago

Thanks for continuing to look into this, I wish I could compile on an M1 and test. My assumption was that inptr will be initialized, since it's a global variable within the struct, but woahh, yeah, it looks like I might have been wrong, and there could be a good chance that it's not!

Are you able to compile the module on your system? Looks like all of those variables should be initialized.

demcanulty commented 1 year ago

Good catch!

demcanulty commented 1 year ago

If you can't compile, I think there's a way to do cross compilation here on github.

dbonel commented 1 year ago

I'll have to look at getting a C++ toolchain set up, which hopefully isn't too much trouble.

I will note I don't think the M1 is the unique part of this (since my VCV Rack install is all x86 code running through the Rosetta translation layer), however the macOS allocator might be arranging the mappings such that a stray memory access is more likely to land in an unmapped region.

demcanulty commented 1 year ago

Ah, that's a great point, if you give me your duplication method, or even the original patch that pointed you to the issue, I'll see if I can duplicate here. I had some inexplicable crashes early on, so I coded a bunch of protection code. That whole soft start at line 529 is exactly that, me trying to avoid those crashes. So maybe the problem is here too, I just need a way to reliably trigger it.

But also if you want any help getting your VCV toolchain up and running, it seems like you'd have a good time with it. I don't have the debugging skills you've got to capture these things, I'm a firmware guy most of the time, and trying to get these disassemblies that you've got on an x86 system is still a bit of a mysterious world to me.

demcanulty commented 1 year ago

But, that said, I do have debugging working on a mac! Which seemed like a bit of a magic trick, I'm not sure I can duplicate it, but I'd be game to try.

dbonel commented 1 year ago

Okay, I already had a C++ toolchain so the only tricky part was the cross-compile. It looks like I guessed right here, because with debug symbols I get this:

Process 25626 stopped
* thread #19, stop reason = EXC_BAD_ACCESS (code=1, address=0x50139364)
    frame #0: 0x000000010e7130ea plugin.dylib`Taps::process(this=0x0000000162688000, args=<unavailable>) at Taps.cpp:551:29 [opt]
   548          int trail_look_ahead = (inptr + fade_buffer_lookahead) % target_buffer_length;
   549
   550          trail_buffer[trailptr] = delay_buffer[trail_look_ahead];
-> 551          delay_buffer[inptr] = dry;
   552          forw_wet = delay_buffer[forw_outptr];
   553
   554

And then, looking at the value of inptr:

(lldb) frame variable this->inptr
(int) this->inptr = -1150630777
(lldb) frame variable &this->delay_buffer
(float (*)[960000]) &delay_buffer = 0x0000000162688148

So that definitely lines up, 0x0000000162688148 + (-1150630777 * sizeof(float)) gives us the 0x50139364 address that generated the segfault.

It might vary by OS platform exactly what the allocator does as far as zeroing memory automatically or not, but apparently it's not safe here to assume anything is going to be initialized for you.

demcanulty commented 1 year ago

Oooh, yeah, that's great work, nicely done!! In C I'm really careful about which variables are initialized and not, but my instincts were wrong for C++. Thanks for finding this!

demcanulty commented 1 year ago

That's a gnarly un-initialized variable, super embarrassing.

dbonel commented 1 year ago

You're welcome, I'm glad I could actually help. I'm not really familiar with C++ semantics either, having spent most of my recent time with Rust where we're not allowed to do any of these naughty things. :)

demcanulty commented 1 year ago

For good measure I initialized all variables at the top-level scope in the struct Taps declaration and submitted a new version to VCV tonight. I'll check in when it gets released to make sure it passes your test. Thanks again!

dbonel commented 1 year ago

I compiled and lightly tested the v2.0.0.8 version you pushed here, with no crashes so far.

demcanulty commented 1 year ago

Oh, that's fantastic news! Thanks! Yeah, I really had wondered why sometimes I saw it crash when I loaded the module. My own fault for not reading up properly on C++ initialization guarantees. It's a satisfying solution to that mystery. I'm really surprised the compiler didn't notify me about that, because it did notify me about other potentially uninitialized variables.