genodelabs / genode

Genode OS Framework
https://genode.org/
Other
1.05k stars 250 forks source link

usb_hid_raw.run: page faults on arm_v6/rpi #4161

Closed cproc closed 3 years ago

cproc commented 3 years ago

When running usb_hid_raw.run on arm_v6/rpi, a page fault in Genode::memcpy() occurs in usb_host_drv and usb_hid_drv:

Genode sculpt-21.03-205-g1a9469c878
251 MiB RAM and 64535 caps assigned to init
[init -> usb_hid_drv] --- USB HID input driver ---
[init -> usb_hid_drv] Warning: use compatibility mode: will claim all HID devices from USB report
[init -> usb_hid_drv] Configured HID screen with 0x0 (multitouch=false)
no RM attachment (READ pf_addr=0x0 pf_ip=0x620cc from pager_object: pd='init -> usb_hid_drv' thread='ep') 
Warning: page fault, pager_object: pd='init -> usb_hid_drv' thread='ep' ip=0x620cc fault-addr=0x0 type=no-page
[init -> usb_drv] dev_info: DWC OTG Controller
[init -> usb_drv] dev_info: new USB bus registered, assigned bus number 1
[init -> usb_drv] dev_info: irq 32, io mem 0x00000000
[init -> usb_drv] dev_info: USB hub found
[init -> usb_drv] dev_info: 1 port detected
no RM attachment (READ pf_addr=0x0 pf_ip=0x620cc from pager_object: pd='init -> usb_drv' thread='ep') 
Warning: page fault, pager_object: pd='init -> usb_drv' thread='ep' ip=0x620cc fault-addr=0x0 type=no-page

After some debugging, I got the following log messages:

Genode sculpt-21.03-206-g8b6a5ae36b
251 MiB RAM and 64535 caps assigned to init
[init -> usb_drv] 0x401fee10: ipc_reply_wait(): return address: 0x6e060
[init -> usb_drv] 0x401fee10: ipc_reply_wait(): &_snd_buf._data:    0xbbbac, _snd_buf.data():    0xbbbf8
[init -> usb_drv] 0x401fee10: ipc_reply_wait(): &reply_msg._data:   0xbbbac, reply_msg.data():   0xbbbf8
[init -> usb_drv] 0x401fee10: ipc_reply_wait(): &_rcv_buf._data:    0xbc00c, _rcv_buf.data():    0xbc058
[init -> usb_drv] 0x401fee10: ipc_reply_wait(): &request_msg._data: 0xbc00c, request_msg.data(): 0xbc058
[init -> usb_drv] 0x402fee10: ipc_reply_wait(): return address: 0x6e060
[init -> usb_drv] 0x402fee10: ipc_reply_wait(): &_snd_buf._data:    0xbca84, _snd_buf.data():    0xbcad0
[init -> usb_drv] 0x402fee10: ipc_reply_wait(): &reply_msg._data:   0xbca84, reply_msg.data():   0xbcad0
[init -> usb_drv] 0x402fee10: ipc_reply_wait(): &_rcv_buf._data:    0xbcee4, _rcv_buf.data():    0xbcf30
[init -> usb_drv] 0x402fee10: ipc_reply_wait(): &request_msg._data: 0xbcee4, request_msg.data(): 0xbcf30
[init -> usb_drv] 0x402fee10: ipc_reply_wait(): return address: 0x6e060
[init -> usb_drv] 0x402fee10: ipc_reply_wait(): &_snd_buf._data:    0xbca84, _snd_buf.data():    0xbcad0
[init -> usb_drv] 0x402fee10: ipc_reply_wait(): &reply_msg._data:   0xbca84, reply_msg.data():   0xbcad0
[init -> usb_drv] 0x402fee10: ipc_reply_wait(): &_rcv_buf._data:    0xbcee4, _rcv_buf.data():    0xbcf30
[init -> usb_drv] 0x402fee10: ipc_reply_wait(): &request_msg._data: 0xbcee4, request_msg.data(): 0xbcf30
[init -> usb_drv] dev_info: DWC OTG Controller
[init -> usb_drv] dev_info: new USB bus registered, assigned bus number 1
[init -> usb_drv] dev_info: irq 32, io mem 0x00000000
[init -> usb_drv] dev_info: USB hub found
[init -> usb_drv] dev_info: 1 port detected
[init -> usb_drv] 0x401fee10: ipc_reply_wait(): return address: 0x6e060
[init -> usb_drv] 0x401fee10: ipc_reply_wait(): &_snd_buf._data:    0xbbbac, _snd_buf.data():    0xbbbf8
[init -> usb_drv] 0x401fee10: ipc_reply_wait(): &reply_msg._data:   0x1068fd4, reply_msg.data():   0x0
[init -> usb_drv] 0x401fee10: ipc_reply_wait(): &_rcv_buf._data:    0xbc00c, _rcv_buf.data():    0xbc058
[init -> usb_drv] 0x401fee10: ipc_reply_wait(): &request_msg._data: 0xbc00c, request_msg.data(): 0xbc058
[init -> usb_drv] Error: 0x401fee10: ipc_reply_wait(): address mismatch
no RM attachment (READ pf_addr=0x0 pf_ip=0x62134 from pager_object: pd='init -> usb_drv' thread='ep')
Warning: page fault, pager_object: pd='init -> usb_drv' thread='ep' ip=0x62134 fault-addr=0x0 type=no-page

ipc_reply_wait() is called from Rpc_entrypoint::entry() and gets Rpc_entrypoint::_snd_buf and Rpc_entrypoint::_rcv_buf passed by reference and _snd_buf becomes reply_msg and _rcv_buf becomes request_msg in ipc_reply_wait().

https://github.com/genodelabs/genode/blob/777923f9bd189c2dfa8c051ac4181a10ef0ab8d3/repos/base/src/lib/base/rpc_dispatch_loop.cc#L57

https://github.com/genodelabs/genode/blob/777923f9bd189c2dfa8c051ac4181a10ef0ab8d3/repos/base-hw/src/lib/base/ipc.cc#L133-L136

As the log output shows, the address of the _data member should stay the same after entering ipc_reply_wait(), but in the last part &reply_msg._data changed for some reason and then the page fault happened because of the null pointer.

When I looked at the disassembled code, the address was apparently calculated from a general purpose register value, which got temporarily moved to a VFP register and restored to the same general purpose register again right before calling ipc_reply_wait().

To possibly get more clues I also tried git bisect on the gcc source code, which reported commit [1], which caused the generation of the VFP instructions (corresponding discussion in [2]).

So a possible cause of the wrong address could be that the VFP register somehow gets corrupted. Since the same error occurs on both hw and foc, it seems rather unlikely that the corruption happens on thread switches but who knows. I also remembered that there has been a similar problem with generated FPU instructions on x86 with the tool chain update two years ago where we had to save and restore the FPU state in the _jmp_slot function (issue #3355). Maybe this needs to be done now for ARM as well, but I.don't know for sure if it would fix the problem. If somebody could help here with the corresponding assembly code we could try it out. Apart from that I don't really have an idea at the moment what else could cause this problem.

[1] https://github.com/gcc-mirror/gcc/commit/eeae931 [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718

cproc commented 3 years ago

For reference, commit 45ce116 contains the changes I made for the debug messages.

cproc commented 3 years ago

This is the disassembled code for the debug commit reduced a bit by commit 32088e6:

Rpc_entrypoint::entry():
...
addr_snd_buf_data = &_snd_buf._data;
   6dddc:       e28430a4        add     r3, r4, #164    ; 0xa4
   6dde0:       e58d3010        str     r3, [sp, #16]
/.../repos/base/src/lib/base/rpc_dispatch_loop.cc:72

                Rpc_request const request = ipc_reply_wait(_caller, exc, _snd_buf, _rcv_buf);
   6dde4:       e2843090        add     r3, r4, #144    ; 0x90
   6dde8:       ee083a10        vmov    s16, r3
   6ddec:       e2843e4f        add     r3, r4, #1264   ; 0x4f0
   6ddf0:       e58d3014        str     r3, [sp, #20]
   6ddf4:       e28d3034        add     r3, sp, #52     ; 0x34
   6ddf8:       ee083a90        vmov    s17, r3
/.../repos/base/src/lib/base/rpc_dispatch_loop.cc:70
addr_snd_buf_data = &_snd_buf._data;
   6ddfc:       e59d3018        ldr     r3, [sp, #24]
   6de00:       e59d2010        ldr     r2, [sp, #16]
/.../repos/base/src/lib/base/rpc_dispatch_loop.cc:72
                Rpc_request const request = ipc_reply_wait(_caller, exc, _snd_buf, _rcv_buf);
   6de04:       e59d5014        ldr     r5, [sp, #20]
/.../repos/base/src/lib/base/rpc_dispatch_loop.cc:70
addr_snd_buf_data = &_snd_buf._data;
   6de08:       e5832000        str     r2, [r3]
/.../repos/base/src/lib/base/rpc_dispatch_loop.cc:72
                Rpc_request const request = ipc_reply_wait(_caller, exc, _snd_buf, _rcv_buf);
   6de0c:       e1a0100b        mov     r1, fp
   6de10:       ee183a10        vmov    r3, s16
   6de14:       ee180a90        vmov    r0, s17
   6de18:       e59d200c        ldr     r2, [sp, #12]
   6de1c:       e58d5000        str     r5, [sp]
   6de20:       ebffd1c9        bl      6254c <Genode::ipc_reply_wait(Genode::Native_capability const&, Genode::Rpc_exception_code, Genode::Msgbuf_base&, Genode::Msgbuf_base&)>
ipc_reply_wait():

0006254c <Genode::ipc_reply_wait(Genode::Native_capability const&, Genode::Rpc_exception_code, Genode::Msgbuf_base&, Genode::Msgbuf_base&)>:
Genode::ipc_reply_wait(Genode::Native_capability const&, Genode::Rpc_exception_code, Genode::Msgbuf_base&, Genode::Msgbuf_base&):
/.../repos/base-hw/src/lib/base/ipc.cc:149

Genode::Rpc_request Genode::ipc_reply_wait(Reply_capability const &,
                                           Rpc_exception_code      exc,
                                           Msgbuf_base            &reply_msg,
                                           Msgbuf_base            &request_msg)
{
   6254c:       e92d4ff0        push    {r4, r5, r6, r7, r8, r9, sl, fp, lr}
   62550:       e59f62e4        ldr     r6, [pc, #740]  ; 6283c <Genode::ipc_reply_wait(Genode::Native_capability const&, Genode::Rpc_exception_code, Genode::Msgbuf_base&, Genode::Msgbuf_base&)+0x2f0>
/.../repos/base-hw/src/lib/base/ipc.cc:151

addr_reply_msg_data = &reply_msg._data;
   62554:       e59f12e4        ldr     r1, [pc, #740]  ; 62840 <Genode::ipc_reply_wait(Genode::Native_capability const&, Genode::Rpc_exception_code, Genode::Msgbuf_base&, Genode::Msgbuf_base&)+0x2f4>
/.../repos/base-hw/src/lib/base/ipc.cc:149
{
   62558:       e08f6006        add     r6, pc, r6
/.../repos/base-hw/src/lib/base/ipc.cc:151
addr_reply_msg_data = &reply_msg._data;
   6255c:       e796c001        ldr     ip, [r6, r1]
   62560:       e2838014        add     r8, r3, #20
/.../repos/base-hw/src/lib/base/ipc.cc:154

#if 1
if (addr_reply_msg_data != addr_snd_buf_data) {
   62564:       e59f12d8        ldr     r1, [pc, #728]  ; 62844 <Genode::ipc_reply_wait(Genode::Native_capability const&, Genode::Rpc_exception_code, Genode::Msgbuf_base&, Genode::Msgbuf_base&)+0x2f8>
/.../repos/base-hw/src/lib/base/ipc.cc:151
addr_reply_msg_data = &reply_msg._data;
   62568:       e58c8000        str     r8, [ip]

If I interpreted everything correctly, these are the relevant parts:

a) the still correct &_snd_buf._data (r4+164) stored in a variable for the log message:

Rpc_entrypoint::entry():

addr_snd_buf_data = &_snd_buf._data;
   6dddc:       e28430a4        add     r3, r4, #164    ; 0xa4
   6dde0:       e58d3010        str     r3, [sp, #16]
   ...
   6de00:       e59d2010        ldr     r2, [sp, #16]
/.../repos/base/src/lib/base/rpc_dispatch_loop.cc:70
addr_snd_buf_data = &_snd_buf._data;
   6de08:       e5832000        str     r2, [r3]

b) the not always correct anymore &reply_msg._data (r4+144+20 = r4+164) received by ipc_reply_wait():

Rpc_entrypoint::entry():

                Rpc_request const request = ipc_reply_wait(_caller, exc, _snd_buf, _rcv_buf);
   6dde4:       e2843090        add     r3, r4, #144    ; 0x90
   6dde8:       ee083a10        vmov    s16, r3
   ...
   6de10:       ee183a10        vmov    r3, s16
   ...
   6de20:       ebffd1c9        bl      6254c <Genode::ipc_reply_wait(Genode::Native_capability const&, Genode::Rpc_exception_code, Genode::Msgbuf_base&, Genode::Msgbuf_base&)>
ipc_reply_wait():

   62560:       e2838014        add     r8, r3, #20
   ...
   62568:       e58c8000        str     r8, [ip]

On this whole path I don't see any other library call than ipc_reply_wait() and this function has been called successfully before by this thread, so it doesn't really look like a _jmp_slot call could be the problem here. But then what else could it be...

cproc commented 3 years ago

Commit 655159f gives an outlook what reverting the gcc commit which triggered the page faults could entail. After this gcc commit, five more commits changed parts of the same function and did not apply anymore when the original commit was missing, so they probably would need to be reverted as well.

So far it seems like only these two USB driver components on arm_v6/rpi are affected, but it would definitely be good to know if this is a problem with gcc or Genode or kernels or the hardware.

cproc commented 3 years ago

Commit 916bb4b is an attempt to save and restore the FPU state in the _jmp_slot function based on the code in vfpv2.s [1], but unfortunately it did not solve the problem.

[1] https://github.com/genodelabs/genode/blob/master/repos/base-hw/src/core/spec/arm/vfpv2.s

cproc commented 3 years ago

Commit 6daf706 fixes the problem by saving and restoring the FPU state in setjmp.S, based on the updated FreeBSD version [1].

[1] https://github.com/freebsd/freebsd-src/blob/releng/12.0/lib/libc/arm/gen/_setjmp.S