daniel-santos / mcp2210-linux

MCP2210 driver for linux
http://danielthesantos.blogspot.com/search/label/mcp2210
51 stars 33 forks source link

kernel crashing in spi_submit_prepare #35

Closed tchen61 closed 5 years ago

tchen61 commented 6 years ago

in cmp2210-core.c submit_urbs, driver calls spi_submit_prepare with a pointer to (struct mcp2210_cmd ), but in the spi_submit_prepare, it CASTs that pointer to (struct mcp2210_cmd_spi_msg ) and then subsequently tries to load up an element xfer....

and then later use xfer to access the tx_buf, and len....

kernel crashes when tx_buf is NULL, but len is not....
not sure if the casting of mcp2210_cmd to mcp2210_cmd_spi_msg is correct

tchen61 commented 6 years ago

this is the cast that is questionalble ?

static int spi_submit_prepare(struct mcp2210_cmd cmd_head) { struct mcp2210_cmd_spi_msg cmd = (void *)cmd_head;

daniel-santos commented 6 years ago

Hello. This cast is C polymorphism. If you look at the definition of struct mcp2210_cmd_spi_msg you will see that it's first member is struct mcp2210_cmd head. This is how you do polymorphism in C and you can think of them as classes. The second field of the base class, struct mcp2210_cmd defines the subtype so it is never cast incorrectly (aside from memory corruption, incorrect code, etc.).

But if there's a NULL dereference then something is obviously wrong. Post the stack trace.

tchen61 commented 6 years ago

The issue I see is that if someone pass a "mcp2210_cmd" pointer,

spi_submit_prepare() somehow start using it, and we cast that pointer to pointer of another structure mcp2210_cmd_spi_msg, then we will access data from some unknown memory area

static int spi_submit_prepare(struct mcp2210_cmd cmd_head) { struct mcp2210_cmd_spi_msg cmd = (void *)cmd_head;

and a few lines later in this function, we did this...

    xfer = cmd->xfer;

and then xfer is used to get xfer->tx_buf and xfer->len lafter perhaps we need to check the type before using cmd->xfer ???

if the calling program only pass mcp2210_cmd structure pointer, cmd->xfer will be pointing to some memory location that is not defined.

here is the stack trace

[ 49.110000] usb 1-1.3: process_commands: [ 49.110000] usb 1-1.3: process_commands: got a command (83bb8a80) [ 49.110000] usb 1-1.3: submit_urbs: [ 49.120000] Kernel bug detected[#1]: [ 49.120000] CPU: 0 PID: 1272 Comm: slic_test Not tainted 3.18.84 #2 [ 49.120000] task: 839cfa20 ti: 82c70000 task.ti: 82c70000 [ 49.120000] $ 0 : 00000000 00000062 00000042 00000001 [ 49.120000] $ 4 : a3baa081 00000000 82c71e38 00000001 [ 49.120000] $ 8 : 00000001 00000000 00000000 00000009 [ 49.120000] $12 : 00000000 c0000000 40000000 82c6f704 [ 49.120000] $16 : 83bb8a80 83ba4400 00000001 a3baa080 [ 49.120000] $20 : 00000000 82c71d48 80342434 80370000 [ 49.120000] $24 : 00000001 04000000 [ 49.120000] $28 : 82c70000 82c71d28 00000000 801f2888 [ 49.120000] Hi : 00000000 [ 49.120000] Lo : b61f0000 [ 49.120000] epc : 801f0d64 spi_submit_prepare+0x2a4/0x310 [ 49.120000] ra : 801f2888 process_commands+0x53c/0x7b0 [ 49.120000] Not tainted[ 49.120000] Status: 1000dc02 KERNEL EXL [ 49.120000] Cause : 50808024 [ 49.120000] PrId : 00019374 (MIPS 24Kc) [ 49.120000] Modules linked in: ath9k ath9k_common pppoe ppp_async iptable_nat ath9k_hw ath pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 mac80211 ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_id xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_reject_ipv4 nf_nat_masquerade_ipv4 nf_nat_ftp nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack_ftp nf_conntrack iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt compat snd_soc_core i2c_dev ledtrig_usbdev ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables snd_compress snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_rawmidi snd_seq_device snd_hwdep snd input_core soundcore regmap_spi regmap_i2c regmap_core lzo_decompress lzo_compress ipv6 arc4 crypto_blkcipher ehci_platform ehci_hcd gpio_button_hotplug crc16 aead [ 49.120000] Process slic_test (pid: 1272, threadinfo=82c70000, task=839cfa20, tls=77ce0440) [ 49.120000] Stack : 839cfa20 801e29e0 839cfa20 83a121e8 82c71e38 00000000 83b29b80 82c71d68 001e8480 00010000 00010001 00010001 03000000 00000000 83b63990 83ba4400 83bb8a80 00000000 803c0000 83ba4400 00000000 801f2888 8036d46c 83ba4400 82c71e78 8036d4b4 83bb8a80 65786974 00000000 c0000000 00000000 00000000 83ba4400 801f168c 83b6f400 00000000 82c71e80 83b6fa00 7f926718 00000000 ... [ 49.120000] Call Trace: [ 49.120000] [<801f0d64>] spi_submit_prepare+0x2a4/0x310 [ 49.120000] [<801f2888>] process_commands+0x53c/0x7b0 [ 49.120000] [<801ec250>] spi_async+0x60/0x8c [ 49.120000] [<801ee498>] spidev_sync.isra.7+0x50/0x90 [ 49.120000] [<801ee68c>] spidev_read+0x94/0x11c [ 49.120000] [<800fc34c>] vfs_read+0x90/0x174 [ 49.120000] [<800fca08>] SyS_read+0x58/0xc4 [ 49.120000] [<80062b9c>] handle_sys+0x11c/0x140 [ 49.120000] [ 49.120000] Code: 02722021 12400002 02722021 <000c000d> 2406003c 00d23023 24840004 0c0193a8 00002821 [ 49.390000] ---[ end trace 60bea8a7434c23aa ]---

thanks tom

On Wed, Jul 18, 2018 at 2:07 PM, Daniel Santos notifications@github.com wrote:

Hello. This cast is C polymorphism. If you look at the definition of struct mcp2210_cmd_spi_msg https://github.com/daniel-santos/mcp2210-linux/blob/master/mcp2210.h#L803 you will see that it's first member is struct mcp2210_cmd head. This is how you do polymorphism in C and you can think of them as classes. The second field of the base class, struct mcp2210_cmd https://github.com/daniel-santos/mcp2210-linux/blob/master/mcp2210.h#L758 defines the subtype so it is never cast incorrectly (aside from memory corruption, incorrect code, etc.).

But if there's a NULL dereference then something is obviously wrong. Post the stack trace.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/daniel-santos/mcp2210-linux/issues/35#issuecomment-406023310, or mute the thread https://github.com/notifications/unsubscribe-auth/AhZIy7bpik0Q0NyeTFG4o5YWNEqqzK10ks5uH3lygaJpZM4VU3BL .

-- thomas chen cell: +1.240.401.7902

dhilst commented 6 years ago

I don't really have a grasp of what is happening, but about spi communication, usually there is a spi_transfer struct that has a tx_buf pointer, this need to be initialized or you will face a crash by NULL pointer dereferization.

Here is where it is defined: https://elixir.bootlin.com/linux/latest/source/include/linux/spi/spi.h#L776

And here are some examples of it's use: https://elixir.bootlin.com/linux/latest/source/drivers/char/tpm/st33zp24/spi.c#L119 https://elixir.bootlin.com/linux/latest/source/drivers/char/tpm/tpm_tis_spi.c#L78

From the description of the problem I guess that mcp2210_cmd_spi_msg is not properly initialized.

--

okay I take a deeper look, this is a master driver, not a protocol driver. I just forgot it, the examples I give won't help here. Anyway, I still think the problem is in up tiers of the stack. I just can't remember how this is interfaced to the user space, libusb?

daniel-santos commented 6 years ago

tchen61, were you able to solve this problem? I think I should modify this driver to check for such problems, at least when debug features are enabled, then we can get a backtrace that includes the faulty code. In the backtrace you gave, it didn't try to dereference it until some later time.

daniel-santos commented 5 years ago

I've added an appropriate check for a tx_buf == NULL and actually modified it so that it should always tolerate rx_buf == NULL.