basvandijk / usb

Communicate with USB devices
Other
27 stars 12 forks source link

hotplug event segfault and other problem #17

Closed ryantm closed 7 years ago

ryantm commented 7 years ago

Hello.

Thanks for making this library, it has been helpful for learning how to use libusb and talk to USB devices. Here is a bug report:

I compiled this program with usb-1.3.0.4 and ghc 7.10.3, using ghc with no options:

import Control.Concurrent
import Control.Exception
import Control.Monad
import Data.Functor
import Data.Word
import System.USB

main = do
  ctx <- newCtx
  forever $ do
    d <- waitForMyDevice ctx (Just 8187) Nothing Nothing
    print d
    threadDelay 1000000

waitForMyDevice :: Ctx
                -> Maybe VendorId
                -> Maybe ProductId
                -> Maybe Word8
                -> IO Device
waitForMyDevice ctx mbVendorId mbProductId mbDevClass = do
  mv <- newEmptyMVar
  -- We mask asynchronous exceptions to ensure that the callback
  -- gets properly deregistered when an asynchronous exception is
  -- thrown during the interruptible takeMVar operation.
  mask_ $ do
    h <- registerHotplugCallback ctx
                                 deviceArrived
                                 enumerate
                                 mbVendorId
                                 mbProductId
                                 mbDevClass
                                 (\ dev event ->
                                    tryPutMVar mv (dev, event) $>
                                      DeregisterThisCallback)
    (dev, _event) <- takeMVar mv
                       `onException`
                         deregisterHotplugCallback h
    return dev

waitForMyDevice is copied verbatim from: https://hackage.haskell.org/package/usb-1.3.0.4/docs/System-USB-Enumeration.html#t:HotplugCallback

I am running the program on Ubuntu inside of VirtualBox

$ uname -a
Linux ryantm-1604.pololu.internal 4.4.0-59-generic #80-Ubuntu SMP Fri Jan 6 17:47:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

First, I was a bit surprised that, if I start the program with the USB device selected, it keeps printing out the device. I expected it to only print the device when the device was plugged in.

I noticed two problems:

  1. When I disconnect the device it stops printing. When I reconnect the device, it does not start printing the device again. Given that it prints out the device connected continuously, I expected it to start again when the device was plugged back in.
  2. If I have two devices with the same Vendor ID connected when I start the program, the program outputs "Segmentation fault (core dumped)". I expected it to be able to handle multiple similar devices.

Ryan

basvandijk commented 7 years ago

Hy Ryan, thanks for the bug report!

First, I was a bit surprised that, if I start the program with the USB device selected, it keeps printing out the device. I expected it to only print the device when the device was plugged in.

I believe this is because you've specified the enumerate flag which fires events for all matching currently attached devices. If you specify mempty it will only fire when you attach the device.

When I disconnect the device it stops printing. When I reconnect the device, it does not start printing the device again. Given that it prints out the device connected continuously, I expected it to start again when the device was plugged back in.

That's strange. I haven't been able to reproduce that but I will try a bit harder this evening.

If I have two devices with the same Vendor ID connected when I start the program, the program outputs "Segmentation fault (core dumped)". I expected it to be able to handle multiple similar devices.

That shouldn't happen. It could also be a bug in libusb of course. It might be interesting if you can write the same program in C and see if it has the same behaviour as the Haskell program.

dengm129 commented 7 years ago

17

Tingles commented 7 years ago

Hi Ryan, Hi Bas, I'd like to echo Ryan's remarks - Thank you for this usb library, especially the quality documentation.

To the point though, I'm also experiencing a segfault when using the library's asynchronous event notification features. Here are two programs, one in C and the other in Haskell, trying to keep the semantics as close as possible between the two.

This works. Compiled with gcc 6.3.1: $cc this-works.c -Wall -ggdb -I/usr/include/libusb-1.0 -L/usr/lib -lusb-1.0

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <libusb.h>

int hotplug_callback(struct libusb_context *ctx, struct libusb_device *dev,
             libusb_hotplug_event event, void *user_data) {

  struct libusb_device_descriptor desc;

  switch (event) {
  case LIBUSB_HOTPLUG_EVENT_DEVICE_ARRIVED :
    (void)libusb_get_device_descriptor(dev, &desc);
    printf ("Device attached: %04x:%04x\n", desc.idVendor, desc.idProduct);
    break;
  case LIBUSB_HOTPLUG_EVENT_DEVICE_LEFT :
    printf("Device detached\n");
    break;
  default : 
    printf("Unhandled event %d\n", event);
    break;
  }
  return 0;
}
int main (void) {
  libusb_hotplug_callback_handle handle;
  int rc;
  libusb_init(NULL);
  rc = libusb_hotplug_register_callback(NULL,
          LIBUSB_HOTPLUG_EVENT_DEVICE_ARRIVED | LIBUSB_HOTPLUG_EVENT_DEVICE_LEFT,
      0, 0x15a2, 0x0061, LIBUSB_HOTPLUG_MATCH_ANY, hotplug_callback, NULL,&handle);
  if (LIBUSB_SUCCESS != rc) {
    printf("Error creating a hotplug callback\n");
    libusb_exit(NULL);
    return EXIT_FAILURE;
  }
  while (1) {
    libusb_handle_events_completed(NULL, NULL);
    usleep(10000);
  }
  libusb_hotplug_deregister_callback(NULL, handle);
  libusb_exit(NULL);
  return 0;
}

2017-06-26-185414_988x299_scrot

This segfaults. Compiled with GHC 8.0.2. and Stack resolver lts-8.20. I'll attach the stack.yml and cabal file.

module Main where

import Control.Exception
import Control.Concurrent (threadDelay)
import Control.Monad (when, forever)
import Data.Functor (($>))
import System.Exit 
import System.IO
import System.USB
import Text.Printf

defaultVid = 0x15a2
defaultPid = 0x0061

main :: IO ()
main = reportDevices

reportDevices :: IO ()
reportDevices = do 

  --USB context
  ctx     <- newCtx

  when (not $ ctx `hasCapability` HasHotplug)
    (die ("platform does not support libusb-1.0 hotplugging"))

  _ <- mask_ $ do 
    -- callback for both events, report function examines event to
    -- handle both cases
    ah <- registerHotplugCallback ctx
          (deviceArrived `mappend` deviceLeft) 
          mempty
          (Just defaultVid)
          (Just defaultPid)
          Nothing
          (\dev event -> report dev event $> KeepCallbackRegistered)
    return ah
  forever $ threadDelay 100

report :: System.USB.Device -> System.USB.HotplugEvent -> IO ()
report d e
      | (matchDeviceArrived e) = printf "%s arrived\n" (show d)
      | (matchDeviceLeft e) = printf "%s departed\n" (show d)

2017-06-26-191111_987x169_scrot

test-usb-hotplug.cabal

name:                test-usb-hotplug
version:             0.1.0.0
-- synopsis:
-- description:
homepage:            https://github.com/githubuser/test-usb-hotplug#readme
license:             BSD3
license-file:        LICENSE
author:              Author name here
maintainer:          example@example.com
copyright:           2017 Author name here
category:            Web
build-type:          Simple
extra-source-files:  README.md
cabal-version:       >=1.10

executable test-usb-hotplug-exe 
  hs-source-dirs:      .
  main-is:             Main.hs
  ghc-options:         -threaded -rtsopts -with-rtsopts=-N
  build-depends:       base >= 4.7 && < 5
                     , stm
                     , usb  >= 1.3 && < 1.4
                     , vector 
  default-language:    Haskell2010

stack.yaml

flags: {}
extra-package-dbs: []
packages:
- '.'
extra-deps:
- bindings-libusb-1.4.5.0
- usb-1.3.0.4
resolver: lts-8.20

Thanks, Ian.

Ps. Hope this was the correct place to leave this information.

basvandijk commented 7 years ago

@Tingles thanks for the detailed report including the C and Haskell programs! Unfortunately I don't have much time to look at this now (I'm on a holiday).

A few things that come to mind:

EDIT: Please ignore the latter. I noticed that @ryantm already used the MVar approach and hit the segfault.

Tingles commented 7 years ago

@basvandijk @ryantm Looking back at @ryantm's original demo code I believe I was too quick to post and I missed the brunt of the point in with my C implementation. I'll have have another go at implementing the original Haskell code in C code

Tingles commented 7 years ago

@ryantm I don't have VirtualBox on my system but I spun up a Ubuntu-16.04.2 LTS guest on a KVM to test this out.

First, I was a bit surprised that, if I start the program with the USB device selected, it keeps printing out the device. I expected it to only print the device when the device was plugged in.

I was not able to recreate this with the Ubuntu KVM guest. So because of that I can't address the first of your two points.

However I was able to duplicate your second point.

If I have two devices with the same Vendor ID connected when I start the program, the program outputs "Segmentation fault (core dumped)". I expected it to be able to handle multiple similar devices.

Reproducing @ryantm results

Here's what I got on a Ubuntu-16.04.2 LTS virtual KVM guest with ghc 7.10.3 and usb 1.3.0.4.

I observed,

  1. When starting the program with a device already plugged in, the program would print the device once and subsequently segfault.
  2. When starting the program with more 2 identical devices already plugged in, the program would immediately segfault.
  3. When starting the program with no devices plugged in, the program is silent until the device is plugged in, and immediately segfault... with backtrace and a memory map dump.
*** Error in
`./.stack-work/install/x86_64-linux/lts-6.25/7.10.3/bin/test-usb-hotplug-exe':
double free or corruption (out): 0x00007f5874001880 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f58969007e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f589690937a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f589690d53c]
/lib/x86_64-linux-gnu/libusb-1.0.so.0(+0xb365)[0x7f5897a26365]
/lib/x86_64-linux-gnu/libusb-1.0.so.0(libusb_unref_device+0x77)[0x7f5897a1f967]
/lib/x86_64-linux-gnu/libusb-1.0.so.0(libusb_free_device_list+0x29)[0x7f5897a1f9c9]
/lib/x86_64-linux-gnu/libusb-1.0.so.0(libusb_hotplug_register_callback+0x1e9)[0x7f5897a25f49]
./.stack-work/install/x86_64-linux/lts-6.25/7.10.3/bin/test-usb-hotplug-exe[0x46403a]
======= Memory map: ========
00400000-008bc000 r-xp 00000000 fc:01 13564
/home/ian/tmp/test-usb-hotplug/.stack-work/install/x86_64-linux/lts-6.25/7
.10.3/bin/test-usb-hotplug-exe
00abb000-00abc000 r--p 004bb000 fc:01 13564
/home/ian/tmp/test-usb-hotplug/.stack-work/install/x86_64-linux/lts-6.25/7
.10.3/bin/test-usb-hotplug-exe
....(removed for space)
[vsyscall]
Aborted (core dumped)

This last scenario... a "double free or corruption" error I've seen before when trying to do, what I suspsect, is more than I should have within the registered callback function - like trying to add/remove devices toa shared TMVar Set USB.Devices

One thing I notice in the original demo code you provided is that we've forgone making a Weak pointer to the mvar created in waitForMyDevice which evalutes a finalizer (destructor like thing I think..) when the mvar is garbage collected. (reference https://github.com/basvandijk/usb-example/blob/master/example.hs#L56 )

Another thing I was unsure of when reading the demo program was the exact semantics of the forever $ do ... delay block. Still learning Haskell myself, I wasn't sure if we are actually looping in the forever $ do block... e.g.

wait for device, 
print it, 
delay
re-register callback,
wait for device, 
print it
delay ... etc`

Adding a putStrLn "register callback" statement at the top of waitForMyDevice and going through the last of the three scenarios I describe above, no devices plugged in when starting the program, shows the semantic of looping the forever $ do block:

$./.stack-work/install/x86_64-linux-tinfo6/lts-6.25/7.10.3/bin/hs-test-usb-hotplug-exe    
registering callback.
Bus 001 Device 068
registering callback.
*** Error in
`./.stack-work/install/x86_64-linux-tinfo6/lts-6.25/7.10.3/bin/hs-test-usb-hotplug-exe':
double free or corruption (out): 0x00007f4698
001890 ***
(removed for space)

Not sure if this was your intended semantics when you wrote the demo program, as I say it took me a double take to understand what was going on here.

Because of the looping semantics and the lack of a finalizer, I suspect we're unknowingly creating more mVar resources than intended and running into some contention between the Haskell RTS/GC and Libusb-1.0's resource management.

Without enumerate HotplugFlag

Hope you're not sick of my text walls yet :)... I noticed in the first backtrace & memory map dump that the double free error happens in /libusb-1.0.so.0(libusb_free_device_list+0x29)

======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f58969007e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f589690937a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f589690d53c]
/lib/x86_64-linux-gnu/libusb-1.0.so.0(+0xb365)[0x7f5897a26365]
/lib/x86_64-linux-gnu/libusb-1.0.so.0(libusb_unref_device+0x77)[0x7f5897a1f967]
/lib/x86_64-linux-gnu/libusb-1.0.so.0(libusb_free_device_list+0x29)[0x7f5897a1f9c9]  
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  
/lib/x86_64-linux-gnu/libusb-1.0.so.0(libusb_hotplug_register_callback+0x1e9)[0x7f5897a25f49]
./.stack-work/install/x86_64-linux/lts-6.25/7.10.3/bin/test-usb-hotplug-exe[0x46403a]

So I tried was using the mempty HotplugFlag instead of the enumerate HotPlugFlag having noticed that the above segfault and backtrace indicate the double free error happening in /lib/x86_64-linux-gnu/libusb-1.0.so.0(libusb_free_device_list+0x29)[0x7f5897a1f9c9] producing these results:

  1. When starting the program with a device already plugged in, the program blocks in waitForMyDevice until either:
  1. When starting the program with 2 identical devices already plugged in, the program does not segfault immediately, but blocks until one of the connected devices is detached and reattached in which case it prints the device, delays and re-registers the callback. Detach/reattach cycling after this initial success causes libusb-1.0 debug output to cease and an apparent deadlock identical to that described above.

  2. When starting the program with 2 identical devices already plugged in, the program does not segfault immediately, but blocks in a manner consistent with what you might expect. However if I plug in a 3rd identical device at this point instead of a detach/reattach cycle, the program immediately segfaults.

  3. When starting the program with no devices plugged in, the program blocks consistent as above, correctly handling the first attach hotplug event. Attaching a 2nd identical, or trying a detach/reattach cycle both result in a segfault.

To tired atm to try to and further make heads or tails of this. Most of the above hopefully makes sense... needed a brain.
This weekend maybe I'll get to writing the equivalent C program.

basvandijk commented 7 years ago

Thanks to your investigation I think I know what is happening and how to solve it:

Everytime libusb gives me a libusb_device I turn it into a value of the Haskell type Device by applying mkDev to it:

mkDev :: Ctx -> Ptr C'libusb_device -> IO Device
mkDev ctx devPtr =
    Device ctx <$> FC.newForeignPtr devPtr (c'libusb_unref_device devPtr)

As you can see I create a ForeignPtr to the libusb_device and register a finalizer to unref the libusb_device once the ForeignPtr is garbage collected.

I think this is wrong.

What I think is happening is as follows: when you attach a device with the same vendor ID and product ID libusb will give us the same pointer to the libusb_device. However, in Haskell I create to separate Devices from it; both with their own finalizer! Do you see the problem happening? When one Device is garbage collected it will unref the libusb_device. However when the second Device is garbage collected it will unref the already unreffed libusb_device and cause the double free.

basvandijk commented 7 years ago

Could you try if the fix-17 branch fixes your program?

Tingles commented 7 years ago

@basvandijk - Tested fix-17 and unfortunately I can still make @ryantm's demo program segfault. I can attach my testing results if you think they'll be helpful.

You're line of thinking is on the right track though I think.

I've patched libusb1.0 like so

diff --git a/libusb/core.c b/libusb/core.c
index 2dac909..a4dcb42 100644
--- a/libusb/core.c
+++ b/libusb/core.c
@@ -1153,9 +1153,11 @@ out:
 DEFAULT_VISIBILITY
 libusb_device * LIBUSB_CALL libusb_ref_device(libusb_device *dev)
 {
+   usbi_dbg("device %d.%d refcount: %d", dev->bus_number, dev->device_address, dev->refcnt);
    usbi_mutex_lock(&dev->lock);
    dev->refcnt++;
    usbi_mutex_unlock(&dev->lock);
+   usbi_dbg("device %d.%d refcount: %d", dev->bus_number, dev->device_address, dev->refcnt);
    return dev;
 }

@@ -1171,9 +1173,11 @@ void API_EXPORTED libusb_unref_device(libusb_device *dev)
    if (!dev)
        return;

+   usbi_dbg("device %d.%d refcount: %d", dev->bus_number, dev->device_address, dev->refcnt);
    usbi_mutex_lock(&dev->lock);
    refcnt = --dev->refcnt;
    usbi_mutex_unlock(&dev->lock);
+   usbi_dbg("device %d.%d refcount: %d", dev->bus_number, dev->device_address, dev->refcnt);

    if (refcnt == 0) {
        usbi_dbg("destroy device %d.%d", dev->bus_number, dev->device_address);

and the usb package like so

diff --git a/System/USB/Base.hs b/System/USB/Base.hs
index e3ec242..6e92e4c 100644
--- a/System/USB/Base.hs
+++ b/System/USB/Base.hs
@@ -661,7 +661,8 @@ withDevicePtr (Device ctx devFP ) f = do

 mkDev :: Ctx -> Ptr C'libusb_device -> IO Device
 mkDev ctx devPtr =
-    Device ctx <$> FC.newForeignPtr devPtr (c'libusb_unref_device devPtr)
+    Device ctx <$> FC.newForeignPtr devPtr (printf "dbg: [mkDev][finalizer]: Not calling unref_device on %s.\n" (show devPtr))
+    --(c'libusb_unref_device devPtr)

rebuilt libusb-bindings-1.4.5.0, usb-1.3.0.4, && the demo program to link with this patched libusb-1.0.

Here's the logged output of plugging in a device and subsequently removing it:

registering callback.
[timestamp] [threadID] facility level [function call] <message>
--------------------------------------------------------------------------------
[ 0.005302] [00004627] libusb: debug [libusb_handle_events_timeout_completed]
doing our own event handling
[ 0.005461] [00004627] libusb: debug [handle_events] poll fds modified,
reallocating
[ 0.005468] [00004627] libusb: debug [handle_events] poll() 2 fds with timeout
in 0ms
[ 0.005473] [00004627] libusb: debug [handle_events] poll() returned 0
[ 3.463655] [0000462b] libusb: debug [linux_get_device_address] getting address
for device: 1-4.6 detached: 0
[ 3.463671] [0000462b] libusb: debug [linux_get_device_address] scan 1-4.6
[ 3.463722] [0000462b] libusb: debug [linux_get_device_address] bus=1 dev=9
[ 3.463728] [0000462b] libusb: debug [udev_hotplug_event] udev hotplug  event. action: add.
[ 3.463732] [0000462b] libusb: debug [linux_enumerate_device] busnum 1 devaddr 9 session_id 265
[ 3.463736] [0000462b] libusb: debug [linux_enumerate_device] allocating new device for 1/9 (session 265)
[ 3.463774] [0000462b] libusb: debug [libusb_ref_device] device 1.3 refcount: 3
[ 3.463777] [0000462b] libusb: debug [libusb_ref_device] device 1.3 refcount: 4
[ 3.463779] [0000462b] libusb: debug [linux_get_parent_info] Dev 0x7f20e80017d0 (1-4.6) has parent 0x140dec0 (1-4) port 6
[ 3.463902] [00004627] libusb: debug [libusb_handle_events_timeout_completed] doing our own event handling
[ 3.463910] [00004627] libusb: debug [handle_events] poll() 2 fds with timeout in 0ms
[ 3.463913] [00004627] libusb: debug [handle_events] poll() returned 1
[ 3.463916] [00004627] libusb: debug [handle_events] caught a fish on the event pipe
[ 3.463918] [00004627] libusb: debug [handle_events] hotplug message received 
Bus 001 Device 009
dbg: [mkDev][finalizer]: Not calling unref_device on 0x00007f20e80017d0.
registering callback.
[ 6.025026] [0000462b] libusb: debug [linux_get_device_address] getting address for device: 1-4.6 detached: 1
[ 6.025043] [0000462b] libusb: debug [udev_hotplug_event] udev hotplug event. action: remove.
[ 6.025047] [0000462b] libusb: debug [libusb_ref_device] device 1.9 refcount: 1
[ 6.025049] [0000462b] libusb: debug [libusb_ref_device] device 1.9 refcount: 2
[ 6.025060] [0000462b] libusb: debug [libusb_unref_device] device 1.9 refcount: 2
[ 6.025063] [0000462b] libusb: debug [libusb_unref_device] device 1.9 refcount: 1
[ 6.025126] [00004627] libusb: debug [libusb_handle_events_timeout_completed] doing our own event handling
[ 6.025137] [00004627] libusb: debug [handle_events] poll() 2 fds with timeout in 0ms
[ 6.025141] [00004627] libusb: debug [handle_events] poll() returned 1
[ 6.025143] [00004627] libusb: debug [handle_events] caught a fish on the event pipe
[ 6.025145] [00004627] libusb: debug [handle_events] hotplug message received
[ 6.025150] [00004627] libusb: debug [libusb_unref_device] device 1.9 refcount: 1
[ 6.025152] [00004627] libusb: debug [libusb_unref_device] device 1.9 refcount: 0
[ 6.025154] [00004627] libusb: debug [libusb_unref_device] destroy device 1.9
[ 6.025156] [00004627] libusb: debug [libusb_unref_device] device 1.3 refcount: 4
[ 6.025158] [00004627] libusb: debug [libusb_unref_device] device 1.3 refcount: 3
^C[ 9.364079] [00004627] libusb: debug  [libusb_handle_events_timeout_completed] doing our own event handling
[ 9.364092] [00004627] libusb: debug [handle_events] poll() 2 fds with timeout in 0ms
[ 9.364096] [00004627] libusb: debug [handle_events] poll() returned 1
[ 9.364098] [00004627] libusb: debug [handle_events] caught a fish on the event pipe
[ 9.364100] [00004627] libusb: debug [handle_events] hotplug message received

calling out the important lines:

[ 6.025150] [00004627] libusb: debug [libusb_unref_device] device 1.9 refcount: 1
[ 6.025152] [00004627] libusb: debug [libusb_unref_device] device 1.9 refcount: 0
[ 6.025154] [00004627] libusb: debug [libusb_unref_device] destroy device 1.9

from this I believe the issue is the registered finalized call to libusb_unref_device when the ForeignPtr is garbage collected. As you can see above, the references are decremented to zero even without a finalizer. This seems strange to me and un-intuitive, to not release a resource acquired across an FFI boundary.

basvandijk commented 7 years ago

Interesting! It's really unfortunate that this is not documented in the libusb documentation on hotplugging.

So maybe we should first increment the reference count of a hotplug allocated libusb_device before we turn it into a Haskell Device with the associated finalizer.

I implemented this in the really-fix-17 branch. Could you give it a spin?

basvandijk commented 7 years ago

@Tingles @ryantm could you test if the really-fix-17 branch solves the problem? Thanks.

Tingles commented 7 years ago

@basvandijk Sorry I haven't gotten to the testing it was a Holiday weekend. I'll give it ago in now.

Tingles commented 7 years ago

@basvandijk I can no longer make the demo program segfault and the reference counts look to be appropriately decremented to 0 when removing devices from my hub. This is tested on a physical host as opposed to a KVM virtual guest. I will have to test that later on as I have time.

@ryantm - More data is always better in my mind. If you happen to get a chance please let us know what you observe.

basvandijk commented 7 years ago

@Tingles that's good news! Thanks for testing.

basvandijk commented 7 years ago

I think I will go ahead and merge this to master and start preparing a release. Thanks for the report and excellent investigation.

basvandijk commented 7 years ago

Released as usb-1.3.0.5.