adafruit / Adafruit_Blinka

Add CircuitPython hardware API and libraries to MicroPython & CPython devices
https://learn.adafruit.com/circuitpython-on-raspberrypi-linux
MIT License
446 stars 332 forks source link

Improvements to the `mcp2221.py` module #536

Open ezio-melotti opened 2 years ago

ezio-melotti commented 2 years ago

As a follow-up to #535, I'm doing more tests and tweaks to the mcp2221.py file, in particular:

  1. I've added self._hid.close() in a few places to ensure the device is closed correctly, avoiding segfaults upon exit (see also signal11/hidapi#397).
  2. I'm trying to make the native hid_mpc2221 driver work out of the box, without having to replace it with thehid_generic driver through rmmod/blacklisting:

    1. Changing 5 to 20 at line 123 seems to solve the OSError: open failed reported in #535, without having to rely on a forced sleep() at line 121: https://github.com/adafruit/Adafruit_Blinka/blob/8a25ee484e7aed8c58e4366b3fca78083bc61422/src/adafruit_blinka/microcontroller/mcp2221/mcp2221.py#L119-L125 I would replace the hardcoded 5 with MCP2221_RESET_DELAY, remove the sleep, and increase the default value of MCP2221_RESET_DELAY to 20/30s.
    2. Even though the error is solved, the open at line 125 still takes several seconds with the native hid_mpc2221 driver. I tried to close the device before the open and also to create a separate device instance, but it's always slow. This seems to be caused by the reset, but I couldn't find any documentation about that reset byte sequence, nor a way to make it fast. A minimal reproducer is:
      >>> # open the device
      >>> h = hid.device(); h.open(0x04D8, 0x00DD)
      >>> # close and reopen -- it's fast
      >>> h.close(); print('closed'); h.open(0x04D8, 0x00DD)
      closed
      >>> # send the reset bytestring
      >>> report = b"\x70\xAB\xCD\xEF"; h.write(b"\0" + report + b"\0" * (64 - len(report)))
      65
      >>> # close returns immediately, but now open fails
      >>> h.close(); print('closed'); h.open(0x04D8, 0x00DD)
      closed
      Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
       File "hid.pyx", line 113, in hid.device.open
      OSError: open failed
      >>> # after ~16s open works again
      >>> h.close(); print('closed'); h.open(0x04D8, 0x00DD)
      closed
      >>>

      I feel that the reset might not be necessary if we can ensure the device is closed properly and left in a consistent state (see 1. above), and skipping it should solve the problem altogether, however I'm not familiar with all the cases where the reset might be needed, and removing it might be backward incompatible. The _reset method could also be exposed publicly, so that users can still call it if necessary.

  3. Another forced sleep might be removed here: https://github.com/adafruit/Adafruit_Blinka/blob/8a25ee484e7aed8c58e4366b3fca78083bc61422/src/adafruit_blinka/microcontroller/mcp2221/mcp2221.py#L66-L71

    device.read() accepts a timeout_ms argument that could replace the sleep, but I haven't been able to test if it works (the read always returned immediately during my tests). However it's not immediately clear if fhe sleep is there to wait before the read (in that case it should be replaced with timeout_ms or moved inside the if), or before the return (in that case a comment should be added to clarify).

I can submit one or more PRs once I complete my tests. The changes suggested in 1. are pretty straightforward; 2. could implemented regardless, since it works with both drivers, but I would still like to find a way to make the code fast with hid_mcp2221; 3. also seems pretty straightforward, but it should be tested before being changed and it depends on what the sleep is trying to fix (the explanation on #243 is a bit vague).

ezio-melotti commented 2 years ago

I created #540 to address the first point.

I haven't made much progress on the second point. It seems that with the native hid_mcp2221 driver, several seconds are required before the device can be opened once it's connected. This applies both to physically removing and plugging the device back in, and to the _reset() call (that effectively disconnects and reconnects the device again).

In addition, the devices sometimes becomes "slow", and the only way I found to fix that is causing a crash/segfault to reset it (once it gets slow the problem persists even after restarting Python). I also noticed that when it's slow, attempting to close it generates a new line in dmesg.

If I run this, h.close() returns immediately, and there is no output in dmesg:

>>> import hid, time; h = hid.device(); h.open(0x04D8, 0x00DD)
>>> h.close()  # the device is closed correctly without delay
>>> import hid, time; h = hid.device(); h.open(0x04D8, 0x00DD)
>>> h.close()  # and these don't add any output to dmesg

If I try to send a reset, this happens:

>>> import hid, time; h = hid.device(); h.open(0x04D8, 0x00DD)
>>> report = b"\x70\xAB\xCD\xEF"; h.write(b"\0" + report + b"\0" * (64 - len(report)))
65
>>> h.close()  # this still returns immediately -- probably a no-op since the device is gone
>>> # the device is not accessible for a while
>>> import hid, time; h = hid.device(); h.open(0x04D8, 0x00DD)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "hid.pyx", line 113, in hid.device.open
OSError: open failed
>>> # after a few seconds it can be opened again
>>> import hid, time; h = hid.device(); h.open(0x04D8, 0x00DD)
>>> h.close()  # but this is now slow (~16s)
>>> import hid, time; h = hid.device(); h.open(0x04D8, 0x00DD)
>>> h.close()  # and keeps being slow even after closing Python
>>> import hid, time; h = hid.device(); h.open(0x04D8, 0x00DD)
>>> h.close()  # and each of these now adds a line to dmesg
>>>

Just after doing the reset, dmesg shows:

[659138.741983] usb 5-1: USB disconnect, device number 76
[659139.233178] usb 5-1: new full-speed USB device number 77 using xhci_hcd
[659139.432800] usb 5-1: New USB device found, idVendor=04d8, idProduct=00dd, bcdDevice= 1.00
[659139.432808] usb 5-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[659139.432811] usb 5-1: Product: MCP2221 USB-I2C/UART Combo
[659139.432813] usb 5-1: Manufacturer: Microchip Technology Inc.
[659139.448967] cdc_acm 5-1:1.0: ttyACM0: USB ACM device
[659139.457208] mcp2221 0003:04D8:00DD.0076: hidraw1: USB HID v1.11 Device [Microchip Technology Inc. MCP2221 USB-I2C/UART Combo] on usb-0000:00:10.0-1/input2

Afterwards, each h.close() after the reset (except the one immediately after the reset, probably because the reset closes the device already so that h.close() is a no-op) shows:

[659169.812569] mcp2221 0003:04D8:00DD.0077: hidraw1: USB HID v1.11 Device [Microchip Technology Inc. MCP2221 USB-I2C/UART Combo] on usb-0000:00:10.0-1/input2
[659260.278641] mcp2221 0003:04D8:00DD.0078: hidraw1: USB HID v1.11 Device [Microchip Technology Inc. MCP2221 USB-I2C/UART Combo] on usb-0000:00:10.0-1/input22
[659689.375741] mcp2221 0003:04D8:00DD.0079: hidraw1: USB HID v1.11 Device [Microchip Technology Inc. MCP2221 USB-I2C/UART Combo] on usb-0000:00:10.0-1/input2

One interesting thing to notice is that these messages include 0003:04D8:00DD.007x, where 04D8 and 00DD are the vendor and product id for the mcp2221, and .007x seems to be the device number. However, the device number in this case was still 77, whereas the messages include 77, 78, 79. The number keeps increasing every time the device is (opened and) closed again, but only when it's "slow". After another reset, dmesg shows:

[659907.319058] usb 5-1: USB disconnect, device number 77
[659907.810670] usb 5-1: new full-speed USB device number 78 using xhci_hcd
[659908.010875] usb 5-1: New USB device found, idVendor=04d8, idProduct=00dd, bcdDevice= 1.00
[659908.010889] usb 5-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[659908.010894] usb 5-1: Product: MCP2221 USB-I2C/UART Combo
[659908.010898] usb 5-1: Manufacturer: Microchip Technology Inc.
[659908.027085] cdc_acm 5-1:1.0: ttyACM0: USB ACM device
[659908.035545] mcp2221 0003:04D8:00DD.007A: hidraw1: USB HID v1.11 Device [Microchip Technology Inc. MCP2221 USB-I2C/UART Combo] on usb-0000:00:10.0-1/input2
[661371.207608] mcp2221 0003:04D8:00DD.007B: hidraw1: USB HID v1.11 Device [Microchip Technology Inc. MCP2221 USB-I2C/UART Combo] on usb-0000:00:10.0-1/input2
[661402.623018] python3[180397]: segfault at 200 ip 00007f7fc731a28b sp 00007f7fc5c1dd20 error 4 in libusb-1-150b88da.0.so.0.1.0[7f7fc7310000+17000]

Here the device 77 is closed (by the reset), and once the mcp reconnects it gets assigned 78 as a new device number. However the line at [659908.035545] (which always appear after the device is connected) has .007A and after reopening the device and closing it again, the message says .007B, even though the device number is now 78. (The segfault was triggered intentionally by me by opening the device and quitting Python without closing the device).

I suspect that when a reset signal is sent to the mcp, it disconnects from the mcp side in a way that is not detected by the hid_mcp2221 driver. I looked at the source of the driver (authored by @RishiGupta12) but didn't find anything wrong with it. I also compared it with the hid_generic driver, and tried to figure out why the hid_generic driver works, but it looks pretty bare-bones.

In addition, I noticed that in the hid_mcp2221 driver there is a 4000ms timeout. The delay on the h.close() is usually ~16s, which could be explained by four consecutive 4000ms timeouts. If after a reset the device number changed, it might be trying to close the wrong device and timeoutting.

I tried to find the source of the close(), but hid_close is pretty straightforward and returns immediately when the device is not set and the hid_mcp2221 doesn't seem to have a close function, so I'm not sure what's causing the dmesg messages when the device is slow and I call h.close().

I was also able to track the source of the crash to an assertion in the libusb_linux repo (even though I'm not sure I got the right repo/fork). This failed assertion could also be explained by the aforementioned theory: if during a reset the mcp disconnects without telling the driver, the device might not be removed, and the number of devices might end up being wrong.


It seems to me that the problem with the slow device is caused by the driver and should be fixed upstream. The mcp2221.py module could avoid the reset, but since it was added to solve another issue (on Windows IIRC), maybe it could be skipped conditionally, either based on the platform, or perhaps even on the driver being used -- at least until the problem is fixed upstream. Note that the reset can already be avoided by setting os.environ['BLINKA_MCP2221_RESET_DELAY'] = '-1'.

However there are other situations where the device might become "slow" (for example if it gets unplugged and plugged back in again, possibly while it's open), so avoiding the hid_mcp2221 driver and using the hid_generic one might be a good interim solution -- at least until the hid_mcp2221 driver gets fixed (assuming the problem is really with the driver).

caternuson commented 2 years ago

I'm trying to make the native hid_mpc2221 driver work out of the box, without having to replace it with the hid_generic driver through rmmod/blacklisting

What is the motivation for this?

ezio-melotti commented 2 years ago

A few reasons, but mostly I'm trying to simplify the setup/installation. I'm currently using an SCD30 sensor connected to an MCP2221 connected to a Linux machine and I would like to be able to just plug the USB on a Linux machine, install the program I wrote to read data from the SCD30 (which will automatically install the dependencies from pip), and run it, without having to perform additional steps.

Currently this is not possible, because it's also necessary to make some system-wide changes, such as setting up the udev rules and removing/blacklisting the hid_mcp2221 driver (and I think the former can be avoided if the program is executed as root).

In addition, there might be other connected devices that might need the hid_mcp2221 driver, so removing/blacklisting it doesn't seem a clean solution, also because the hid_mcp2221 driver is supposed to be the "right one" to be used for the MCP2221.

If the issues are caused by a bug in the hid_mcp2221 driver, I would also like to have that fixed upstream, so that other people won't have to deal with them in the future, regardless of what I'll end up using (maybe @RishiGupta12 can chime in about this).

It also seems that by removing the _reset call, the hid_mcp2221 works fine and I've been running it successfully for 2 weeks. I'm trying to figure out if it's possible to workaround the issues by changing the code in the mcp2221.py module, but currently I'm not confident that this can be done reliably, since there are still some situations that might trigger the problem (e.g. unplugging and plugging back the MCP), and I haven't found a reliable way to restore the MCP other than triggering a crash that seems to reset the driver.

Finally, while discuss the issue with a friend, he brought to my attention the fact that mcp2221.py module seems to reimplement what the hid_mcp2221 driver already does, by using hidapi to communicate directly with the MCP2221. If I understood correctly, I should be able to simply use a generic_linux I2C and do something like:

from adafruit_blinka.microcontroller.generic_linux.i2c import I2C
i2c = I2C(bus_num)
scd = adafruit_scd30.SCD30(i2c)

to talk directly to the SCD30, letting the hid_mcp2221 abstract out the MCP2221 instead of using mcp2221.py. However the MCP2221 doesn't seem to be linked to any of the built-in i2c devices, since it doesn't show up when I do head /sys/class/i2c-dev/i2c-*/name.

caternuson commented 2 years ago

What would be the solution for Mac and Windows users? The current approach works on linux/mac/windows.

BhupiMAD commented 1 month ago

Hi @ezio-melotti. You seem to have dived deep down with mcp2221. I am facing an issue where by using adafruit blinka i am not able to collect data from accelerometer, connected over i2c, at higher data rate. The max it goes to is 45Hz~ Do you happen to know why this would be the case?

ezio-melotti commented 4 weeks ago

@BhupiMAD, I would recommend asking on the Discord server and/or on the Adafruit forum.