Bridgetek / ft9xx-sdk

ft90x SDK
MIT License

Interrupt endpoints on low-speed devices over hubs #30

Open brtchip-gdm opened 1 year ago

brtchip-gdm commented 1 year ago

Interrupt endpoints on some low-speed devices connected over a hub to the FT90x do not receive periodic IN requests. This is due to incorrect SSPLIT IN and CSPLIT IN timing from the host to the downstream hub.

NessDan commented 1 year ago

+1, I'm dealing with this currently on my project (a few users have keyboards with built-in hubs.)

A while ago I emailed your support and received an example project that contained a few modified drivers that seemed to fix the problem (attached.)

BareMetal USBH Example HID.ZIP

We recently put an example together for multiple HID interface handling. There was a fix that also needed to go into usbh.c too which is included in the code.

I have attached the USBH HID example that can enumerate and get reports for three HID devices. At the time we could only test with LS mice. You could use this as a reference for opening multiple HID devices concurrently. This should also help with HID devices with multiple interfaces.

I never ended up implementing it though because I had intermittent luck with it and I didn't like that it was requesting polls only every 7ms.

I've been keeping an eye on the official toolchain releases to see if the updates that were made in this example project ever made it downstream with no luck. I hope to see this updated and fixed sometime soon!

brtchip-gdm commented 1 year ago

I have made a change to the allocation of SSPLIT and CSPLIT timings for low-speed devices. See commits to https://github.com/Bridgetek/ft9xx-sdk/tree/30-interrupt-endpoints-on-low-speed-devices-over-hubs The timings may not work for all possible low-speed devices. No additional testing has been done on full-speed devices.

brtchip-gdm commented 1 year ago

I will add a task to test your attached code. On further testing there is something else amiss here and the commit in 9416dba needs some refinement. It's not a complete solution to this issue.

brtchip-gdm commented 1 year ago

I'll test more thoroughly tomorrow. I can reliably run your program @NessDan with a low-speed mouse on a hub and connected directly to the host. It survives timeouts. Code is in the same branch.

brtchip-gdm commented 1 year ago

Longer testing with a small selection of full-speed and low-speed devices over hubs was successful.

Also provides a better solution to issue #25 .

NessDan commented 1 year ago

Thank you for your work on this Gordon @brtchip-gdm !

I'm having some intermittent issues with this in my project. Sometimes on bootup the keyboard and hub are detected, sometimes they aren't. It feels like maybe it's a race condition?

My project is different from the code I shared above (BareMetal USBH Example HID.ZIP) but I did notice they had a delayms(10); line that I ported over to my code and it helped a little bit, but the issue still occurred maybe 50% of the time.

brtchip-gdm commented 1 year ago

I'll merge the PR for the interrupt endpoint issue and then have a look at this issue. I'm going to take from the above that the devices are downstream from a hub. I have a couple of questions.

[Edit:] Unfortunately, I can't reproduce this with the selection of hubs I have to hand. Are you referring to the delayms(10) before the hub descriptor is read, which is immediately after the SetConfiguration for the hub in usbh.c?

        if (devNew->descriptor.bDeviceClass == USB_CLASS_HUB)
        {
            // Wait a short time for the hub to become responsive
            delayms(10);

            status = usbh_get_hub_descriptor(devNew, &hubDesc);

brtchip-gdm commented 1 year ago

> Sometimes on bootup the keyboard and hub are detected, sometimes they aren't.

Does the return status from a call to USBH_get_connect_state change from USBH_STATE_NOTCONNECTED to USBH_STATE_CONNECTED to USBH_STATE_ENUMERATED? Do you have a USB Analyser by any chance?

NessDan commented 1 year ago

Apologies @brtchip-gdm I have family visiting from out of town. I will get back to you as soon as I can. In the meantime: the hub is not powered and in all of my testing, it was always a hard power-off. Here's the video of trying out the code on me playing with the hub if that helps at all: Video 1 and Video 2 (same thing as 1 just more struggling.)

The delayms(10) I was referring to is the one in the usbh_hid_test_1.c on line 159 inside the for-loop:

status = USBH_HID_set_idle(&hidCtx[i], 0x0000); //Upperbyte: value*4 = 24ms Duration
    if (status < 0)
    {
        DEBUG_PRINTF("[%x:%x:%d]HID interface set Idle error: %d\r\n", usbVid, usbPid, portnum, status);
    }
DEBUG_PRINTF("[%x:%x:%d]Reports from device %d bytes:\r\n", usbVid, usbPid, portnum, 
delayms(10);

I tried putting that in my code right after set_idle as well to see if that would help.

brtchip-gdm commented 1 year ago

Thanks for the videos. Is that a Totalphase Beagle 12? We found a very old full-speed USB 1.1 hub (with a late-1990s translucent case to match an iMac G3!). We saw that the enumeration did not detect the HID downstream of the hub on a number of occasions. The power-on sequence for the hub was good, but the analysis of the hub reset sequence showed that the hub drove the reset for slightly longer than the "preferred" 10 ms (USB Spec Section 11.5.1.5). This meant that the GetDescriptor for the device happened too soon and the device was not detected. Waiting for 20 ms for the hub port to leave the reset state seems to fix the issue for our hub. There's a patch in the commit referenced above for this.
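As a rough illustration of the reset timing described above, the wait can be expressed as polling the port status until the hub stops reporting reset, with an upper bound, rather than assuming a fixed 10 ms. This is only a sketch and not the code in the SDK patch: `wait_for_port_reset_done`, the function-pointer hooks and the `fake_port` simulation are hypothetical names invented here. The only facts taken from the USB hub class definition are the `wPortStatus` bit positions (PORT_ENABLE is bit 1, PORT_RESET is bit 4).

```c
#include <stdbool.h>
#include <stdint.h>

/* wPortStatus bits returned by the hub class GetPortStatus request. */
#define HUB_PORT_STATUS_ENABLE 0x0002u /* bit 1: PORT_ENABLE */
#define HUB_PORT_STATUS_RESET  0x0010u /* bit 4: PORT_RESET  */

/* Hypothetical hooks: real code would issue GetPortStatus and call delayms(). */
typedef uint16_t (*read_port_status_fn)(void *ctx);
typedef void (*delay_ms_fn)(uint32_t ms);

/* Poll once per millisecond until the port has left reset and is enabled.
   Returns false if that has not happened after timeout_ms milliseconds. */
static bool wait_for_port_reset_done(read_port_status_fn read_status, void *ctx,
                                     delay_ms_fn delay, uint32_t timeout_ms)
{
    for (uint32_t waited = 0; waited <= timeout_ms; waited++)
    {
        uint16_t port_status = read_status(ctx);
        if (!(port_status & HUB_PORT_STATUS_RESET) &&
            (port_status & HUB_PORT_STATUS_ENABLE))
        {
            return true;
        }
        delay(1);
    }
    return false;
}

/* Simulated hub port that reports PORT_RESET for a given number of reads. */
struct fake_port { uint32_t reads_in_reset; };

static uint16_t fake_read_status(void *ctx)
{
    struct fake_port *port = (struct fake_port *)ctx;
    if (port->reads_in_reset > 0)
    {
        port->reads_in_reset--;
        return HUB_PORT_STATUS_RESET;
    }
    return HUB_PORT_STATUS_ENABLE;
}

static void fake_delay(uint32_t ms) { (void)ms; /* no real waiting in the simulation */ }
```

With a simulated hub that holds reset for 12 polls, a 20 ms budget succeeds where a 10 ms budget gives up, which mirrors the behaviour reported for the old hub.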

NessDan commented 1 year ago

It is a Totalphase Beagle 12! I can use it to get any sort of report that might help you guys!

I'm so happy you found an old hub haha! I was amazed at how hard it was to come by them!

It is connecting more often, but I'd still say it's a 50/50 on whether the hub connects. Is there anything I can capture for you to see what exactly the issue could be? Also are you using a specific project for connecting to the hub? Maybe there's something wrong with my connection code and by running what you have, the issue goes away?

brtchip-gdm commented 11 months ago

I'm using "USBH Example HID" and "USBH Example Hub" to test. Yes, the hub was in the "back of the drawer" and needed a new power supply! If you can capture both the connection working and not working, that would be good. Place the Beagle upstream of the hub and you should capture all the reset sequence. Thanks.

brtchip-gdm commented 10 months ago

Commit 14329bbd18ca874c676a7656d0a7cedf6cd86901 will fix a timeout issue when dealing with interrupt endpoints.

brtchip-gdm commented 9 months ago

@NessDan Has the commit above helped with this issue? Have you any feedback? There is also issue #47 that is dealing with device removal on the USB host.

NessDan commented 9 months ago

I'm so sorry @brtchip-gdm ! I forgot about this thread as I looked into another issue.

I'll try to add these changes later tonight after my day job and let you know if things are working out!

NessDan commented 9 months ago

I think I need to take a deeper dive into my old HUB branch because both the new files (PR 47 and commit 14329bb) aren't connecting to my old 1.1 HUB. I've gotten pulled in multiple directions lately, so I'll try to take another look at this tonight and will update you if the results change!

I hate to ask this, but if you have a project you're using to test the usbh.c changes, could you upload them here so I can easily do a 1:1 test? Trying to update my project code to work better with HUB-specific code might be causing more issues than helping.

brtchip-gdm commented 8 months ago

I have the USBH Example HID to UART example code with the usbh.c file from https://github.com/Bridgetek/ft9xx-sdk/commit/97b2f6fd81ec2c64e2e7a7726f8a2aa68491d8c2 within the project so that it is linked (and hence used) before the libft900.a library is included. Both files are from the #47 code branch. Shall I PR the #47 into the #30 branch for you?

The hub I have is identifying as a USB 1.1 hub with a TI controller. Apologies, I'm using an Ellisys today.

image

When connecting on the example code it reports the following (with a HID connected to one of the ports):

USB Device Detected
USB Devices Enumerated
HID device found at level 2
VID: 05e0 PID: 1200
Speed: 1 full
Address: 2
Number of report descriptors 1
HID descriptor: 09 21 10 01 21 01 22 4d 00
Report descriptor 1 type 0x22 size 77
Report descriptor 1: 05 01 09 06 a1 01 05 07 19 e0 29 e7 15 00 25 01 75 01 95 08 81 02 95 01 75 08 81 01 95 05 75 01 05 08 19 01 29 05 91 02 95 01 75 03 91 01 95 06 75 08 15 00 26 ff 00 05 07 19 00 2a ff 00 81 00 05 08 19 01 29 20 75 08 95 04 b1 02 c0

brtchip-gdm commented 8 months ago

I've also tested with an original 2003 "clear" Apple keyboard (A1048) with a 2 port hub:

image

This is working OK as well. If you can, send me some Beagle traces and I'll see what's happening here.

NessDan commented 8 months ago

Apologies, after so long I've finally captured the Beagle data. Note that I did some captures with the hub connected to Windows (to showcase a successful connection) and then captured to my FT908 device, the Edgeguard, running USBH_Example_HID.

1.1-hub-beagle-captures.zip

Recorded a video that corresponds to the edgeguard-usbh-example-hid-1.1-hub-stock-usbh.c.tdc capture.

Again used the usbh.c file from PR 47 and commit 14329bb separately. If you need me to test them both merged, then I would definitely take your offer to merge PR #47 into #30!

Sorry again that this has taken so long to get around to!

brtchip-gdm commented 8 months ago

Thanks Daniel,

I'm looking at the CDatreus traces and concentrating on the edgeguard-usbh-example-hid-1.1-hub-pr-47-usbh.c.tdc compared with the Windows trace windows-usb-1.1-hub-2devices-atreus-keyboard-and-gamepad.tdc. Unfortunately the DataCenter files do not include the bus speed (the "Sp" column is blank) but I assume the speed is "Full" for these as that is the fastest speed supported by the hub.

It appears that the FT90x is successfully sending the IN requests to the hub and is correctly getting a NAK answer from the device back through the hub.

The only theory I have is that the downstream HID device likes things done in a particular way.

In the many years of doing this we have observed devices which require that endpoints are queried in a certain order. Examples we have seen are a brand name CDC device that required the interrupt IN endpoint polled before releasing data waiting on the BULK IN endpoint; microcontroller keyboard emulator firmware with 2 interrupt IN endpoints that needed them polled alternately. It's worth checking this is not the case. I see that the Windows implementation polls all the interrupt IN endpoints on device address 36.

Can you add code to poll the other interrupt endpoints as well as the boot interface keyboard? You can throw the data away or just have a callback function to read and discard the data. This is a possible way of doing it. It is set up once before you start reading from the keyboard endpoint. It might need to be done on the other interrupt endpoints as well.

    /* Callback: ignore the data and requeue the read on the same endpoint.
       The endpoint handle is passed through the id parameter. */
    int8_t cdc_stuff(uint32_t id, int8_t status, size_t len, uint8_t *buffer)
    {
      if ((status == USBH_OK) || (status == USBH_ERR_TIMEOUT))
      {
        USBH_transfer_async((USBH_endpoint_handle)id, buffer, len, 0, id, cdc_stuff);
      }
      return 0;
    }

    {
      <find CDC interrupt IN endpoint>
      USBH_HID_set_idle(&hidCtx, 0);
      <after the set idle for the keyboard interface>
      /* Read from the CDC interrupt IN endpoint with a callback. The callback ignores the
         data and requeues the read. Infinite timeout. */
      USBH_transfer_async(CDCEpIn, cdc_buffer, cdc_buffer_size, 0, (uint32_t)CDCEpIn, cdc_stuff);
      while (1)
      {
        count = USBH_transfer(hidCtx.hHIDEpIn, buffer, USBH_HID_get_report_size_in(&hidCtx), 10);
        <continue as normal>
      }
    }

Can you repeat the captures for both Windows and FT90x downstream of the hub to single out the hub sending IN requests to the CDatreus device?

I didn't see any keypresses on the CDatreus device being sent upstream to Windows in the capture you had. Can you send one with these captured please?

NessDan commented 8 months ago

So some good news first: Two devices plugged into the USB 1.1 Hub are both being detected and data is being received!

The default HID example actually gets stuck in the first hid_testing call because once it finds one interface, it while loops in that function forever, never iterating over the other interfaces and setting idle to any of them.

I commented out that while loop so the code flow would continue, while still setting up the async listener after setting idle.

After it finishes recursively looping through devices and interfaces, all devices are responding! Attaching a video, a diff I made to get it working, and a Beagle capture of the Atreus keyboard pressing keys on Windows (raw and through the USB 1.1 Hub), as well as a capture of both keyboards pressing keys while connected to the Edgeguard with the modified USBH_Example_HID project.

But that said, some bad news: my USB 2.0 hub-integrated Apple keyboard doesn't seem to be working, and my USB 3.0 Sabrent hub + Atreus keyboard doesn't work either. (The Apple keyboard issue was my original main issue, since a user can't "disconnect" it from that hub.) Tested with the stock usbh.c and the updated, merged one you made for branch 30. UPDATE: This was an error on my part! Your usbh.c worked wonders!

I'll also be attaching related captures for the Apple keyboard into my Windows PC and my Edgeguard with the updated code, the Atreus through the USB 3.0 Hub on Windows and Edgeguard as well. One note: Because my Beagle 12 can only capture Low / Full-speed devices, the setup is actually all of the aforementioned devices plugged into the USB 1.1 Hub and that plugged into the Beagle 12 (this downgrades the speed so everything gets captured.) It's also how I'm doing all my testing.

Windows Captures:

atreus-windows.zip (Working, Pressing A)
atreus-usb1.1-hub-windows-pushing-a-multiple-times.zip (Working)
atreus-usb3.0-hub-into-usb1.1-hub-windows-pushing-a-multiple-times.zip (Working)
apple-usb1.1-hub-windows-pushing-a.zip (Working BUT I hear the Windows "disconnect/reconnect" sound a few times when plugged in via the 1.1 Hub)

Edgeguard Captures (USBH_Example_HID Modified):

atreus-usb1.1-hub-edgeguard-transfer_async-on-all-pushing-a-multiple-times.zip (Working, Messy Desk)
atreus-usb3.0-hub-into-usb1.1-hub-edgeguard-transfer_async-on-all-pushing-a-multiple-times.zip (Not Working, Confirmed It Does Some Iterating & Set Idle) (See below post for update.)
apple-usb1.1-hub-edgeguard-transfer_async-all-pushing-a.zip (Not Working, this capture actually shows the device disconnecting and reconnecting multiple times.) (See below post for update.)

Diff to get USB 1.1 Hub + Devices to all Listen

diff --git a/usbh_hid_test_1.c b/usbh_hid_test_1.c
index e305e85..d9747c7 100644
--- a/usbh_hid_test_1.c
+++ b/usbh_hid_test_1.c
@@ -89,6 +89,16 @@ void ISR_timer(void)
    }
 }

+int8_t message_received_cb(uint32_t id, int8_t status, size_t len, uint8_t *buffer)
+{
+   if ((status == USBH_OK) || (status == USBH_ERR_TIMEOUT))
+   {
+       USBH_transfer_async((USBH_endpoint_handle)id, buffer, len, 0, id, message_received_cb);
+   }
+
+   return 0;
+}
+
 void hid_testing(USBH_device_handle hHIDdev, USBH_interface_handle hHID)
 {
    USBH_HID_context hidCtx;
@@ -112,51 +122,7 @@ void hid_testing(USBH_device_handle hHIDdev, USBH_interface_handle hHID)
    }
    tfp_printf("Setting idle\r\n");
    USBH_HID_set_idle(&hidCtx, 0);
-   tfp_printf("Reports from device %d bytes:\r\n", USBH_HID_get_report_size_in(&hidCtx));
-
-   while (1)
-   {
-       status = USBH_HID_get_report(&hidCtx, buffer);
-
-       if (status == USBH_OK)
-       {
-           for (i = 0; i < USBH_HID_get_report_size_in(&hidCtx); i++)
-               tfp_printf("%02x ", buffer[i]);
-           tfp_printf("\r\n");
-       }
-       else
-       {
-           switch (status)
-           {
-           case USBH_ERR_TIMEOUT:
-               tfp_printf("Timeout\r\n");
-               break;
-           case USBH_ERR_HALTED:
-               tfp_printf("Halted\r\n");
-               break;
-           case USBH_ERR_NOT_FOUND:
-               tfp_printf("Not found\r\n");
-               break;
-           case USBH_ERR_REMOVED:
-               tfp_printf("Removed\r\n");
-               break;
-           case USBH_ERR_DATA_BUF:
-               tfp_printf("Data buf error\r\n");
-               break;
-           case USBH_ERR_RESOURCES:
-               tfp_printf("Resources\r\n");
-               break;
-           case USBH_ERR_USBERR:
-               tfp_printf("USB error\r\n");
-               break;
-           default:
-               tfp_printf("Unknown error\r\n");
-               break;
-           }
-           if (status != USBH_ERR_TIMEOUT)
-               break; // exit while loop if any error
-       }
-   }
+   USBH_transfer_async(hidCtx.hHIDEpIn, (uint8_t *)buffer, hidCtx.reportInSize, 0, hidCtx.hHIDEpIn, message_received_cb);
 }

 int8_t hub_scan_for_hid(USBH_device_handle hDev, int level)

NessDan commented 8 months ago

UPDATE!! I wasn't actually running your latest usbh.c changes and the Apple keyboard WORKS!!!!

I will list some caveats though!

  1. The Atreus keyboard -> USB 3.0 Hub -> Edgeguard returns a USBH_ERR_BABBLE -18 (but the actual device detection works, I saw it SET_IDLE and find all HID interfaces.) (Video)
  2. Atreus -> USB 3.0 Hub -> USB 1.1 Hub (for Beagle capture) freezes at USBH_HID_set_idle(&hidCtx, 0); (Video.)
  3. Apple keyboard -> USB 1.1 Hub -> Edgeguard intermittently succeeds and fails (Video) (This failed 3 separate ways, I hope that can help narrow down the issue!)
    1. Captures: apple-usb1.1-hub-edgeguard-intermittent-issue-fail-and-success.zip
  4. (Apple / Dell 1.1 / Razer Tartarus Keypad) -> USB 3.0 Hub -> Edgeguard intermittently succeeds and fails (Succeeds very often! Might be some sort of timing issue? Got it on video, no Beagle capture. Let me know if you would like one.) (Video pt. 1) (Video pt. 2)

Feels great to see things moving along since testing with your PR! Please let me know what you need tested or sent over next!

brtchip-gdm commented 8 months ago

I don't actually have a USB 3.0 hub! They should be backward compatible though (!).

  1. USBH_ERR_BABBLE -18 ends a transaction but in general the program can continue.

Does it matter which port the device is plugged into at all?

I'm wondering if there is an issue with the uFrame-C-Masks and S-Masks. I'll try and replicate a similar fail and look into it.

brtchip-gdm commented 8 months ago

I've tried to replicate "Apple keyboard -> USB 1.1 Hub -> " but it works reliably as far as I can tell. Does it fail often?

NessDan commented 8 months ago

> I've tried to replicate "Apple keyboard -> USB 1.1 Hub -> " but it works reliably as far as I can tell. Does it fail often?

Yes, for me it did, 25% success rate on the 4 attempts I made 😔

NessDan commented 8 months ago

> I'm wondering if there is an issue with the uFrame-C-Masks and S-Masks. I'll try and replicate a similar fail and look into it.

If you have any theories you want me to try since I'm able to run into this consistently, let me know!

brtchip-gdm commented 8 months ago

I've tried all the ports on my USB 1.1 hub, they work the same. I've, so far, not seen any of these failure modes from the setup of "Apple keyboard -> USB 1.1 Hub -> ". My theory is that your full-speed hub is polling its INTERRUPT IN endpoint at the same time as the keyboard hub and keyboard.

To experiment, see the usbh_init_ep_qh() function in usbh.c, specifically the section that sets the S-Masks and C-Masks for non-BULK endpoints. As you can see from the code merged from the #47 work https://github.com/Bridgetek/ft9xx-sdk/pull/51, there is a wide band for the completion mask which follows the single start microframe. For high-speed it is offset by one microframe for each "endpoint number" to space these out (e.g. ep2 uses microframe 1, ep3 microframe 2 etc). But for full-speed it always started at microframe 0, and for low-speed at microframe 4.

The index is probably not ideal since each of the multiple devices in that configuration will have an endpoint 1. So multiple full-speed periodic endpoints (INTERRUPT) at the same polling interval will start at the same microframe.

In commit https://github.com/Bridgetek/ft9xx-sdk/commit/58002aa3315517eedf671dacde1a5bbd6529a1ee there is a change to the masks to attempt to have a different start position within the frame for each device. It also tries to reduce the overlap of the completion mask for full-speed devices. Background reading would be EHCI spec section 4.5 and 4.6!
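To make the idea of a per-device start position concrete, here is a minimal sketch of how an S-mask and C-mask pair could be derived for a split interrupt endpoint. It is not the code from the commit above. `split_masks_t`, `make_split_masks` and `device_index` are hypothetical names, and the layout (one start-split microframe followed by complete-splits in the three microframes from start+2 to start+4) is a simplifying assumption. The start is restricted to microframes 0 to 3 here so that all three complete-splits stay inside the same frame.

```c
#include <stdint.h>

/* S-mask and C-mask values as they would be placed in an EHCI queue head. */
typedef struct
{
    uint8_t s_mask; /* microframe in which the start-split is issued */
    uint8_t c_mask; /* microframes in which complete-splits are issued */
} split_masks_t;

/* device_index is a hypothetical per-device counter, not an SDK field. */
static split_masks_t make_split_masks(unsigned device_index)
{
    split_masks_t masks;
    unsigned start = device_index % 4u; /* keep the start in microframes 0..3 */

    masks.s_mask = (uint8_t)(1u << start);
    masks.c_mask = (uint8_t)(0x07u << (start + 2u)); /* start+2, start+3, start+4 */
    return masks;
}
```

Under these assumptions device 0 gets S-mask 0x01 with C-mask 0x1C and device 1 gets S-mask 0x02 with C-mask 0x38, so two devices that both expose an endpoint 1 no longer start their splits in the same microframe, although their completion windows still overlap.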

A modified plan here would be to move every endpoint in each branch of the periodic frame list to a separate entry. This would need a significant change to the use of usbh_periodic_ep_list linked list and the usbh_update_periodic_tree() function.

Give it a shot and see if it helps.

NessDan commented 8 months ago

Unfortunately, the newest commit 58002aa didn't work for the USB 1.1 hubs. I've attached some fail cases (it did succeed once).

USB Captures: apple-usb1.1-hub-edgeguard-58002aa-intermittent-issue-failcases.zip

Also confirmed the same behavior exists with a USB 3.0 hub, so I assume this exists for USB 2.0 hubs too.

NessDan commented 8 months ago

Also if you want, I'd be more than happy shipping the keyboard + hubs to you if that makes debugging any easier! I'm also still happy testing out code changes like this!

brtchip-gdm commented 8 months ago

First can you tell me a bit about the hubs and devices connected please? If you could capture some info about the endpoints in the failing systems: endpoint number, type, polling interval, size etc that might help. Maybe just configuration descriptors if you can only capture these. A beagle trace of the enumeration process from the perspective of the root hub (i.e. the beagle connected to the FT900) should be enough.

NessDan commented 7 months ago

So the USB 1.1 Hub is called "Targus PA055". I got this information for it from USB Device Tree Viewer here: usbtreeview_hub_1.1.txt

(One note: bInterval is 0xFF or 255ms?? That's pretty slow correct?)


The Apple keyboard with the integrated Hub is here too: usbtreeview_apple_keyboard_with_hub.txt


Let me know if you still need that Beagle trace, I can get it to you this week if needed!

Also if there happens to be some weirdness with this USB 1.1 Hub that doesn't exist for others, please let me know. I don't want you having to deal with weird hardware behaviors that other Hubs don't have.

brtchip-gdm commented 7 months ago

My USB 1.1 hub has a bInterval of 255 as well.

> Let me know if you still need that Beagle trace, I can get it to you this week if needed!

Please do. My theory is to do with the amount of time it takes to have multiple interrupt transfers within a single frame on the periodic schedule.
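For a feel of the numbers behind that theory, the sketch below counts the raw bit times of one successful interrupt IN transaction (IN token, data packet, ACK handshake) and converts them to time at the low-speed rate of 1.5 Mbit/s. It is a back-of-envelope lower bound only: bit stuffing, inter-packet gaps, hub delays and the PRE preamble are all ignored, and the helper names are made up for this example. The field sizes are the standard low-/full-speed packet fields (8-bit SYNC, 8-bit PID, 7-bit address plus 4-bit endpoint plus CRC5, CRC16, and roughly 3 bit times of EOP).

```c
#include <stdint.h>

/* Packet field sizes in bit times for low-/full-speed signalling. */
enum
{
    SYNC_BITS = 8,
    PID_BITS = 8,
    TOKEN_FIELD_BITS = 16, /* 7-bit address + 4-bit endpoint + 5-bit CRC5 */
    CRC16_BITS = 16,
    EOP_BITS = 3
};

/* Raw bits for IN token + DATA packet + ACK handshake, without bit stuffing. */
static uint32_t interrupt_in_raw_bits(uint32_t payload_bytes)
{
    uint32_t token = SYNC_BITS + PID_BITS + TOKEN_FIELD_BITS + EOP_BITS;
    uint32_t data = SYNC_BITS + PID_BITS + 8u * payload_bytes + CRC16_BITS + EOP_BITS;
    uint32_t handshake = SYNC_BITS + PID_BITS + EOP_BITS;
    return token + data + handshake;
}

/* At 1.5 Mbit/s one bit time is 2000/3 ns. */
static uint32_t low_speed_bits_to_ns(uint32_t bits)
{
    return (bits * 2000u) / 3u;
}
```

An 8-byte boot keyboard report comes to 153 raw bits, or about 102 us at low speed, so even before overheads only a handful of such transfers fit into a 1 ms frame.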

NessDan commented 7 months ago

Sorry for such a late response, here are two captures of the 1.1 USB HUB, plugged into Windows. One of the files is plugging it in with nothing attached, and the other file is with a Dell USB 1.1 keyboard attached.

windows-usb-1.1-hub-1.1-with-and-without-devices.zip