arduino-libraries / Arduino_USBHostMbed5

Apache License 2.0
4 stars 10 forks source link

GIGA R1 USB Mass Storage Directory example causes MbedOS crash #29

Open mjs513 opened 10 months ago

mjs513 commented 10 months ago

Reference: https://forum.arduino.cc/t/giga-r1-usb-mass-storage-examples-has-anyone-managed-to-run-an-example-project/1127694

Several of us (@BobtheDog, @Kurte and me) have been experimenting with the USBHost Mass Storage Example:

#include <DigitalOut.h>
#include <FATFileSystem.h>
#include <Arduino_USBHostMbed5.h>

USBHostMSD msd;
mbed::FATFileSystem usb("usb");

void setup()
{
    Serial.begin(115200);

    pinMode(PA_15, OUTPUT); //enable the USB-A port
    digitalWrite(PA_15, HIGH);

    while (!Serial)
        ;

    Serial.println("Starting USB Dir List example...");

    // if you are using a Max Carrier uncomment the following line
    // start_hub();

    while (!msd.connect()) {
        //while (!port.connected()) {
        delay(1000);
    }

    Serial.print("Mounting USB device... ");
    int err = usb.mount(&msd);
    if (err) {
        Serial.print("Error mounting USB device ");
        Serial.println(err);
        while (1);
    }
    Serial.println("done.");

    char buf[256];

    // Display the root directory
    Serial.print("Opening the root directory... ");
    DIR* d = opendir("/usb/");
    Serial.println(!d ? "Fail :(" : "Done");
    if (!d) {
        snprintf(buf, sizeof(buf), "error: %s (%d)\r\n", strerror(errno), -errno);
        Serial.print(buf);
    }
    Serial.println("done.");

    Serial.println("Root directory:");
    unsigned int count { 0 };
    while (true) {
        struct dirent* e = readdir(d);
        if (!e) {
            break;
        }
        count++;
        snprintf(buf, sizeof(buf), "    %s\r\n", e->d_name);
        Serial.print(buf);
    }
    Serial.print(count);
    Serial.println(" files found!");

    snprintf(buf, sizeof(buf), "Closing the root directory... ");
    Serial.print(buf);
    fflush(stdout);
    err = closedir(d);
    snprintf(buf, sizeof(buf), "%s\r\n", (err < 0 ? "Fail :(" : "OK"));
    Serial.print(buf);
    if (err < 0) {
        snprintf(buf, sizeof(buf), "error: %s (%d)\r\n", strerror(errno), -errno);
        Serial.print(buf);
    }
}

void loop()
{
}

So far as a Bottom Line Up Front:

Windows 10 with 2.2.1 IDE : Red lights of doom. Windows 10 with 1.8.19 IDE : Works.

Windows 11 with 2.2.1 IDE : Red lights of doom. Windows 11 with 1.8.19 IDE : Works.

Linux: MacOsX and RPI4 IDE 2.2.1 works

IDE 2.2.1 on Windows 11 Using Serial: Red lights of doom. Using Serial1: Works

Example works if using release 0.0.3

Posts 31, 37, and 38 show and discuss debug outputs.

Believe in all cases with we are using GIGA version 4.0.8 (installed via boards manager).

As for memory sticks a number have been tested: I have used an old SansDisk Cruzer 2GB and a generic Microcenter 64Gb.

KurtE commented 10 months ago

@facchinm - As I mentioned on the forum thread: https://forum.arduino.cc/t/giga-r1-usb-mass-storage-examples-has-anyone-managed-to-run-an-example-project/1127694/44?u=kurte

When I removed the three lines that were added 7 months ago:

#undef MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL
#define MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL  0
#include "Callback.h"

https://github.com/arduino-libraries/Arduino_USBHostMbed5/blame/main/src/USBHost/USBHostConf.h#L20

I was able to complete the directory listing of an USB Stick without crashing.

Note in #30 I mentioned that there are a ton of compiler warnings with this library of redefining: #define MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL 0 to `#define MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL 1

And if my debug session was correct it crashed at:

<_ZZN4mbed8CallbackIFvvEE8generateIZNS2_C4IPN7arduino6USBCDCEMS6_FvvELi0EEET_T0_EUlvE_vEEvOSA_E3ops>:
 806244c:   08043cbd    stmdaeq r4, {r0, r2, r3, r4, r5, r7, sl, fp, ip, sp}

Not sure if I just am treating the symptom or if this is the real issue.

AndrewCapon commented 10 months ago

there is some more debug info here:

(gdb) c
Continuing.

Breakpoint 2, USBDevice::in (this=0x24001f08 <PluggableUSBD()::obj>, endpoint=<optimized out>) at ./mbed-os/drivers/usb/source/USBDevice.cpp:1009
1009        endpoint_info_t *info = &_endpoint_info[EP_TO_INDEX(endpoint)];
(gdb) s
1011        if (info->pending >= 1) {
(gdb) s
1012          info->pending -= 1;
(gdb) s
1013            if (info->callback) {
(gdb) s
221             return _ops;
(gdb) s
1014              info->callback();
(gdb) s
mbed::Callback<void ()>::operator()() const (this=0x24001f3c <PluggableUSBD()::obj+52>)
    at ./mbed-os/platform/include/platform/Callback.h:513
513         R operator()(ArgTs... args) const
(gdb) s
mbed::Callback<void ()>::call() const (this=0x24001f3c <PluggableUSBD()::obj+52>) at C:\Users\andre\AppData\Local\Arduino15\packages\arduino\hardware\mbed_giga\4.0.8\cores\arduino/mbed/platform/include/platform/Callback.h:506
506             MBED_ASSERT(bool(*this));
(gdb) s
mbed::Callback<void ()>::operator bool() const (this=0x24001f3c <PluggableUSBD()::obj+52>)
    at C:\Users\andre\AppData\Local\Arduino15\packages\arduino\hardware\mbed_giga\4.0.8\cores\arduino/mbed/platform/include/platform/Callback.h:522
522             return control();
(gdb) s
mbed::Callback<void ()>::call() const (this=0x24001f3c <PluggableUSBD()::obj+52>) at C:\Users\andre\AppData\Local\Arduino15\packages\arduino\hardware\mbed_giga\4.0.8\cores\arduino/mbed/platform/include/platform/Callback.h:504
504         R call(ArgTs... args) const
(gdb) s
506             MBED_ASSERT(bool(*this));
(gdb) s
509         }
(gdb) s
508             return op_call(this, args...);
(gdb) s
0x08062370 in mbed::Callback<void ()>::generate<mbed::Callback<void ()>::generate<arduino::USBCDC*, void (arduino::USBCDC::*)(), 0>(arduino::USBCDC*, void (arduino::USBCDC::*)())::{lambda()#1}, void>(mbed::Callback<void ()>::generate<arduino::USBCDC*, void (arduino::USBCDC::*)(), 0>(arduino::USBCDC*, void (arduino::USBCDC::*)())::{lambda()#1}&&)::ops ()
(gdb) s
Single stepping until exit from function _ZZN4mbed8CallbackIFvvEE8generateIZNS2_C4IPN7arduino6USBCDCEMS6_FvvELi0EEET_T0_EUlvE_vEEvOSA_E3ops,
which has no line number information.
0x0806237c in cdc_line_coding_default ()
(gdb) l
503          */
504         R call(ArgTs... args) const
505         {
506             MBED_ASSERT(bool(*this));
507             auto op_call = reinterpret_cast<call_type *>(call_fn());
508             return op_call(this, args...);
509         }
510
511         /** Call the attached function
512          */
(gdb) s
Single stepping until exit from function _ZL23cdc_line_coding_default,
which has no line number information.
0x08062384 in rtos::Kernel::wait_for_u32_forever ()
(gdb) s
Single stepping until exit from function _ZN4rtos6KernelL20wait_for_u32_foreverE,
which has no line number information.
HardFault_Handler () at except.S:50
50      except.S: No such file or directory.
(gdb)
alrvid commented 10 months ago

Thank you for the reports! I've just spent the last few hours troubleshooting a crash for another user on the Portenta Machine Control. That turned out to happen at the exact same place the debug info from @AndrewCapon, which I noticed just now. I've been able to replicate that problem myself by running the user's binary, but when compiling the exact same sketch, using the exact same compiler version (as verified from the .elf file) and so on, I couldn't replicate the crash. The only difference I have seen so far is that he's using Windows and I'm using macOS. We're going to continue investigating this on Monday, and bring in someone from another team, because the crash seems to happen in code that they've worked on and not us. It could be that the changes to the Arduino_USBHostMbed5 library simply triggered some latent bug in that code, or it might be something else. We'll see what we can come up with. Thank you again for reporting this! And one more thing: it would be great if all of you could report which operating system you're compiling on. It might help us narrow this down.

AndrewCapon commented 10 months ago

The issue seems to be with Windows 10/11, IDE 2.2.1.

IDE 1.8.19 on windows 10/11 is fine.

Either IDE on Mac, Linux or RPI4 also fine.

KurtE commented 10 months ago

Windows 11 2.x nightly build from last night. 20231020

mjs513 commented 10 months ago

Windows 11, 2.2.1 official release.

RPI4 Ubuntu 22.04.03, ide 2.2.1 official release

alrvid commented 10 months ago

@facchinm - As I mentioned on the forum thread: https://forum.arduino.cc/t/giga-r1-usb-mass-storage-examples-has-anyone-managed-to-run-an-example-project/1127694/44?u=kurte

When I removed the three lines that were added 7 months ago:

#undef MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL
#define MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL  0
#include "Callback.h"

https://github.com/arduino-libraries/Arduino_USBHostMbed5/blame/main/src/USBHost/USBHostConf.h#L20

I was able to complete the directory listing of an USB Stick without crashing.

Note in #30 I mentioned that there are a ton of compiler warnings with this library of redefining: #define MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL 0to `#define MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL 1

And if my debug session was correct it crashed at:

<_ZZN4mbed8CallbackIFvvEE8generateIZNS2_C4IPN7arduino6USBCDCEMS6_FvvELi0EEET_T0_EUlvE_vEEvOSA_E3ops>:
 806244c: 08043cbd    stmdaeq r4, {r0, r2, r3, r4, r5, r7, sl, fp, ip, sp}

Not sure if I just am treating the symptom or if this is the real issue.

I've been troubleshooting this the whole day based on a binary from another user running the Portenta Machine Control. He's compiling on Windows and I'm compiling on macOS. His binary crashes and mine doesn't. I've been reversing his binary to figure out exactly where things go wrong initially - not where everything crashes, but where the first incorrect things show up. It's in a call that contains this:

::mbed::callback()

At the assembly code level, the problem is that there's an address hard-coded into the binary by the compiler at a specific other address at the end of a specific function. That hard-coded address is used much later to call another function, which is located at an even address. On my computer, the compiler puts an odd address into the binary, while in the user's problematic binary, it has put an even address in there. When the address is used for the call, there's a UsageFault because an even address signals to the CPU to go into Arm mode instead of Thumb mode. The question is why the compiler does this incorrectly on Windows but not on macOS. The Callback functionality in Mbed is pretty complicated to say the least, but when troubleshooting I've started suspecting the MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL setting, even though I'm not sure of the details.

Long story short: Your observation is very interesting, and fits well into my troubleshooting results so far. I'm going to have a closer look at it tomorrow. Please let me know if you have any further ideas along this track.

For now, I want you all to know that I'm still working actively on this.

alrvid commented 10 months ago

Does version 0.0.4 of the library work for any of you?

mjs513 commented 10 months ago

@alvid - I believe the reason everone is using 0.0.3 is as a result of this post: https://forum.arduino.cc/t/giga-r1-usb-mass-storage-examples-has-anyone-managed-to-run-an-example-project/1127694/3

[formulanay](https://forum.arduino.cc/u/formulanay)
[May 17](https://forum.arduino.cc/t/giga-r1-usb-mass-storage-examples-has-anyone-managed-to-run-an-example-project/1127694/4?u=merlin513)post #4
Thank you for your answer.

I finally was able to managed an example when downgrading Arduino_USBHostMbed5 from 0.04 to 0.03.

What could be the reason the example works with 0.03, but not with 0.04 ?

Thanks !
MoveElectricMobility commented 10 months ago

Only 0.0.3 works for us :(

AndrewCapon commented 10 months ago

I bet if you go back to the 1.18.9 IDE 0.0.4 will work.

alrvid commented 10 months ago

I have a suspicion that changes in 0.0.4 caused the bug and I'm working on an updated version of the library. At this point I've been able to revert what I think are the breaking changes without reintroducing the old bug they were meant to solve. It's still experimental and we're going to do some testing internally to see how well it works.

alrvid commented 10 months ago

Unfortunately, the updated version didn't work out in testing. I've been continuing working on this, and it seems that the root cause is something more fundamental than just a bug in this library. I now know about at least three, perhaps even four, bugs that seem to originate from the same underlying issue. I've seen that some of you have been able to work around this particular issue by removing some redefines from the library, and if that works for you it's good, but please keep in mind that the underlying problem is most likely still there, although temporarily hidden by the change. I'm still working on it, but it's a bit tricky to find the true cause and solve this once and for all, instead of just temporarily hiding the bug. I'm sorry for the inconvenience.

KurtE commented 10 months ago

@alrvid - as you mentioned, there could be additional issues.

But seeing all of the warnings about: MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL And having the code fault in what looks like a callback made me suspicious.

Especially when I undid that change and the system did not fault. But is that the cause or just a symptom? Not sure.
Could be a memory corruption issue, and having the setting or not having the setting, might cause which memory gets corrupted, where in some cases, it was not a critical memory location, and other cases it is.

Always hard to find those!

Good luck

AndrewCapon commented 10 months ago

I have been trying to get the MidiHost going, there are a multitude of issues but I can get it along to fail in the same way.

The interesting thing is that this is on a Mac and I do not see this issue in the Mass Storage example on the Mac!

Screenshot 2023-10-29 at 06 38 16
AndrewCapon commented 10 months ago

Removing the MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL=0 in USBEndPoint.h and USBHostConfig.h does NOT fix this issue here.

AndrewCapon commented 10 months ago

Ok, I found out whilst debugging that something else was setting MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL=0

If I force this to 1 in callback.h everything works.

#undef MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL
#define MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL  1
alrvid commented 10 months ago

The problem is for sure related to trivial and non-trivial callbacks. In the customer binary I've reversed, the problem seems to be a mixup between these two states. The compiler generates machine code to store a non-trivial callback from one part of the source code, while it generates machine code that looks like it's intended to execute a trivial callback from another part [edit: I've ruled that explanation out, it simply looks like incorrectly generated machine code instead of a mixup between the two types]. Then there's a crash because the binary tries to execute the address to the callback (data as code) instead of the callback itself. Which leads to a UsageFault, because the address stored is even instead of odd. Looking in the Mbed source code, there are several comments about compilers and other tools being mistaken by the complexity of the Callback implementation. So, possible explanations involve anything from a compiler bug, to undefined behavior in the Callback code that's hard to identify because of the complexity, to a mistake in the Arduino build system, to a mistake somewhere else. Or a combination of these. I don't like problems that go away inexplicably when making changes, so I'm really trying to get to the bottom of this once and for all. But it might take some time.

mjs513 commented 10 months ago

@alrvid Your explanation makes perfect sense to me. Playing with a Gamepad implementation (shown on the forum) its seems that we can connect with no issue but the callback is failing when I go do:

gamepad.attachEvent(onGamepadEvent);

depending on which version of the library I use it will just hang and not go any farther than connecting or will get a the blinking red lights showing Mbedos crashed

MoveElectricMobility commented 10 months ago

STM32 Memory Management Unit Unauthorized Memory Address Access flag is set when callback memory pointer used, stack reveals a non sensical address is being pushed to the stack, is this an MBED OS issue or something going on with the arduino compiler stack?

alrvid commented 10 months ago

When reversing a binary compiled on Windows, I've noticed that the code generated for these two calls is really strange:

PluggableUSBD().endpoint_add(_bulk_in, CDC_MAX_PACKET_SIZE, USB_EP_TYPE_BULK, ::mbed::callback(this, &USBCDC::_send_isr));
        PluggableUSBD().endpoint_add(_bulk_out, CDC_MAX_PACKET_SIZE, USB_EP_TYPE_BULK, ::mbed::callback(this, &USBCDC::_receive_isr))

It looks like the "this" pointer becomes a null pointer, and the return value of ::mbed::callback() seems to become the "this" pointer instead. The resulting object from ::mbed::callback is simply hard-coded into the binary but never correctly passed to endpoint_add(). When compiling on macOS, there are instead two calls to a separate Callback constructor. So the compiler generates quite different code on the two platforms in this case. The question is why.

I've compared the compiler flags on Windows and on macOS, without finding any differences. I've also checked that the MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL setting when this code runs is in fact the expected one, and as far as I can tell it is, on both Windows and macOS.

The only difference I've noticed so far is that the g++ compiler itself is compiled with different compiler versions for Windows, macOS, and Linux. If perhaps that matters somehow.

The code that implements the Callback functionality is pretty complicated, but I think I understand how it's supposed to work in this particular case, and I can't see anything obviously wrong with it. But it might be something not so obvious.

alrvid commented 10 months ago

STM32 Memory Management Unit Unauthorized Memory Address Access flag is set when callback memory pointer used, stack reveals a non sensical address is being pushed to the stack, is this an MBED OS issue or something going on with the arduino compiler stack?

Thank you for your observation. I'm beginning to suspect either some kind of undefined behavior in Mbed's Callback implementation, or some kind of compiler bug in the one used on Windows.

alrvid commented 10 months ago

I've tracked the bug further and I think the location is either here:

https://github.com/ARMmbed/mbed-os/blob/7c7d20da6527885237094d9d50ce099404414201/platform/include/platform/Callback.h#L642C9-L642C41

Or here:

https://github.com/ARMmbed/mbed-os/blob/7c7d20da6527885237094d9d50ce099404414201/platform/include/platform/Callback.h#L714

The reason I think so is the zeroing out of the "this" pointer, which I think happens because of some weird thing happening in one of these two locations.

I've also found an old forum post by the author of the code in question:

https://answers.launchpad.net/gcc-arm-embedded/+question/686870

He seems to have had problems with this exact part of the code about three years ago. I haven't had the time to study his post in detail, but I will tomorrow.

alrvid commented 9 months ago

I think I have an explanation for why the redefines in Arduino_USBHostMbed5 causes crashes on Windows. They break paragraph 3.2.4 of the C++14 specification (a subcategory of the ODR - One Definition Rule) in an insidious way. There's no diagnostic required in these cases, which means no warnings, no errors required by the compiler. It's also undefined behavior. There's a conditional compilation of the function call_fn() in the Callback implementation, and in practice, it's inlined in the call() function in the same object file. Depending on the MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL setting you get an extra "ldr r3,[r3, #0]" instruction or not inlined in call(). Without that indirection, calling a nontrivial callback leads to the execution of "bx r3" where r3 contains the address of the ops structure (dynamically dispatched operations), instead of the address contained in the first member of that structure, which would lead to the type-erased function pointer. And because the structure is at an even address, there's a UsageFault when the CPU tries to enter Arm mode, which isn't supported on Cortex-M. When there's a single definition of MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL everything is fine, because there's only one definition of call_fn() and call() - the last function of the two is the only one that exists in practice, in the binary. But when you redefine MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL you get two definitions of the same call() function at the same time, breaking paragraph 3.2.4. And remember, no diagnostic required, and this is undefined behavior. So anything goes. On macOS, USBDevice::endpoint_add() is then linked against call() from USBHost.o, which contains the extra "ldr r3,[r3, #0]" instruction. On Windows, it's instead linked against the call() function from USBHostSerial.o, which contains no indirection, and it leads to a crash. These two translation units are compiled with different settings because the redefines are applied to some parts of the library but not all of them. Remember, breaking 3.2.4 is undefined behavior, so it's ok to link this at random. But keeping them equal wouldn't be ok anyway because 3.2.4 would still be broken by the library in combination with the core. I'm also working on some changes that should improve other parts of the Callback implementation if they work out. I'll be back with an update when I have something new to share.

AndrewCapon commented 9 months ago

Nice work

alrvid commented 9 months ago

Removing all the redefines revealed yet another problem: parts of the library doesn't know about the default Mbed configuration. This is why @KurtE had success with removing these lines:

undef MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL

define MBED_CONF_PLATFORM_CALLBACK_NONTRIVIAL 0

include "Callback.h"

from USBHostConf.h. It's because USBHostSerial.h includes USBHostConf.h, and that file includes Arduino.h, which includes other headers, which at one point includes the Mbed configuration file. And when compiling on Windows, the call() function I mentioned in my previous comment comes from USBHostSerial.o. USBHostSerial.h includes USBHostConf.h before it includes Callback.h, so the default Callback configuration is used, and everything works fine.

But, if only the redefines are removed, Callback.h is included already in USBHostConf.h, but without any default Mbed configuration, so we get the trivial version of call(). Next, Arduino.h is included, which includes the Mbed configuration indirectly, but the damage has already been done. Over in USBHostSerial.h, USBHostConf.h is included. After that, Callback.h is included in USBHostSerial.h, and we have the correct Mbed configuration in place, but since it's already been included once, the correct version of call() is never actually used. So we get a crash.

Unfortunately, there are more problems, so I have to continue troubleshooting this even further.

alrvid commented 9 months ago

Two of the fixes are in the new version (0.3.1) of the library. There will also be a couple of additional fixes in the next version of the Mbed core, but that one isn't released yet. I think the library fixes will solve the majority of the Callback problems though. Please let me know if it works for you or not.

KurtE commented 9 months ago

This appears to work. It resolves the initial problem, where I needed to remove the issue when I unplugged a device.

38 - Which I will close out.

mjs513 commented 9 months ago

While @KurtE has been experimenting with USBHostSerial I have been experimenting with hooking up gamepads up via USBHost. Generally it has been working but note I am using @KurtE WIP branch of USBHostMbed5 which incorporates all the changes in 3.1 plus those that @AndrewCapon have mentioned in the open PR and his open issues on MIDI. Without them XBox and Switch controllers will not work.

As @KurtE mentioned it resolves the issue with unplugging a device. I can now capture that change with device connected and able to reconnect another device with having to restart the sketch.

@alrvid - any chance of getting those other changes incorporated as well.

AndrewCapon commented 9 months ago

I would be wary of the changes I made in https://github.com/arduino-libraries/Arduino_USBHostMbed5/pull/33

Even though the code seems to work, the PR was put in to try to generate some discussion about why the existing code was not working rather than a general fix.

I'm currently trying to work out some issues with the MBED Hub code, once I have done that I am going to set up a new branch with the original code and try to work out why it isn't working, I have a better understanding now of the surrounding code so maybe I can see the issue.

One major issue with the whole thing is the spamming of the USB Interrupts, when a USB transfer is requested the interrupt for that channel is enabled and you can get large numbers of interrupts calling HAL_HCD_HC_NotifyURBChange_Callback(), this is not good for CPU usage. One of the changes I made alleviates this problem a little by not then retrying the transfer on every NAK but still the issue is there.

If you then start to get multiple devices, each with their own channel interrupt everything can get totally overloaded.

On a transfer complete the handlers will then disable the channel interrupt and reschedule a new transfer which will then re-enable the channel interrupt, when everything is a bit overloaded there seems to be a timing issue where interrupts are lost and that's the end of that channel/device.

Also I am trying to track down an issue where the USB system just dies, not crash but just stops working...

KurtE commented 9 months ago

@AndrewCapon @alrvid - First it is great that you both are resolving some of the issues within the USB Host code. I am glad that hopefully the callback issues are resolved (🤞 )

I agree that the code in #33 is probably not the complete solution. But it might be a reasonable Band-Aid, until a more complete solution is implemented. To me, the currently released version of the library is at an alpha release quality, as far too many devices fail to even connect, as they fail to load either their device or configuration descriptor.

At least with these changes @mjs513 and myself have been able to implement and debug code for some additional devices.

I keep wondering why there is a different library (USBHostGIGA) for devices like Keyboards and Mice? The readme says:

This library has been created to overcome the limitations of https://github.com/arduino-libraries/Arduino_USBHostMbed5 when it's time to talk with USB Low Speed devices like keyboards and mice.

Were they running into the same issues? Are there some hints in that code base on how to solve some of these problems?

Again, Thank You.

AndrewCapon commented 9 months ago

I have spent a few hours with the existing code trying to nudge it into a working state.

Now this code must have worked on certain STM32 boards as someone has spent a lot of time on it, it does not work on the GIGA board though.

The first thing was to correct the spamming of re-requests on NAKs.

The second step was trying to work out what was going on with the multiple packet requests.

I could not get to the bottom of it, I have code in there that uses my logic analyser to log states and basically what I am seeing here is the first packet working, the second packet seeming to work as in the URB_STATE is correct but the actual RX fifo is not getting the data, then the third packet working.

So for 18 bytes:

One packet of 8 bytes on RX fifo Missing packet of 8 bytes even though the state of the channel show success, One packet of 2 bytes on RX fifo.

I just could not get it to work, I have no idea where this missing packet is going.

So I have started looking at a different solution (using the HAL for the entire transfer) where the HAL_HCD_HC_NotifyURBChange_Callback() is not used or is extremely minimal to avoid the CPU hit especially when using multiple devices at the same time, and to avoid starting transfers in this interrupt handler.

Instead I am using a 1ms interrupt to handle the states of the USB channels and the retry code will be in there.

Hopefully this will stop the USB system getting totally bogged down and falling apart when using hubs and will allow correct handling of re-requests based on actual time.

mjs513 commented 9 months ago

Wonder if that is one of the reasons why one of the joysticks I am testing with is always reading 0 bytes transfered even though it shows READ_SUCCESS.

AndrewCapon commented 9 months ago

Hi @KurtE and @mjs513

I have a version running on a 1ms timer which retries every 5ms: https://github.com/arduino-libraries/Arduino_USBHostMbed5/commit/57c06ca904ec748011f1cf33a876a2b450f026d2

ARC_TICKER_BASED needs to be set to 1 to enable it. ARC_USB_FULL_SIZE needs to also be set to 1.

This gets rid of HAL_HCD_HC_NotifyURBChange_Callback(), reduces the drain on cpu and gets rid of the re-transmit in the USB irq.

I have tested here with some memory sticks and midi devices without hubs and with hubs and it seems to work ok, maybe you guys can give it a test?

It unfortunately does not get rid of the issue where if using multiple devices on a hub they can stop working (bulk transfers stop working), it does keep them alive for longer so I might be on the right track...

AndrewCapon commented 9 months ago

I have been investigating the flood of interrupts, even though I have got rid of HAL_HCD_HC_NotifyURBChange_Callback() the ISR that called it is still using loads of cpu :(

I have found : https://community.st.com/t5/stm32-mcus-products/stm32f4-stm32f7-usb-host-core-interrupt-flood/td-p/436225

This I think is the problem I am seeing, there is a set of files that you can use to patch the code or the way out of this flooding is to enable the dma.

in `USBHALHost_STM.h" we have:

    hhcd->Init.dma_enable = 0; // for now failed with dma

If I enable this nothing works, I think the way forward is to investigate this and try to get it working...

KurtE commented 9 months ago

Hello @AndrewCapon @mjs513 and all

I added your fork to be a remote and synced up to your branch that is associated with the Pull request.

I edited USBHostConf.h to add:

// Andrews stuff
#define ARC_TICKER_BASED 1
#define ARC_USB_FULL_SIZE 1

Built the Host dumper sketch and was able to build it, but when it starts up it gives me the red flash sequence.

Looks like your sketch is about 10 commits behind the main branch so maybe it does not have the changes for callbacks?

WIll try again later, but will switch back to my WIP branch

AndrewCapon commented 9 months ago

Hi @KurtE,

mmm, weird.

I have now merged main into that PR.

What is the "Host Dumper Sketch", maybe I should try that here.

AndrewCapon commented 9 months ago

Concerning enabling DMA, first I though well just disable the DCache globally, that will fix it and then I can look into where the DCache was causing problems.

Disabling the DCache didn't get it working though which came as a bit of a surprise, I'll do some digging...

AndrewCapon commented 9 months ago

@KurtE & @mjs513

Some good news, so I gave up looking at the DMA approach and gave Lix Paulians fix here a go (near the bottom): https://community.st.com/t5/stm32-mcus-products/stm32f4-stm32f7-usb-host-core-interrupt-flood/td-p/436225/page/4

So, this is what we see using without the ticker based code normally:

Screenshot 2023-11-11 at 08 14 23

In the bottom bulk.ctrl you can see all the ISR calls, you can also see in lane2 the hub rxHandler having problems scheduling the next hub interrupt transfer and a missed transferCompleted in lane 1.

How with the fix we get:

Screenshot 2023-11-11 at 08 09 37

Now you can see in lane 4 the spamming has stopped, lane 2 shows a happy hub rxHandler, lane 1 shows the correct number of transfers.

Pretty good.

I'm going to do some testing and will come up with some new code, so hold off on looking at the ticker based stuff for a while...

AndrewCapon commented 9 months ago

Hi @KurtE @mjs513 @alrvid

There is a new PR: https://github.com/arduino-libraries/Arduino_USBHostMbed5/pull/42

This for me fixes all USB Host issues, tested with midi devices, MSD and Hubs.

Can you guys give it a go and see if it is working for your devices?

mjs513 commented 9 months ago

Morning @AndrewCapon - @KurtE Just using the PR 42 branch and at least for the joysticks/gamepads I had working before they still seem to be working with you updates in the PR.

Did change the number of interfaces to 4 in USBHost_conf.h to get one of the controllers to get recognized. Probably that should be by default

KurtE commented 9 months ago

Early Morning for me (about 7:00) @AndrewCapon @mjs513 and all...

I was about to say that it did not work, but then found that your fork/branch for the PR did not enable the fix. So far it appears to work. Tried plugging in FTDI device, plus USB Keyboard. Will try plugging in MINIMA and Teensy next.

The sketch I was going to report was the HID Object dumper. I posted output from it, plus a link to it in a new issue I created earlier this morning. Where this library does not appear to work for Low Speed devices.

https://github.com/arduino-libraries/Arduino_USBHostMbed5/issues/43

Edit: Minima worked:

Device:0x2400c714
VID: 2341, PID: 69
Manufacturer: Arduino
Product: UNO R4 Minima
Speed: 0
Size of configuration Descriptor: 93
Configuration Descriptor
2400E8D0 - 09 02 5D 00 03 01 00 C0  FA 08 0B 00 02 02 02 00  : ..]..... ........
2400E8E0 - 00 09 04 00 00 01 02 02  00 04 05 24 00 20 01 05  : ........ ...$. ..
2400E8F0 - 24 01 00 01 04 24 02 02  05 24 06 00 01 07 05 81  : $....$.. .$......
2400E900 - 03 08 00 10 09 04 01 00  02 0A 00 00 00 07 05 02  : ........ ........
2400E910 - 02 40 00 00 07 05 82 02  40 00 00 09 04 02 00 00  : .@...... @.......
2400E920 - FE 01 01 05 09 21 0D E8  03 00 10 01 01           : .....!.. .....
Config:
 wTotalLength: 93
 bNumInterfaces: 3
 bConfigurationValue: 1
 iConfiguration: 0
 bmAttributes: 192
 bMaxPower: 250
Unknown: 2400E8D9 - 08 0B 00 02 02 02 00 00                           : ........ 
****************************************
** Interface level **
  bInterfaceNumber: 0
  bAlternateSetting: 0
  Number of endpoints: 1
  bInterfaceClass: 2
  bInterfaceSubClass: 2
    Communications and CDC
  bInterfaceProtocol: 0
  iInterface: 4
Unknown: 2400E8EA - 05 24 00 20 01                                    : .$. .
Unknown: 2400E8EF - 05 24 01 00 01                                    : .$...
Unknown: 2400E8F4 - 04 24 02 02                                       : .$..
Unknown: 2400E8F8 - 05 24 06 00 01                                    : .$...
  Endpoint: 81 In
    Attrributes: 3 Interrupt
    Size: 8
    Interval: 16
****************************************
** Interface level **
  bInterfaceNumber: 1
  bAlternateSetting: 0
  Number of endpoints: 2
  bInterfaceClass: 10
  bInterfaceSubClass: 0
    CDC-Data
  bInterfaceProtocol: 0
  iInterface: 0
  Endpoint: 2 Out
    Attrributes: 2 Bulk
    Size: 64
    Interval: 0
  Endpoint: 82 In
    Attrributes: 2 Bulk
    Size: 64
    Interval: 0
****************************************
** Interface level **
  bInterfaceNumber: 2
  bAlternateSetting: 0
  Number of endpoints: 0
  bInterfaceClass: 254
  bInterfaceSubClass: 1
  bInterfaceProtocol: 1
  iInterface: 5
parseInterface nb:0
 bInterfaceClass = 2
 bInterfaceSubClass = 2
    Communications and CDC
 bProtocol = 0
useEndpoint(0, 3, 2)
parseInterface nb:1
 bInterfaceClass = 10
 bInterfaceSubClass = 0
    CDC-Data
 bProtocol = 0
parseInterface nb:2
 bInterfaceClass = 254
 bInterfaceSubClass = 1
 bProtocol = 1
New Debug device: VID:2341 PID:0069 [dev: 0x2400c714 - intf: 0]
USB host device(2341:69) connected
Manufacturer: Arduino
Product: UNO R4 Minima

Likewise Teensy: In this case a Micromod

Device:0x2400c714
VID: 16C0, PID: 483
Manufacturer: Teensyduino
Product: USB Serial
Speed: 0
Size of configuration Descriptor: 98
Configuration Descriptor
2400E8D0 - 09 02 62 00 03 01 00 C0  32 08 0B 00 02 02 02 01  : ..b..... 2.......
2400E8E0 - 00 09 04 00 00 01 02 02  01 00 05 24 00 10 01 05  : ........ ...$....
2400E8F0 - 24 01 01 01 04 24 02 06  05 24 06 00 01 07 05 82  : $....$.. .$......
2400E900 - 03 10 00 10 09 04 01 00  02 0A 00 00 00 07 05 03  : ........ ........
2400E910 - 02 40 00 00 07 05 84 02  40 00 00 09 04 02 00 02  : .@...... @.......
2400E920 - FF 6A FF 00 07 05 81 02  40 00 01 07 05 01 02 40  : .j...... @......@
2400E930 - 00 01                                             : ..
Config:
 wTotalLength: 98
 bNumInterfaces: 3
 bConfigurationValue: 1
 iConfiguration: 0
 bmAttributes: 192
 bMaxPower: 50
Unknown: 2400E8D9 - 08 0B 00 02 02 02 01 00                           : ........ 
****************************************
** Interface level **
  bInterfaceNumber: 0
  bAlternateSetting: 0
  Number of endpoints: 1
  bInterfaceClass: 2
  bInterfaceSubClass: 2
    Communications and CDC
  bInterfaceProtocol: 1
  iInterface: 0
Unknown: 2400E8EA - 05 24 00 10 01                                    : .$...
Unknown: 2400E8EF - 05 24 01 01 01                                    : .$...
Unknown: 2400E8F4 - 04 24 02 06                                       : .$..
Unknown: 2400E8F8 - 05 24 06 00 01                                    : .$...
  Endpoint: 82 In
    Attrributes: 3 Interrupt
    Size: 16
    Interval: 16
****************************************
** Interface level **
  bInterfaceNumber: 1
  bAlternateSetting: 0
  Number of endpoints: 2
  bInterfaceClass: 10
  bInterfaceSubClass: 0
    CDC-Data
  bInterfaceProtocol: 0
  iInterface: 0
  Endpoint: 3 Out
    Attrributes: 2 Bulk
    Size: 64
    Interval: 0
  Endpoint: 84 In
    Attrributes: 2 Bulk
    Size: 64
    Interval: 0
****************************************
** Interface level **
  bInterfaceNumber: 2
  bAlternateSetting: 0
  Number of endpoints: 2
  bInterfaceClass: 255
  bInterfaceSubClass: 106
  bInterfaceProtocol: 255
  iInterface: 0
  Endpoint: 81 In
    Attrributes: 2 Bulk
    Size: 64
    Interval: 1
  Endpoint: 1 Out
    Attrributes: 2 Bulk
    Size: 64
    Interval: 1
parseInterface nb:0
 bInterfaceClass = 2
 bInterfaceSubClass = 2
    Communications and CDC
 bProtocol = 1
useEndpoint(0, 3, 2)
parseInterface nb:1
 bInterfaceClass = 10
 bInterfaceSubClass = 0
    CDC-Data
 bProtocol = 0
parseInterface nb:2
 bInterfaceClass = 255
 bInterfaceSubClass = 106
 bProtocol = 255
New Debug device: VID:16c0 PID:0483 [dev: 0x2400c714 - intf: 0]
USB host device(16c0:483) connected
Manufacturer: Teensyduino
Product: USB Serial

Although I might want to add some additional information in the sketch for descriptor type 5...

AndrewCapon commented 9 months ago

Glad its working.

I have some multi device midi midi code running here for a few hours and everything still works, so it is looking good for me :)

I'll give that Object Dumper is look as well...

AndrewCapon commented 9 months ago

Hi @KurtE @mjs513

There was a howler due to transfer descriptors being reused in https://github.com/arduino-libraries/Arduino_USBHostMbed5/pull/42

I have added a new commit in to fix it, there is another commit that made its way into the PR as well, me and Git do not get on at all!