genodelabs / genode

Genode OS Framework
https://genode.org/
Other
1.06k stars 250 forks source link

rpi: usb_hid test fails #3247

Closed skalk closed 3 years ago

skalk commented 5 years ago

Since the introduction of the automated USB HID tests on a variety of hardware, this test fails on Rasperry Pi 1. In general it seems to detect the HID devices correctly, and also the disconnects, but no PRESS and RELEASE events are shown.

ssumpf commented 5 years ago

@skalk: From our past experiences with this controller I would estimate that update_jiffies causes problems again. update_jiffies is called for each IRQ and in turn calls curr_time on a timer connection. If this is too slow, the behaviour you describe does occur. This is why we used a hardware timer in the past on Rpi. The new timer framework implementation solved the issue, because it is able to interpolate time and does not rely on RPC/system calls for each call to curr_time. Maybe something has changed there on ARM. For a quick check 47dc70888795d8798aa7cdefe8f008897bf49d1b could be reverted.

cproc commented 5 years ago

I tried to git bisect this issue some weeks ago and could get back as far as ca. 17.02. Before that, the usb_hid test did not start at all on rpi because of other errors. With this old version, the test showed input events, but already had dropouts. The dropout rate appeared to increase gradually until ca. 17.11, when no more input events were shown.

ssumpf commented 5 years ago

@cproc: I found this 685f509a4335a63680cd1ce7d508900d0c95ac08 commit which states there is no local time interpolation on ARM any more and was replaced by a syscall on hw. My suspicion rises. If there is time next week, I can check this. @nfeske: Is this issue important right now?

nfeske commented 5 years ago

Regardless of the practical use of the rpi, I think this issue is important, because it is a regression that shouldn't have slipped through in the first place. Furthermore, the rpi backend of the USB stack will become important once @tomga's work on the rpi3 comes to that stage. I would be very grateful if you take a close look, @ssumpf.

ssumpf commented 5 years ago

@nfeske: Will do.

tomga commented 5 years ago

I'm already trying to make usb work but was stopped by issues @irgendwie reported in mailing list year ago in "usb_drv error -92 for RPi". In consequence I'm trying to port dwc2 driver from linux.

Nevertheless it will be much better to have a reference implementation working for comparison. Probably I'll try to run current version on rpi zero if I need to compare behavior differences.

ssumpf commented 5 years ago

Okay, I started some investigation: As @cproc stated it is not a single commit but a possible series of degenerating changes. Between 17.02 and 17.05. the driver starts losing keys. Between 17.05 and 17.08 this becomes so severe that there are no or just 1 of 6 key events. I was able to track this down to commit 47dc70888795d8798aa7cdefe8f008897bf49d1b and will have a look at the 17.02 range next. I do observe enough interrupts though.

ssumpf commented 5 years ago

83a88d46b7d97e86bb038c981bc9a48f2097edfe breaks hid on Rpi with 17.11 completely.

skalk commented 5 years ago

@ssumpf I expect this goes together whith what you mentioned before about using the timer service instead of a local clock? Because last commit you mentioned indeed will increase the burden of a mad-going timer-service (all signals are processed that are available). Otherwise, I do not see how that commit relates to usb_hid not working anymore.

ssumpf commented 5 years ago

@skalk: Me neither. The only thing it leads to is that if there is a scheduled first timeout that has not triggered, it gets postponed in case it is overwritten by a following call to _program_first_timer. It seems as if any changes to the timing behaviour can break the host controller driver and in turn the hid one on Rpi. I really would like to know why, it does not make much sense right now. Also, I cannot grasp why 83a88d4 fixed usb_net on arndale since it just postpones the first timeout in case it has not triggered, no losing. Are we fixing a symptom and not the cause here?

skalk commented 5 years ago

@ssumpf: To be honest I cannot re-assemble in my head why commit 83a88d4 did solve problems in hw/arndale. To my shame I did not explained the erroneous behaviour in detail in some issue, and now that information got lost (as long as @chelmuth cannot recapture it).

ssumpf commented 5 years ago

@skalk: Don't worry, https://github.com/genodelabs/genode/commit/83a88d46b7d97e86bb038c981bc9a48f2097edfe is not the root cause for the USB problems on Rpi, but a symptom that something goes wrong there. I hope I can come up with a good solution soon.

chelmuth commented 5 years ago

To my shame I did not explained the erroneous behaviour in detail in some issue, and now that information got lost (as long as @chelmuth cannot recapture it).

I can remember that we analyzed the code thoroughly and identified an execution where (because of the programmed flag) a timeout was lost because it was not executed and also not reprogrammed.

skalk commented 5 years ago

Ok, to sum it up. We had a performance optimization for lots of timeouts in dde_linux, which did not work correctly in all cases. But removal of the optimization finally killed the high-load, shaky usb-driver for rpi that already flood the cpu with many interrupts because of micro-frame scheduling in software?! Given that this analysis is correct, the fix would be to performance-analyze and optimize for this platform again, but this time sematically correct.

ssumpf commented 5 years ago

I finally was able to track down the correct interrupt handler and with that the code that is executed too late or sometimes just after device disconnects - meaning we receive IRQ XFER URB completions too late - which suggests something is starving within the USB driver on rpi. It does not seems to be the IRQ task. I will investigate further. Thanks to @cproc for the excellent Xinu link (https://embedded-xinu.readthedocs.io/en/latest/arm/rpi/Synopsys-USB-Controller.html) and their driver implementation. This makes things easier to understand and the quote from Kroah-Hartman tells its own story there.

skalk commented 5 years ago

Some insights regarding platform performance. We run with the same cpu speed like Linux (~220 BogoMIPS), but when comparing strictly byte-wise memcpy, Linux achieves nearly twice as much to copy in the same time like our kernel or Fiasco.OC:

linux hw foc
83 MB/s 49 MB/s 49 MB/s
ssumpf commented 5 years ago

I implemented execution_time on hw in order to use the top tool. On the average the USB driver consumes ~40% (35% ep and ~5% signal proxy) CPU time, the timer ~10%, and about 40% idle thread time. Even though this is a lot, it still seems to lead to a latency issue.

nfeske commented 5 years ago

@ssumpf The 40% for the driver looks quite high. If I remember correctly, when executing the driver w/o the IRQ filtering in the kernel, the driver took about half of the CPU time. The filtering mechanism lowered the load to around 20% or less.

ssumpf commented 5 years ago

@nfeske: That is what I thought too. I think I found a bottle neck: The driver calls Timer::Connection::trigger_once about a thousand times per second. This seems to be caused by a self reprogramming timer.

tomga commented 5 years ago

I've ported dwc2 driver and have a working usb_hid_raw test (with some problem with keyboard that stops responding after few minutes but disconnecting a cable and connecting make it work again - I didn't try to find out what is the problem). Currently I only work to "just make it work" on rpi3bplus without trying to integrate without breaking other configurations but maybe you'd want to take this driver instead of the old one that is currently used.

I don't know if it would help for rpi1 - maybe it works for me only because rpi3 is quicker but what I see is superior is that I didn't have to make any changes in the driver code itself - only lx_emul changes (probably also should be cleaned before integrating). There is no need for special repositories with driver code which are currently used (unless I'm mistaken and haven't completely stopped using it).

tomga commented 5 years ago

It seems that my attempt to use dwc2 driver was not a good route. I found information here that fact that dwc_otg driver is not in linux sources is not due to the fact that dwc2 replaced it. dwc_otg is a driver which is used in rpi specific kernel and due to its use of fiq (I think) is not going to be merged to official linux kernel and dwc2 is generally slower but can - surprisingly - work in otg mode in contrast to dwc_otg which supports only host mode.

With that knowledge I'll try to use existing current port for rpi2 and rpi3_32 too.

However I have a questions about patches from this issue - can/will they be merged to staging/master? Is the revert in 98cff22b702c54a549432a896ae6485c6d702f88 is a good final solution or should it be reimplemented somehow?

tomga commented 4 years ago

I'd like to share current state of my attemts to make usb support on rpi (and not only usb) and I'd like to ask for some help/guidance. Current code is in rpi_staging_20200321.

Generally, like I wrote previously I returned to attempts to use driver from rpi specific kernel along with Genode specific sof handling optimization. There are some statistics printed about sof interrupts handled and it seems that this part is ok.

I've made some fixes to Sp804 timer to match exactly externally measured time and I also added a bcm2835 based timer for comparison (as I read somewhere that Sp804 timer frequency can change if ARM freqency is changed). Those changes did not fix the issue.

To be able to use one usb driver on multiple rpi versions I introduced passing rpi version passing in configurations. Depending on rpi version different base addresses for dwc controller are selected. It's definitely something that should be changed after introducing platform driver but currently does what is needed.

Current results when running usb_hid_raw (with leds blinking frequency increased to be able to observe problem more quickly) are:

@skalk, @ssumpf if you have any ideas then I'd like to hear them.

I have problems with investigating this more because enabling logs in the driver changes the behavior. I'll probably ask some more general question about gathering logs without overhead in such situations on mailing list.

I know that rpi platforms are not your primary target but without your help I'll probably give up. I hoped that we can try to fix this issue during this year's hack'n'hike but given current situation I doubt we'll be able to meet in May. Without making this usb driver working all rpi platforms are not really usable at all but on the other hand if this issue is resolved it seems that many different scenarios will become possible.

skalk commented 4 years ago

@tomga I've spent considerable time in debugging why our workload on rpi 1 is performing so badly within the last week. Mostly, because I was acting like I was born yesterday when it came to micro-benchmarking ;-). Anyway, the first obvious and big improvement when comparing to Linux execution times was enabling of the branch predictor on the rpi 1 which was off (don't ask me why) , it seems to be off on this platform since it entered Genode). Another, important step to reduce interrupt latency is the introduction of cpu quota to be consumed by the usb driver. See commits 9695788 and 88a24a0. Furthermore, I've found that interrupt latency got worse when interrupt sharing in the platform driver for x86 got introduced for base-hw, because every interrupt acknowledgement needs to be acked at the interrupt source (platform driver in that case, core in case of ARM), which results in some overhead. I've build a work-in-progress solution which enhances the situation, but will immediately open up an issue and provide a sane solution. When putting together all pieces, my performance adaptions for base-hw on rpi1 together with the interim fixes of @ssumpf (regarding the timer in the dde_linux usb driver) I measure idle times of 60-70% when two USB HID devices (mouse and keyboard) are firing events concurrently. So there is no platform issue anymore, meaning there is enough CPU time left. Unfortunately, the usb driver still misses some of the keyboard or mouse events :-(.

I see three things that have to be done next:

skalk commented 4 years ago

@chelmuth @nfeske commits 9695788 and 88a24a0 are ready for staging if time and QA permits it ;-)

nfeske commented 4 years ago

Thanks @skalk for the notice. I merged both commits to staging.

tomga commented 4 years ago

@skalk it's great you found some important problems. I'll try to check results as soon as possible. Probably in July.

However I have some things I wan't to ask now.

Firstly my results on rpi1 were different - I did not receive any events from keyboard or mouse. Neither on original sources from master (current when I tested) nor with my changes. What is the exact revision of rpi 1 you have? I have Raspberry Pi Model B+ v1.2.

Secondly I'd like to what do you use to emulate keyboard and mouse. I assume you have something to generate key presses in an automated way.

skalk commented 4 years ago

Firstly my results on rpi1 were different - I did not receive any events from keyboard or mouse. Neither on original sources from master (current when I tested) nor with my changes. What is the exact revision of rpi 1 you have? I have Raspberry Pi Model B+ v1.2.

Before applying the changes I did not see any input event too. We use Raspberry Pi Model B 1.2 - so not the "plus" model.

Secondly I'd like to what do you use to emulate keyboard and mouse. I assume you have something to generate key presses in an automated way.

Indeed we use an automated way. It is some Arduino-based microcontroller board called "Leonardo" that is programmed to repeatedly showing up two input devices, which trigger a fixed event sequence, and after that disappear again.

tomga commented 4 years ago

Before applying the changes I did not see any input event too. We use Raspberry Pi Model B 1.2 - so not the "plus" model.

So it seems I misunderstood something. Probably it's better than I thought.

Indeed we use an automated way. It is some Arduino-based microcontroller board called "Leonardo" that is programmed to repeatedly showing up two input devices, which trigger a fixed event sequence, and after that disappear again.

Do you have code for it published somewhere? I probably have Leonardo and I could use it to fully automate testing at least during development.

cproc commented 4 years ago

Do you have code for it published somewhere? I probably have Leonardo and I could use it to fully automate testing at least during development.

There's some documentation about the setup in the usb_hid test run scripts: https://github.com/genodelabs/genode/blob/master/repos/dde_linux/run/usb_hid_raw.run

tomga commented 4 years ago

@cproc thanks. I haven't noticed those instructions.

tomga commented 4 years ago

@skalk I checked results of your patches on my branch and have the same behavior to this observed by you - properly enumerating devices and later missing some random events. Previously I tested with normal mouse and keyboard so observing missing events was not that easy - the main observation was that after some time keyboard or mouse stopped responding. Now I bought same micro controller you use and it I can reproduce exactly the same wrong results.

I checked on rpi1 and rpi3 in 32bit mode and events were randomly lost in both cases so taking into account that rpi3 is quicker I would suspect that the problem is not related to missing processing power.

Given the possible roadmap on this issue you described I think:

  • @ssumpf patches regarding the usage of the timeout framework / timer session in dde_linux have to be consolidated and go to staging

If I'm not mistaken I think you are referring to dde36cf as other changes introduced debug (top) and brought back usage of local timer. In my branch on which I tested I have those changes applied with some minor calculation fixes that I remember made working this timer exaclty like time on reference linux machine.

  • afterwards the timeout framework has to be checked again for correctness and neglectable overhead

It is probably important although probably it will not help with this issue as on my branch local hardware timer is used (actually I used two for comparison) and problem can be observed.

  • when events are still missing we have to dive deeply into the usb dwc_otg driver to find the reason for the missed events

If I had to guess I would say that problem is somewhere here. Maybe something during a kernel version upgrade? I did not dive into port of this driver yet (my previous attempts were with another driver) and probably I will not be able to do this in near future.

@skalk I know you're on vacations and have probably other more important things to do after return but in case you find something I'll gladly test it and in case of success work on integrating other changes I have that are blocked by this issue.

cproc commented 3 years ago

I created a new test program for the Leonardo microcontroller in commit [1] to make it easier to debug this problem. With this program, the microcontroller generates one initial 'no key pressed' report (done automatically by the LUFA library) and ~5 seconds later 1000 'key pressed' reports for keys 0..9 (looping, with implicit key release of the previously reported key), one report per millisecond. Finally, another 'no key pressed' report is generated automatically by the LUFA library. So, all in all, 1002 key reports are generated by the microcontroller, which should result in 2000 key events (#0..#1999) reported by the Genode input test program.

When I tried it out with the 'usb_hid_raw' test on the Raspberry Pi, less than 2000 key events were reported most of the time and basically the reason for that turned out to be that split interrupt transactions are very timing-sensitive, like if the 'complete-split' transaction is not started in the second microframe after 'start-split', the hub on the Raspberry Pi likely responds with NYET and a retry will be already too late and the response from the device will be already dropped by the hub.

Then I wondered how other operating systems handle these timing constraints on the Raspberry Pi and found out that FreeBSD and even Linux (in the FIQ handler code which is not executed on Genode) use a 'hack' [2][3] which declares the interrupt transfer as a control transfer to the host controller and thereby circumvents the strict timing constraints of split interrupt transactions. With this hack applied for Genode in commit [4] and an additional NYET error handling commit [5], I finally saw the 2000 expected key events in multiple test runs on Raspberry Pi 1B+ and Raspberry Pi 3B+ so far. My current test branch [6] is based on @tomga's 'rpi_staging_20200321' branch for the Raspberry Pi 3 and trace log support.

[1] https://github.com/cproc/lufa/commit/e0cf5f23e261d85e3ce4a75ac89ac60072112672 [2] https://github.com/freebsd/freebsd/commit/092c9453db9993bdfd7ef9d1bb4db74b01820d9c [3] https://github.com/raspberrypi/linux/commit/102c7fde61a329cd26fbb50c78be919c5cfc8727 [4] https://github.com/cproc/dwc_otg/commit/7206ec29c70f793eb73de4bd8e38324cf9239530 [5] https://github.com/cproc/dwc_otg/commit/13a1464ce636eba082ef89d43e539e7e39048493 [6] https://github.com/cproc/genode/tree/rpi_staging_20200321

tomga commented 3 years ago

@cproc your findings and fixes are so unexpected and so great that it is hard for me to proper words to express my feelings. Thank you for this work.

That issue blocked me for so long with my RPI Genode work and now again seem to be possible to make progress again. I'm really excited to get back to it. Hopefully this will end problems with usb support on RPI.

ssumpf commented 3 years ago

@cproc: Wow!

skalk commented 3 years ago

@cproc it's such a great analysis! Respect!!

tomga commented 3 years ago

@cproc, thank you again but ... I'd like to ask you for more :-)

First, you mentioned testing on Raspberry Pi 3B+ but you did not say if it was rpi3 or my rpi3_32. Which was it? I assume that it was the latter as I still get compiler error in 64 bit compilation.

I'd also like to ask you if you tested this version with your "standard" software on ProMicro for test usb_hid_raw. I tried it and I could observe that events are not lost (complete groups of six events are coming) but on Raspberry Pi 1 it works few (I observed maximum 13) times until its stops and on Raspberry Pi 3B it works only once (tried few times and it did not change). Relevant logs in case of rpi3 contain only:

[init -> trace_logger] dev_info: input: USB HID v1.11 Keyboard [HID 03eb:204d] on usb-usbbus-/input0
[init -> trace_logger] dev_info: input: USB HID v1.11 Mouse [HID 03eb:204d] on usb-usbbus-/input1
[init -> trace_logger] dev_info: DWC OTG Controller
[init -> trace_logger] dev_info: new USB bus registered, assigned bus number 1
[init -> trace_logger] dev_info: irq 17, io mem 0x00000000
[init -> trace_logger] dev_info: USB hub found
[init -> trace_logger] dev_info: 1 port detected
[init -> trace_logger] dev_info: new high-speed USB device number 2 using dwc_otg
[init -> trace_logger] dev_info: USB hub found
[init -> trace_logger] dev_info: 4 ports detected
[init -> trace_logger] dev_info: new high-speed USB device number 3 using dwc_otg
[init -> trace_logger] dev_info: USB hub found
[init -> trace_logger] dev_info: 3 ports detected
[init -> trace_logger] dev_info: new high-speed USB device number 4 using dwc_otg
[init -> trace_logger] dev_info: new full-speed USB device number 5 using dwc_otg
[init -> test-input] Input event #0 PRESS KEY_X 65534   key count: 1
[init -> test-input] Input event #1 RELEASE KEY_X   key count: 0
[init -> test-input] Input event #2 PRESS BTN_LEFT 65534    key count: 1
[init -> test-input] Input event #3 REL_MOTION -1+0 key count: 1
[init -> test-input] Input event #4 REL_MOTION +0+1 key count: 1
[init -> test-input] Input event #5 RELEASE BTN_LEFT    key count: 0
[init -> trace_logger] dev_info: USB disconnect, device number 5
[init -> trace_logger] dev_info: new full-speed USB device number 6 using dwc_otg

And final request if you'd be willing to look at it once again? If it is too much I'll understand and will attempt to find a solution myself later. Especially after receiving help from you and rest of the team, I wouldn't like it to be lost, but given my minimal knowledge about USB I'm definitely on worse starting position.

cproc commented 3 years ago

@tomga: I tried the original test case again and can confirm that there are still problems after device reconnection. I'll have a look at it.

The reason for the 64-bit compile error is a mismatching include path in target.mk:

INC_DIR += $(REP_DIR)/src/include/spec/arm

This needs to be 'arm_64' for 64-bit compilation. When I changed it temporarily, the driver got built, but later an error message about missing platform driver occurred and I stayed with rpi3_32 so far.

cproc commented 3 years ago

@tomga: I created a new branch [1] with the latest fixes related to this problem. Feel free to give it a try.

[1] https://github.com/cproc/genode/tree/rpi_staging_20200321-2

tomga commented 3 years ago

@cproc I'm running now in parallel this usb_hid_raw test on rpi1 and rpi3_32 and comparing output from them regarding logs with input events. After more than 300 iterations of device reconnects output is exactly the same on both so it seems that after your changes this code works reliably and you fixed all important issues with it.

I briefly looked over the changes you made to make it work and I have to admit that I wouldn't be able to dive into this problem so deeply to solve this issue. So again, big thanks for you (and others that made earlier fixes) for this work.

I'll work on moving those changes to current staging, probably after 21.02 release. I'll inform about results here.

nfeske commented 3 years ago

The usb_hid_raw test succeeds. The old usb_hid test is scheduled for removal.

cproc commented 3 years ago

There are still some commits referenced in this issue which are not on staging yet, but might fix issues or improve performance:

@ssumpf: do you consider these commits ready for staging?

ssumpf commented 3 years ago

@cproc: I would not revive the hardware timer (a16331c, b14a60e). dde36cf should bring something, because we program less timeouts.