Aharoni-Lab / Miniscope-DAQ-Cypress-firmware

DAQ firmware for V3 and V4 Miniscope platforms

Ringbuffer refactoring, Linux support & quirk fixes #2

Closed · ximion closed this 4 years ago

ximion commented 4 years ago

Hi! This PR is an upgraded version of PR https://github.com/Aharoni-Lab/Miniscope-DAQ-Cypress-firmware/pull/1, which was originally created due to discoveries made in https://github.com/Aharoni-Lab/Miniscope-DAQ-QT-Software/pull/4 and by @phildong while porting the Miniscope software to Linux.

Two of the commits in this PR, "Advertise only actually supported controls as supported" and "Return signed min/max values when UVC spec requires them", fix issues that prevented a Linux system from sending proper values through at all.

The other part of this PR has a more complicated origin. With the above patches, I still couldn't get the Miniscope to work properly, so I debugged the device a bit more. The Linux system was sending the right values through (I verified that by intercepting the USB packets), but the hardware showed no reaction or completely random behavior. Ultimately, I discovered that the root cause for this was, surprisingly, a timing issue. Apparently Linux is a bit quicker in sending requests to the device than Windows is: I measured only ~100µs between two USB requests (and that includes indirections like the OpenCV capture wrapper and some debug prints in the controlling Qt application). The device appears to be unable to handle that. This means that just by inserting a usleep(100) after a value was sent, you can get the Miniscope DAQ software to work on Linux (depending on how the discussion for this PR goes, I may file another PR with that change to the Qt client code). Of course, adding random delays is a recipe for disaster and pretty silly, so I set out to fix this properly.
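On the client side, the workaround amounts to no more than this (a minimal sketch; `uvc_send_value()` is a hypothetical stand-in for whatever the Qt client uses to write a UVC property through the OpenCV capture wrapper):

```c
#include <unistd.h> /* usleep() */

/* Hypothetical stand-in for the Qt client's UVC property write, which in
 * practice goes through the OpenCV capture wrapper. */
extern void uvc_send_value(int property, int value);

static void send_value_with_delay(int property, int value)
{
    uvc_send_value(property, value);
    /* Give the firmware time to consume the request before the next one
     * arrives; without this, fast senders (like Linux) see commands
     * silently dropped. */
    usleep(100);
}
```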

I originally suspected some threading issue, as there were a few unprotected data structures in the I2C ringbuffer that could, under the right conditions, result in the buffer getting its reading index offset and subsequently only reading garbage from the buffer. For that I added explicit locking (see the sketch below), but sadly this didn't fix the problem. So I investigated further: while the data was sent in the right sequence, I wasn't sure whether it would actually arrive at the board in the right order, or whether the Cypress APIs would hand it to us in the right order. This code was originally made for controlling device properties after all, where the ordering of commands doesn't truly matter, not for arbitrary byte transfers where it actually does. So I augmented the code to be resilient against commands arriving in random order and permitted completing the current packet in random order (if you send e.g. CONTRAST multiple times, you just update the current value, but if it was the last value missing to complete a packet, the current buffer will be sent via I2C immediately). That change improved the error condition of the non-working case (it worked occasionally, which is as good as not working at all), but unfortunately still didn't really fix it.
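The locking change looks roughly like this; a sketch assuming the FX3 SDK's CyU3PMutex primitives, with illustrative names rather than the exact code from this PR:

```c
#include "cyu3types.h" /* uint8_t, uint16_t */
#include "cyu3os.h"    /* CyU3PMutex, CYU3P_WAIT_FOREVER */

#define RB_CAPACITY 64 /* maximum element count of the ringbuffer */

typedef struct {
    uint8_t    data[RB_CAPACITY];
    uint16_t   readIdx;
    uint16_t   writeIdx;
    uint16_t   pending; /* number of unread elements */
    CyU3PMutex lock;    /* protects all of the above; create with CyU3PMutexCreate() */
} I2CRingBuffer;

static void rb_push(I2CRingBuffer *rb, uint8_t value)
{
    CyU3PMutexGet(&rb->lock, CYU3P_WAIT_FOREVER);
    rb->data[rb->writeIdx] = value;
    rb->writeIdx = (rb->writeIdx + 1) % RB_CAPACITY;
    rb->pending++;
    CyU3PMutexPut(&rb->lock);
}
```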

These changes did reveal the root cause of what is going on when Linux sends USB commands fast: some commands just appear to go missing, displaced by the next event if it arrives too quickly. I had assumed that the Cypress code would do some kind of queuing of USB signals, but in fact, from looking over the documentation that I have here, it never makes that claim. CyU3PEventGet also appears to suggest that it will only retrieve the "current" event. If that is truly the case, it explains this issue perfectly, and a proper fix would be to do some queuing of USB commands or alter the byte transmission API completely. @daharoni do you have any opinions or insights on that? I'll also look deeper into the documentation of the Cypress SDK to figure out how exactly the USB endpoints for their devices function; if there truly is no queuing, then we have our bug's root cause.
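To illustrate why commands could be lost if there is no queuing: the FX3 event API works on a flag bitmask, so (assuming the usual CYU3P_EVENT_OR semantics described in the SDK docs) setting the same flag twice before the handler thread has consumed it collapses both notifications into one:

```c
#include "cyu3types.h"
#include "cyu3os.h" /* CyU3PEvent, CyU3PEventSet, CyU3PEventGet */

#define CMD_PENDING_EVT (1 << 0)

static CyU3PEvent glCmdEvent; /* create with CyU3PEventCreate() at startup */

/* Called from the USB request callback, once per incoming control request. */
static void on_usb_command(void)
{
    /* ORs the flag into the event group's bitmask. If the flag is still
     * set because the previous command has not been consumed yet, nothing
     * changes: the second notification is effectively lost. */
    CyU3PEventSet(&glCmdEvent, CMD_PENDING_EVT, CYU3P_EVENT_OR);
}

/* Worker thread: waits for the flag, then clears it. There is no counter,
 * so two back-to-back commands are indistinguishable from a single one. */
static void command_thread(void)
{
    uint32_t flags;
    for (;;) {
        CyU3PEventGet(&glCmdEvent, CMD_PENDING_EVT, CYU3P_EVENT_OR_CLEAR,
                      &flags, CYU3P_WAIT_FOREVER);
        /* ... process what looks like exactly one pending command ... */
    }
}
```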

In any case, even though the queue refactoring doesn't actually fix the issue I had, the code now exists and I think it's worth having.

You can find a firmware BLOB that was built from the code in this PR here: cyfx_uvc_lintest_ghpr2.zip

Thanks for considering this PR! And sorry for its volume (it made sense to me to bundle these changes into one PR, though).

Btw: there is still a theoretical bug in the ringbuffer implementation: if more than 64 elements (its maximum element count) are written and the pending element count subsequently goes above that value, the reading index will be offset permanently and the buffer will read garbage and incomplete packets once it gets to reading data again. I don't think this is a likely scenario, though (possibly something to fix anyway in a later PR as a general code improvement :-) ). A sketch of the missing guard follows below.
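Sketched out (reusing the ringbuffer type from the locking sketch above), the missing overflow guard would be something like this hypothetical fix, which is not part of this PR:

```c
/* Refuse to write instead of silently wrapping when the buffer is full;
 * without this check, pushing a 65th pending element advances writeIdx
 * past readIdx and permanently offsets every subsequent read. */
static CyBool_t rb_push_checked(I2CRingBuffer *rb, uint8_t value)
{
    CyBool_t ok = CyFalse;

    CyU3PMutexGet(&rb->lock, CYU3P_WAIT_FOREVER);
    if (rb->pending < RB_CAPACITY) {
        rb->data[rb->writeIdx] = value;
        rb->writeIdx = (rb->writeIdx + 1) % RB_CAPACITY;
        rb->pending++;
        ok = CyTrue;
    }
    CyU3PMutexPut(&rb->lock);

    return ok;
}
```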

ximion commented 4 years ago

@daharoni Just out of interest: would changing the hardware communication interface of the DAQ firmware (and subsequently the Qt client software) still be an option for you? One could for example introduce a "ready" value that the software can read to wait until a value it sent was actually recognized. That would fix the timing issue. Also, was there a reason why you went with three 16-bit UVC properties to transmit the data, instead of doing this with one 16-bit property and multiple writes? What I think would work is the following scheme:

- pData: a 16-bit property that payload bytes are written to, two bytes per write
- pReady: a property the software reads back to wait until the device has actually picked up the last write
- pSubmit: a property that finalizes the current packet and triggers its transmission via I2C

With that scheme, one could potentially skip sending a data length, as that would automatically be determined by how many writes to pData happen until the packet is finalized (unless values smaller than 2 bytes are needed). Technically one could also fold pReady and pSubmit into one property that is readable and writable and has a defined protocol. By using something like this, we could get rid of the timing issue, and also be less dependent on the quirks of the individual camera properties. One drawback is that the API is stateful (but so is the previous one; I don't think there's a way around that) and that waiting for an ACK reply and continuously requesting values from the device will definitely slow things down (not sure by how much, though). A host-side sketch of the idea follows below.
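As a rough host-side sketch of that hypothetical pData/pReady/pSubmit scheme (uvc_set()/uvc_get() stand in for whatever UVC property accessors the client software uses; none of this exists in the current firmware):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical UVC property accessors provided by the client software. */
extern void     uvc_set(int prop, uint16_t value);
extern uint16_t uvc_get(int prop);

enum { P_DATA, P_READY, P_SUBMIT }; /* the three proposed properties */

/* Poll until the device acknowledges the last write; a real client
 * would sleep between polls and time out eventually. */
static void wait_ready(void)
{
    while (uvc_get(P_READY) == 0)
        ;
}

/* Send an arbitrary byte packet, two payload bytes per pData write.
 * The packet length is implied by the number of writes before pSubmit. */
static void send_packet(const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i += 2) {
        uint16_t word = buf[i] | (uint16_t)((i + 1 < len ? buf[i + 1] : 0) << 8);
        uvc_set(P_DATA, word);
        wait_ready(); /* the ACK read-back that avoids the timing issue */
    }
    uvc_set(P_SUBMIT, 1); /* finalize: device flushes the packet to I2C */
    wait_ready();
}
```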

In any case, please don't get me wrong, I am not requesting the API to be changed, this is just an idea, and there might be valid reasons why the three-values-autosubmit scheme was preferred over the datachannel-ready version. What I definitely would like to consider is having a possibility to read back an ACK value to check whether a device property was actually received, in the event that a new UVC event indeed displaces the previous one. If you don't like any of that, I could also submit my "slow down change submission" patch to the Qt client app as a workaround for Linux users.

daharoni commented 4 years ago

Hey @ximion, thanks for the PR and all the investigating you are doing. I will try to test this out with the Windows DAQ software soon.

I think you are right on the money for pretty much everything you are saying. I initially used 3 UVC properties because I had implemented something very similar to what you are describing (basically having some level of flow control and validation that data was received correctly). But from my experience, it looks like the handling of UVC property changes and I2C in the DAQ is by far the largest source of crashes in the Miniscope system. This seems to come from some of these being blocking operations, which can cause the DAQ to miss a Line Valid or Frame Valid signal from the Miniscope. Minimizing the number of UVC property reads/writes and clumping all I2C communication into a small window in between frames has greatly increased the stability of the system.

That being said, I am all for a more robust implementation of all of this. Generally I do things until they are "good enough", and this slowly gets a bit better and better as we make new versions of the Miniscope platform. I also think I am totally fine with the "slow down" option. It makes sense that the FX3 Cypress host controller cannot keep up with fast incoming UVC commands over its endpoint (it really is just a medium-speed ARM MCU at its core and is already being asked to do a lot). Also, switching to just 1 UVC property for transmitting data from software to firmware (and vice versa) is likely fine as well.

As for the possible issue with the ringbuffer, a solution to this bug was also initially written into the firmware, but I removed it at some point (not sure why; it likely just got taken out when I was changing things around and never made it back). Theoretically, the ringbuffer should pretty much get cleared out at the end of each frame, since it likely won't hold more than 1 or 2 parameter changes and has a few milliseconds between frames to send those over I2C. With imaging usually running above 10 FPS, it is unlikely a user can fill up this ringbuffer quickly enough to trigger this bug. That being said, it definitely can (and should?) be fixed.

ximion commented 4 years ago

> But from my experience, it looks like the handling of UVC property changes and I2C in the DAQ is by far the largest source of crashes in the Miniscope system. This seems to come from some of these being blocking operations, which can cause the DAQ to miss a Line Valid or Frame Valid signal from the Miniscope. Minimizing the number of UVC property reads/writes and clumping all I2C communication into a small window in between frames has greatly increased the stability of the system.

Surprising, given that the UVC property changes end up on a control-type endpoint and not an interrupt-type endpoint, and they are even fetched in a separate thread... But given the very constrained CPU, maybe even the slowdown from that is enough to cause problems.

> Minimizing the number of UVC property reads/writes and clumping all I2C communication into a small window in between frames has greatly increased the stability of the system.

I think moving the I2C communication there was really helpful - I would bet on that being the main reason why the new firmware runs so much better.

> That being said, I am all for a more robust implementation of all of this. Generally I do things until they are "good enough", and this slowly gets a bit better and better as we make new versions of the Miniscope platform.

That's what open-source software is for :-)

I looked into this issue more now, and I think that the way things are now is the best we can do at the moment without a hardware redesign or some very major software changes that may not be worth the effort. The primary reason is that my ideas for communicating with the chip would require reading some property back to ensure data has really arrived. In order to do that, however, we would need to write a request to the USB control endpoint, and that request could override the preexisting control-change request that was there before, which kind of defeats the point of safely sending data in the first place. Furthermore, making these changes would require quite a bit more communication, and that, as you wrote, is quite brittle, so I wouldn't risk making the system less stable again by sending more.

The actual proper fix that should be done here is using UVC extension units. I recently came in contact with those via cameras from The Imaging Source: https://www.theimagingsource.com/documentation/tiscamera/uvc.html (they even use JSON to define their extensions, similar to what the new Miniscope software does). With that, you can define a large enough integer control and send the data through it.

Microsoft also has docs for writing extensions for their driver: https://docs.microsoft.com/de-ch/windows-hardware/drivers/stream/uvc-extension-unit-code-samples

And apparently the board also supports this: https://community.cypress.com/thread/30696

The problematic thing is that implementing UVC extensions is a lot more complicated than what is done now, and we would also need specific implementations for Linux and Windows (even though there's code that can be reused for both OSes, that's a lot more work). A rough sketch of the Linux side is below.
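For reference, on the Linux side, talking to an extension unit boils down to the uvcvideo XU query ioctl. A minimal sketch, where the unit and selector IDs are made up (the real ones would come from the firmware's extension unit descriptor):

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/usb/video.h> /* UVC_SET_CUR */
#include <linux/uvcvideo.h>  /* struct uvc_xu_control_query, UVCIOC_CTRL_QUERY */

/* Hypothetical IDs; the real values would come from the firmware's
 * extension unit descriptor. */
#define MINISCOPE_XU_UNIT     4
#define MINISCOPE_XU_SELECTOR 1

/* Write an arbitrarily sized control in one go, instead of splitting the
 * data over three 16-bit camera properties. */
static int xu_send(int fd, uint8_t *data, uint16_t size)
{
    struct uvc_xu_control_query q;

    memset(&q, 0, sizeof(q));
    q.unit     = MINISCOPE_XU_UNIT;
    q.selector = MINISCOPE_XU_SELECTOR;
    q.query    = UVC_SET_CUR;
    q.size     = size;
    q.data     = data;

    return ioctl(fd, UVCIOC_CTRL_QUERY, &q);
}
```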

So, tl;dr: I think the path of least resistance, and also the best way for now, is to just add that ugly delay. I've opened a new PR for that for the DAQ software: https://github.com/Aharoni-Lab/Miniscope-DAQ-QT-Software/pull/5 With that PR merged and this firmware patch applied, people can use the Miniscope on Linux (only v3 was tested by me, but I assume v4 will also work).

For the ringbuffer thing, I don't think anybody would ever hit this when using the scope normally. The reason to fix it would be to protect people who copy the code for other projects from running into it, and to "make the code better", not really to fix an actually occurring issue ;-)

Thanks for looking into this PR and for testing the change on Windows!

931436690 commented 4 years ago

Hi! @daharoni Thank you for sharing the software and hardware, it's very helpful. Now I have one question: can the I2C of the DAQ firmware use DMA mode to send commands or receive bytes? As you know, DMA mode has very little effect on the thread, while sending commands over I2C in non-DMA mode always takes some time. Using DMA mode for the I2C in the DAQ may be an option. Thank you again for your work on the system!

ximion commented 4 years ago

@931436690 Currently, control data is transferred in register mode, and all frame data is transferred via DMA. Changing both to DMA would definitely be an interesting experiment, but depending on how DMA is actually implemented, the result may be even worse (as we may interfere with the image buffers).
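For context, to the best of my knowledge of the FX3 SDK (the API docs should be checked before relying on this): register mode uses the blocking byte-transfer calls, while DMA mode would go through CyU3PI2cSendCommand() plus a DMA channel:

```c
#include "cyu3types.h"
#include "cyu3i2c.h" /* CyU3PI2cPreamble_t, CyU3PI2cTransmitBytes, ... */

/* Register (non-DMA) mode: blocks the calling thread until the transfer
 * completes; this is how control data is sent today. */
static CyU3PReturnStatus_t i2c_write_reg_mode(uint8_t devAddr,
                                              uint8_t *data, uint32_t count)
{
    CyU3PI2cPreamble_t preamble;

    preamble.length    = 1;
    preamble.buffer[0] = devAddr; /* slave address, write bit clear */
    preamble.ctrlMask  = 0x0000;  /* no start/stop bits within the preamble */

    return CyU3PI2cTransmitBytes(&preamble, data, count, 0 /* retries */);
}

/* In DMA mode (isDma = CyTrue in the CyU3PI2cConfig_t passed to
 * CyU3PI2cSetConfig()), one would instead issue CyU3PI2cSendCommand()
 * and let a DMA channel move the bytes, so the CPU is not stalled; the
 * cost is per-transfer DMA setup overhead and possible contention with
 * the image-data DMA channels. */
```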

At the moment, with slower emission of commands from the computer, stability is actually great for me and my data acquisition is very reliable. The only annoying thing is that as soon as you close the camera and reopen it (sending all the initialization data again), it will lock up. Power cycling the board fixes this, though (I haven't had time to look into this issue yet, as it's a lesser inconvenience; maybe when this PR is merged, that issue will be next).

daharoni commented 4 years ago

Hi @ximion, thanks for the PR and all the info. I am a bit backlogged on things, so I haven't gotten around to testing your PR. I think I will merge it into a dev branch for now.

@931436690, thanks for the comment. To build off of what @ximion said, DMA isn't currently used for the DAQ's I2C communication, but implementing it is possibly an interesting idea. We would need to look at how many clock cycles are used in setting up the DMA transfer each time versus how many clock cycles are used in non-DMA I2C operation.

We are currently working on a much more flexible DAQ design, so we are somewhat limiting the time we spend on this DAQ. We definitely will still work on major (and some minor) issues but will likely leave things that seem "good enough" alone for the time being. That being said, we would love to incorporate contributions from others when possible.

Thanks!

ximion commented 4 years ago

Sorry for the late reply!

> We are currently working on a much more flexible DAQ design, so we are somewhat limiting the time we spend on this DAQ.

Would be interesting to hear about that, so we don't dump huge amounts of time into an old design :-)

After ages, GitHub now finally allows changing the branch target, so that's easy to do. It would be nice to have these changes in a built release, so people don't run into hard-to-debug problems.

daharoni commented 4 years ago

Thanks @ximion! I have been pretty swamped with work (and life in general), but the moment I have some time to test these changes out, I will move them over to the master branch and create a new release! Thanks so much.