Open mweal-ed opened 10 years ago
Well first off, your contributions are most welcome! I haven't played with this in the last few months and there are a lot of issues I've left unresolved. However, you're delving into one of the most important and also tricky areas.
So I'm first presuming that the performance stats you are talking about are with debugging turned off in the config (i.e., the out-of-tree-autoconf.h file). If not, just comment out CONFIG_MCP2210_DEBUG and CONFIG_MCP2210_DEBUG_VERBOSE and we're done here. Either way, I know for certain that performance is not yet optimal, although the framework is in place and being used (tuning of this would be in mcp2210-spi.c, in spi_complete_urb, where we set the variable expected_time_usec). Also of note is the module parameter pending_bytes_wait_threshold, which appears to be insufficiently documented. When the mcp2210's buffer has at least this many bytes in it, the driver will delay before communicating again -- the default value is 45. Also, this value is ignored unless the rather stupidly named config option CONFIG_MCP2210_DONT_SUCK_THE_CPU_DRY is enabled. However, it turns out that disabling that option may also slow transfers due to taking too much time from the mcp2210 itself.
But moving on to the chip, it turns out that the mcp2210 is just a pre-programmed PIC of some sort (somebody figured out exactly which one, I forget) in a different package. It has an internal 64 byte buffer, but can only rx/tx 60 bytes of data at a time. I forget off-hand if this is a limitation of HID interrupt URBs or just the interface (I'm thinking that it's because it's designed to be HID compatible). So at the USB level, each message is 64 bytes, but that includes a 4 byte header.
The tricky part is figuring out the optimal delay between sending it more data, because each time we communicate with the mcp2210 via USB, it cannot be talking to the SPI device. So what you're seeing is that we send 60 bytes (42 3c 00 00), the maximum that we can and then it tells us that the transfer was started (42 00 00 20) and that no data has been transmitted yet (of course).
So timing is pretty skewed here because outputting debug info greatly impacts throughput. However, at 11265.704937 you see "usb 2-1.3: timer_callback:" -- if this weren't being debugged, there would have been an appreciable delay between this line and the one right before it. Instead, it pretty much fired as soon as we re-enabled interrupts. So for struct mcp2210_cmd_type_spi, pending_unacked is the number of bytes we have sent to the chip and not received an ACK for (the 42 xx xx xx that doesn't indicate an error) and pending_bytes is the number of bytes that we have sent and had ACKed, but that the chip has not yet told us it sent and given us the peripheral's response. Why exactly we only send 4 bytes on the next communication I can't precisely recall: either the chip will NAK them (due to insufficient buffer space) or I have made an error. :) So maybe that is one that you can experiment with. In fact, I'm starting to think that it's the former, because I believe that it does not have separate read & write buffers, so if we have sent 60 bytes and already know that it has 60 bytes to process, then we know that it can only take 4 more bytes -- definitely something to confirm and document (probably in the README.md).
Let me know what you find out please!
Thanks for the feedback. I'm off in the right direction now. One thing I have noticed is the performance is minimally impacted by the debugging.
I will let you know how things go.
jiffies = 4ms.....argh
The time value of jiffies depends upon CONFIG_HZ (Processor type and features -> Timer frequency) and you probably have yours set to 250 Hz, which gives 1000 / 250 = 4 ms per jiffy. The use of jiffies is far more efficient (in clock cycles) than other mechanisms, although you get a lower resolution.
I was at first surprised that you saw little difference in performance without debugging, but then I remembered that I did most of my development & testing on a Raspberry Pi, which is quite a bit slower than the typical x86_64 machine!
Along the lines of clock cycles consumed by the driver is the use of packed structs -- this actually bloats the code quite a bit and is on my todo list, but I doubt it's an issue here. My worry is that you may not be able to achieve any decent throughput with the driver in its current form. This is because we set timers and ask to be awakened at a specified time, and the contract here is that the timer will trigger no sooner than the specified time, but may occur later. Also, you should verify that the time that we expect to wake up is sane and not screwed up. Unfortunately, I've never done high speed tests on this because my application only needs small data transfers.
How is it coming along? I'm thinking that we need some profiling debug messages, so we can turn off verbose messages, but emit a few messages that just show wake up, take action and go back to sleep stuff to make sure that the actual work the driver is doing isn't taking too long for some reason (which I suspect isn't the case).
Of course, only having a 64 byte buffer makes it hard to get anything close to full speed out of it.
EDIT: For clarification, we don't actually ever explicitly "sleep", we just set either a timer (for something that can be done in atomic mode) or a delayed_work (for something that may need to sleep).
AHH!! It looks like we shouldn't be worried about the pending_bytes at all! So basically, it has two separate 64 byte buffers, one for rx and one for tx. I just ran a test after making this change:
diff --git a/mcp2210-spi.c b/mcp2210-spi.c
index 1acacbf..47deb57 100644
--- a/mcp2210-spi.c
+++ b/mcp2210-spi.c
@@ -611,12 +611,14 @@ static int spi_submit_prepare(struct mcp2210_cmd *cmd_head)
 	if (cmd->pos + cmd->pending_bytes < cmd->xfer->len) {
 		len = cmd->xfer->len - cmd->pos - cmd->pending_bytes;
+#if 0
 		/* don't try to send more than the buffer will hold */
 		if (len > MCP2210_BUFFER_SIZE - cmd->pending_bytes) {
 			len = MCP2210_BUFFER_SIZE - cmd->pending_bytes;
 			if (!len)
 				goto buffer_full;
 		}
+#endif
 		if (len > MCP2210_BUFFER_SIZE - 4)
 			len = MCP2210_BUFFER_SIZE - 4;
@@ -631,7 +633,7 @@ static int spi_submit_prepare(struct mcp2210_cmd *cmd_head)
 		cmd->pending_unacked = len;
 		cmd->pending_bytes += len;
 	} else {
-buffer_full:
+//buffer_full:
 		len = 0;
 		start = NULL;
 	}
And I did not get a NAK (usually status 0xf8) from the chip, so we're doing it wrong. :( Of course, that alone can only give an increase of roughly 87% and we need more than that. So we can basically send 60 bytes each time we talk to it (I suppose unless we talk to it too soon).
Sorry for the late reply, I've been away for a few days.
1) I came to the same conclusion/changes about pending_bytes, but it only gave me about a 50% improvement in performance instead of 87%... not sure why.. could be the resolution of the performance data (from 4 to 6 kB/s). Not really important at this point.
2) Connected a Beagle USB bus analyzer. Could see I was getting a NAK (No interrupt pending) on the PCI bus for each In message and an associated delay until it was tried again. Thought this might be due to the EP_IN being submitted before the EP_OUT, so tried switching the order. No improvement.
3) I was finding various delays due to delayed processing. Disabled CONFIG_MCP2210_DONT_SUCK_THE_CPU_DRY. Did not help the performance.
4) All rescheduling is using jiffies. The xxx_to_jiffies functions all round up, so we always had a 4 ms delay when we rescheduled. This is a long time for these kinds of transfers. Forced all delays < 1 second to 0. Still no improvement in performance.
5) With all these changes, looked at the bus analyzer again. We are getting 3 responses with a 0xF8 (SPI Data Not Accepted) status, taking about 7-8 ms.
6) Checked with a scope. Was getting 100us between byte transfers. Realized I had set my_power_up_spi_settings.delay_between_bytes to 0, but not my_board_config.spi.delay_between_bytes. Fixed and re-tested. Performance is now 13 kB/s. Still seeing 40-50us between byte transfers. With a data transfer of 50us per byte we could achieve a maximum of 20 kB/s, or 160 kb/s. This is still a lot less than I need.
In short, I think that although we could make some improvements to the driver performance, the overall performance is limited by the mcp2210.
Sorry for my late response as well!
Thanks for your help on this. I'm going to update the documentation to reflect these hardware limitations (I suspected that we wouldn't get anything close to their "advertised" data rates). Also, will be tweaking out this driver too.
Hi,
Did you manage to measure the maximum SPI output speed using your driver? At the moment I'm using emulated SPI over GPIO on an OpenWrt router and I can't get above 1 MHz. Can the mcp2210 chip and your driver provide better speed?
Thank you
Been a few months since I looked at this, but I believe I figured the maximum possible speed to be 160KHz and that would need a lot of optimization. The 160KHz is a limitation of the hardware.
Do you know if there is a way to increase the speed of an emulated SPI over GPIO?
It might be possible, but it would mean writing your own kernel driver dedicated to your hardware and environment and heavily optimized. This would also bog down Linux. You could look at some of the other USB-SPI bridges. I know of the SILABS CP2130, which can do the higher data rates, but I could never get it to work right in my application. I believe FTDI offers one, but I do not know what data rates it could handle. You could also program your own USB to SPI bridge on a dedicated MCU. I believe you would have to write your own kernel driver for any of these.
Our device Nusbio (http://www.nusbio.net) can do between 10 and 20 Kb/s in SPI using any .NET language. And the FTDI FT232H can do from 1 to 3 Mbyte/s; it is a USB 2.0 High-Speed device.
See my blog post http://madeintheusb.blogspot.com/2016/02/usb-to-spi-for-net.html
Now that I've finished the base features that I originally intended for this driver (I finally got interrupts working), I'm considering abstracting it into an mcp2210-specific portion and a generic USB-to-x library and implementing the same functionality for the FT2xxx series via MPSSE. So I've started the process of trying to figure this out -- plus we could get all of the other protocols as well.
If any of you guys are aware of any other work on this I would appreciate a link. :)
EDIT: I would actually be interested in hardware suggestions. Among the qualities I'm looking for are:
I am trying to speed up the transfer speeds as they seem to be a lot slower than I expected (4096 bytes in .77 seconds or 43Kbps vs 4-6Mbps). I would expect it to be a couple of orders of magnitude faster than this. I know some of the problem is due to the overhead on the USB bus. I was wondering if you could point me in the right direction to fix this.
The first issue I have found is that the driver looks like it is breaking up the transfer rather oddly. It is breaking up large transfers into 60 byte and 4 byte transfers alternately rather than all 60 byte transfers. I can see where spi_submit_prepare restricts it to 60 byte transfers but have not found where the messages are restricted to 4 byte transfers.
I am developing on Ubuntu 14.04 x86_64; the following is a snippet from the syslog.