Closed jcw closed 6 years ago
Are you just polling in the main loop? If you poll from the USB interrupt handler you should see reduced CPU usage.
@kisielk - The load is on the host side. Yes, I'm polling on the device side. Why/how are those related?
Ok, I've just tried the interrupt demo build - same effect, Mac's "kernel task" generates high CPU load.
PS. The high load is only present when a serial connection is actually open.
I noticed a high load on the MCU. Because the host libusb_stm32 polls the device continuously, and according to the standard it should be 1000 times per second. Maybe the MCU does not report the end of the transfer?
Aha, that could explain it. Maybe libusb_stm32 needs to end with a 0-byte transfer back to the host? Just guessing - I'm not sufficiently familiar with the USB wire protocol to really give an answer. But yes, there will be a poll from the host every 1 ms. That in itself should be a trivial load, on both sides.
Another observation: the load on the host goes up as soon as the serial connection to the Blue Pill has been established, no serial data exchanged yet.
I don't understand how this code works - it always checks against CDC_EP0_SIZE
(i.e. 8), whereas data packets are indeed properly set to 64 bytes (my mistake, I had overlooked CDC_DATA_SZ
).
AFAIK, you can send back any number of bytes in 64-byte chunks, but if the total size is not a multiple of 64, then you must send a final empty packet, else the host will endlessly keep on polling for more.
Isn't the above code in conflict with that rule?
According to USB specification V2.0 chapter 5.5.2 "The Data stage of a control transfer from an endpoint to the host is complete when the endpoint does one of the following: • Has transferred exactly the amount of data specified during the Setup stage • Transfers a packet with a payload size less than wMaxPacketSize or transfers a zero-length packet" About the code. It must be contain an artifacts from older state machine. I'll check it ASAP.
Looks it should work as expected
After request processing the internal state machine sets one of the following states usbd_ctl_txdata
in case of data size is equal to req->wLength (if data size is bigger it will be truncated) and usbd_ctl_ztxdata
otherwise.
The code to which you pointed checks the state after passing all data chunks to TX buffer. It sets the next state to usbd_ctl_lastdata
if no ZLP required or leave it unchanged otherwise to issue ZLP after next "TX completed" event (dev->status.data_count = 0)
About high cpu load. I'll check it on MAC as soon as i found a MAC. Is there any USB sniffers for MAC like wireshark or usblyzer ?
You can use Wireshark on macOS: https://aud-ios.com/2017/10/22/usb-monitoring-with-wireshark/
Thanks for checking so far, Dmitry. To be clear: the serial USB functions as expected, it's just the high CPU load that worries me. My laptop fans always start spinning after a while, which is how I noticed it. I have not tried other OS's to check whether this is MacOS-specific.
i don't tested cdc-loop in idle. Usually i do a two-terminal test, like do copy some video or big archive to usb TTY by dd and do capture the output stream by mpv or copy it to file then verify
In cdc_loop.c
the cdc_loopback
function always send data to the endpoint, which causes a loop. If nothing is in the fifo a zero length packet is sent, end of transmission causes an interrupt, and the loop goes round.
case usbd_evt_eptx:
_t = usbd_ep_write(dev, CDC_TXD_EP, &fifo[0], (fpos < CDC_DATA_SZ) ? fpos : CDC_DATA_SZ);
This line always sends something, whether it is the data received or a zero length packet.
Could it be this that is causing the high load on the host? I wouldn't have thought it could cause 100%, but it does explain why the host is seeing traffic, but the serial terminal is not seeing extra data.
Yes, The problem may be there. Also the very small bInterval
value in data endpoints desctiptors may have this effect.
I've tested cdc-loop on in idle and terminal mode under Linux using screen /dev/ttyACM0
and found no extra CPU load.
Could well be. If there's an easy way for me to test a modification, please let me know.
Try commenting out
_t = usbd_ep_write(dev, CDC_TXD_EP, &fifo[0], (fpos < CDC_DATA_SZ) ? fpos : CDC_DATA_SZ);
and see if the load drops. It will break the loopback, but it will tell you if we are on the right track.
So what you're saying is that it's a side-effect of the loop demo, and won't be an issue in real-world use of this driver, right?
Ok will do!
Bingo - high load gone!
I also wonder if there should be a break;
before this line ...
The problem is you have to send a zero length packet if the previous packet is the same length as the packet size of the endpoint. Otherwise the host won't process it until you send the next packet.
To fix this would require a few changes, but as it's just a demo I'm not sure it's worth doing.
Confirmed, if I change it to not send any zero-length at all, this will fail.
As loopback, it's not really important, I agree (it should be noted in the comments at least, IMO).
But I'd like to use this driver for a normal serial console, and having an example of how to do this properly would be most helpful.
Ok, I think I can work it out - we need to keep sending writes as long as the last send is CDC_DATA_SZ. Feel free to close this. It's a problem of the loopback demo, and apparently it's also MacOS-specific.
Hm, I'm still missing something. This isn't enough to get the loopback going:
static void cdc_loopback(usbd_device *dev, uint8_t event, uint8_t ep) {
static int lastTxSize = CDC_DATA_SZ;
int _t;
switch (event) {
case usbd_evt_eptx:
if (fpos == 0 && lastTxSize < CDC_DATA_SZ)
break;
lastTxSize = (fpos < CDC_DATA_SZ) ? fpos : CDC_DATA_SZ;
_t = usbd_ep_write(dev, CDC_TXD_EP, &fifo[0], lastTxSize);
if (_t > 0) {
memmove(&fifo[0], &fifo[_t], fpos - _t);
fpos -= _t;
}
break;
case usbd_evt_eprx:
....
}
}
Try this, tested on bluepill F103
static void cdc_loopback(usbd_device *dev, uint8_t event, uint8_t ep) {
int _t;
while (1) {
if (fpos < (sizeof(fifo) - CDC_DATA_SZ)) {
_t = usbd_ep_read(dev, CDC_RXD_EP, &fifo[fpos], CDC_DATA_SZ);
if (_t > 0) fpos += _t;
}
if (fpos == 0) break;
_t = usbd_ep_write(dev, CDC_TXD_EP, &fifo[0], (fpos < CDC_DATA_SZ) ? fpos : CDC_DATA_SZ);
if (_t > 0) {
memmove(&fifo[0], &fifo[_t], fpos - _t);
fpos -= _t;
} else break;
}
}
Please note that RX event can be issued only onse per transation. If you miss it for any reason data will be stuck in RX buffer until you read it and endpoint will be in NAK state.
Yes! I can confirm that loopback works with your code and that the kernel load is gone.
This is a great starting point. I'll need to extend it to use states, because I also want the code to work in interrupt mode, but it's an excellent help to see what's needed to get this going. Thanks very much.
Here's another solution - never send full packets:
static void cdc_rxonly(usbd_device *dev, uint8_t event, uint8_t ep) {
if (fpos <= sizeof fifo - CDC_DATA_SZ)
fpos += usbd_ep_read(dev, CDC_RXD_EP, fifo + fpos, CDC_DATA_SZ);
}
static void cdc_txonly(usbd_device *dev, uint8_t event, uint8_t ep) {
int n = fpos < CDC_DATA_SZ ? fpos : CDC_DATA_SZ - 1;
n = usbd_ep_write(dev, CDC_TXD_EP, fifo, n);
if (n > 0) {
fpos -= n;
memmove(fifo, fifo + n, fpos);
}
}
(there are some other code changes in there, notably <=
iso <
in crc_rxonly
)
never send full packets
@jcw And wait a millisecond for next request?
Yes, that's 630 kBaud throughput. It'd be nice to have a better solution, but this is acceptable for me.
While trying out this driver on a Blue Pill plugged into MacOS, I'm seeing a very large CPU load in the kernel (eventually, one core gets pegged at 100%). I'm trying out the demo in loopback mode, polled.
While I understand that the BP will be running flat out, polling the USB hardware, I don't understand why and how this would affect the overall load on the USB bus or host CPU. Is this a (known) bug?
My other question may be related: why is the packet size for data transfers limited to 8 bytes? It is set with a
#define CDC_EP0_SIZE 0x08
in the demo, could that be raised to 64 bytes? Unfortunately, it also appears to be hardcoded in other places.FWIW, my point of reference is another USB implementation - it's in Forth, but also for the same F103 µC of the Blue Pill. There, I don't see this load on the CPU, and 64-byte packets appear to work fine.