cloudius-systems / osv

OSv, a new operating system for the cloud.
osv.io

Serial port is slower than necessary #921

Open nyh opened 6 years ago

nyh commented 6 years ago

@wkozaczuk noticed that when uploading a lot of tiny files to OSv, cpiod's printing of the file names can take up the majority of the upload time. He reported that after commenting out the printouts, he measured the following on OS X:

With QEMU on average I was able to cut down the time from ~3 minutes to ~2 minutes and from ~10 minutes to ~10 SECONDS with VirtualBox when uploading 13,000 tiny files.

While we don't expect the serial port to be fast, since every character output requires a hypervisor exit, perhaps it can be made less terrible (especially on VirtualBox). Here is what I suspect is happening:

For every character, isa_serial_console::putchar() writes to an I/O register (incurring the usual hypervisor exit cost), but then things deteriorate: before we output the next character, we need to loop, checking whether the hypervisor has consumed the previous character. If this doesn't happen immediately, and for some reason the guest and the hypervisor thread are on the same CPU, we may need to wait for a context switch, which can take ages. I don't know why VirtualBox is particularly slow here. Maybe it services serial port writes less promptly, or maybe its threads are pinned to CPUs differently.
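
To make the cost concrete, here is a rough count under a simple model (an assumption, not a measurement): treat every port access as one VM exit, let n be the number of characters printed, and let p_i >= 1 be the number of line-status-register (LSR) reads putchar() makes before the i-th character is accepted, with p_i = 1 when the previous character has already been consumed.

$$E_{\text{1-byte}}(n) = \sum_{i=1}^{n} \left( p_i + 1 \right) \;\ge\; 2n$$

So even in the best case each character costs two exits, and every extra poll made while waiting for the host to consume the previous character adds another one.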

Two things we should investigate to improve this situation:

  1. Currently we assume that the UART has a buffer of only one character. I believe that modern UARTs actually have a larger transmit buffer, and I am guessing that both QEMU and VirtualBox support it, which could reduce the number of context switches and pointless busy loops. Looking at Linux's tty/serial/8250/8250_core.c suggests that the 16550A (which I think VirtualBox provides?) may have a 16-byte output FIFO, and some sources I read suggest that QEMU also has a 16-byte TX FIFO.

  2. Currently we busy-wait until we can output another character. It might make sense to use TX interrupts instead, so that we can immediately let the host threads run (and drain the characters in the FIFO). This will be harder than fix 1 above and has more potential for a negative performance impact, so we should probably try fix 1 first.
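
Fix 1 can be explored with a small simulation before touching the driver. This is a hypothetical cost model, not OSv code: `SimUart`, `host_latency`, and the writer functions are invented for illustration. Each LSR read or THR write counts as one VM exit, LSR bit 5 (THRE) is the 16550's "transmit holding register / TX FIFO empty" flag, and the simulated host drains the whole FIFO only after the guest has polled a few times, standing in for the delay before the host thread gets to run.

```python
# Hypothetical cost model (not OSv code): every LSR read or THR write counts
# as one VM exit, and the simulated host empties the TX FIFO only after the
# guest has polled LSR `host_latency` times since its last drain.
LSR_THRE = 0x20  # 8250/16550 LSR bit 5: transmit holding register / FIFO empty


class SimUart:
    def __init__(self, fifo_size=16, host_latency=3):
        self.fifo_size = fifo_size
        self.host_latency = host_latency  # assumed polls before the host runs
        self.fifo = []
        self.polls_waiting = 0
        self.output = bytearray()
        self.exits = 0

    def read_lsr(self):
        self.exits += 1
        if self.fifo:
            self.polls_waiting += 1
            if self.polls_waiting >= self.host_latency:
                # The host thread finally runs and consumes everything queued.
                self.output.extend(self.fifo)
                self.fifo.clear()
                self.polls_waiting = 0
        return 0 if self.fifo else LSR_THRE

    def write_thr(self, byte):
        self.exits += 1
        if len(self.fifo) >= self.fifo_size:
            raise OverflowError("TX FIFO overrun")
        self.fifo.append(byte)

    def flush(self):
        # Let the host drain leftovers without charging the guest any exits.
        self.output.extend(self.fifo)
        self.fifo.clear()


def write_one_byte_at_a_time(uart, data):
    """Current approach: wait for THRE before every single character."""
    for byte in data:
        while not (uart.read_lsr() & LSR_THRE):
            pass  # busy-wait
        uart.write_thr(byte)


def write_fifo_batches(uart, data, fifo_depth=16):
    """Fix 1: wait for an empty FIFO once, then queue up to fifo_depth bytes."""
    for start in range(0, len(data), fifo_depth):
        while not (uart.read_lsr() & LSR_THRE):
            pass  # busy-wait, but only once per batch
        for byte in data[start:start + fifo_depth]:
            uart.write_thr(byte)


def count_exits(writer, data):
    uart = SimUart()
    writer(uart, data)
    uart.flush()
    assert bytes(uart.output) == data  # nothing lost or reordered
    return uart.exits


data = b"0123456789abcdef" * 2  # 32 bytes of sample output
print(count_exits(write_one_byte_at_a_time, data))  # 126
print(count_exits(write_fifo_batches, data))        # 36
```

With these made-up parameters, batching cuts the exit count for 32 characters from 126 to 36. The real saving depends on how long the host actually takes to drain the FIFO, and a real driver would also have to enable the FIFOs through the 16550's FIFO control register before relying on them.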

justinc1 commented 6 years ago

See also:

nyh commented 6 years ago

About the first link - funny, I didn't remember writing that three years ago :-)