Open pallix opened 7 years ago
The read and write paths in the C code are coupled, so even though you have separate processes in Elixir a big write call will delay reading bytes from the OS's internal buffers. Based on your experiment, I would assume that the OS's internal buffer is 4096 bytes and when you write more than that, the OS drops the additional bytes. If the C code were actively removing bytes from the serial port while the big write was happening, this wouldn't happen.
Interestingly enough, I had a note about this coupling in the C implementation, but I had thought that it would only affect performance, and since serial ports generally operate at very slow speeds, I didn't worry about it. This is an interesting consequence of running two UARTs and looping them back on each other that I hadn't considered. I hadn't run into this use case in my own work.
This issue could certainly be fixed. It's a little tricky, though, and I don't have time at the moment to do it. I'm really glad that you pointed this out, since I bet others may run into it and it feels more legit now to spend time decoupling the read and write paths in the C code.
Thank you very much for your detailed answer.
Is there one buffer per physical device, or one per device in /dev? I am writing to one device in /dev and reading from another. The physical link connects two ports of the machine.
Does that mean that if I run the process on two machines I will not have the problem?
Hmm. Now I'm less sure. I was thinking that you were reading and writing to one device and the receive wire was connected to the transmit wire. If you have two nerves_uart GenServers running for two different devices then that messes up my theory that the problem was in the nerves_uart C port implementation.
Are you running on Linux?
Also, have you tried removing the call to UART.drain and just letting UART.write block when it has to? I have a vague recollection of a serial driver where that call had a side effect of coupling the rx and tx paths, but that was a long time ago and certainly serial device-specific.
As for two machines, I would absolutely hope that you wouldn't see this problem on two machines. If you did, then that pretty squarely points to the transmit side having a limit of sending 4K at a time. I don't see how that could be nerves_uart, but I guess that it would be something to investigate.
I am running on Linux. Removing the call to drain causes the data to be corrupted.
I am transmitting a file over a serial line. I read the file like this with an Elixir task:
and read the file with another task:
I created the file with dd:
dd if=/dev/urandom of=./input.bin bs=1024 count=683
When setting buff_size to 4098 there are errors when transmitting the file, whereas a value of 4000, 2048, or 1000 works. It is always the same byte that is transmitted incorrectly.
Is there a bug somewhere in the library or am I doing something wrong?
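Since the same byte is always affected, knowing its exact offset can help relate the corruption to a buffer boundary. The following is a small sketch of a byte-by-byte file comparison; `first_mismatch` is a hypothetical helper written for this example and is not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: returns the offset of the first byte that differs
 * between two files, -1 if they are identical, or -2 if either file cannot
 * be opened. If one file is a prefix of the other, the offset where the
 * shorter file ends is returned. */
static long first_mismatch(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    long offset = 0, result = -1;

    if (!a || !b) {
        if (a) fclose(a);
        if (b) fclose(b);
        return -2;
    }
    for (;;) {
        int ca = fgetc(a);
        int cb = fgetc(b);
        if (ca != cb) {          /* differing byte, or one file ended early */
            result = offset;
            break;
        }
        if (ca == EOF)           /* both files ended at the same offset */
            break;
        offset++;
    }
    fclose(a);
    fclose(b);
    return result;
}
```

Calling `first_mismatch("input.bin", "output.bin")` would report where the received copy first diverges, assuming the receiving side saves its data to a file named `output.bin` (that name is an assumption here). An offset at or just past a multiple of 4096 would be consistent with the buffer-size theory discussed in this thread.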