Closed PaulStoffregen closed 3 years ago
Teensy already does something like Pluggable Serial Monitor with IDE 1.8.5 to 1.8.15 using a large patch the Java code. A very simple scheme using stdin & stdout for the bidirectional data flow and stderr for status update is used.
One unexpected (at least to me) issue that came up with Teensy's implementation is the poor handling of buffering and flow control on Windows. While stdin / stdout offers a very simple interface, if we're already seeing problems with Teensy 4.0 able to use about 50% of the USB 2.0 bandwidth (480 Mbit/sec), in the not too distant future of microcontrollers with USB 3.0 bandwidth (5 Gbit/sec) we will need an interprocess communication method which gives reliable flow control.
I've made some benchmarks: https://github.com/cmaglie/data-transfers-benchmarks#readme to compare the "pure" performance of stdio
vs TCP/IP, and it seems that stdio
is not so bad after all (especially on windows it looks like it will outperform TCP/IP! the opposite happens on Linux, did not test on mac).
BTW is this huge data transfer from the monitor to the IDE is really needed? Can't we throttle (drop) data directly from the monitor
tool if the IDE cannot keep up?
My rationale is:
port
-> monitor
tool -> IDE
monitor
tool proxies data from port
to the IDE
as fast as it canSo, since the IDE will not be able to display it anyway, isn't better to throttle at the monitor
level?
On the other hand I realize that implementing a flow control system is tricky to say the least...
A very simple scheme using stdin & stdout for the bidirectional data flow and stderr for status update is used.
Another question that comes to my mind is: how do you change port settings? AFAIU stderr is monodirectional from the "monitor tool" to the IDE so there no way on the IDE to communicate changes on the monitor.
In my mind I'd like to use TCP/IP to stream data in and out from the port and leave stdin/stdout only for signaling and control.
TCP/IP for bidirectional data and stdin/stdout for status sounds good to me.
Flow control is essential. When flow control works, the speed automatically throttles as needed by the GUI. I'm optimistic TCP/IP will give flow control on all platforms. According to official documentation, stdin/stdout should too, but my experience shows Windows doesn't.
Some protocols lack flow control. Usually those are fairly slow, but for those specific protocols data does indeed need to be discarded when the incoming buffer is full. For Pluggable Serial Monitor, my hope is the IDE can use TCP/IP and the pluggable utility program is responsible for handling the scenario where the TCP socket can't accept more data without blocking.
Flow control alone doesn't solve all IDE issues. The IDE must constrain its memory allocation. But that's a design issue entirely within the IDE, separate from Pluggable Serial Monitor or which protocol is used. I have a huge patch which solves this in the Java IDE, but it's far from simple. If you ever want to put that work into merging it, I'm happy to help (and of course the code is open sources & always available on my public github account). But if the new IDE is your main focus, I completely understand.
I'd like to move forward with a TCP/IP test program. Can you give me your thoughts on the stdout / stdin status messages you want to see? JSON both directions, or something simpler? Messages for the IP address and port numbers to use? Does the IDE listen for an incoming TCP connect from the pluggable utility, or the other way around?
@PaulStoffregen, sorry it took a bit of time to make up my mind...
I'm writing down the RFC but the general idea is:
stdio
we will talk a protocol very similar to the Pluggable Discovery, so:
CONNECT <address>
. It will connect with the underlying port and start to listen to a localhost TCP port. The listening port it's sent back in the json response and the IDE should connect to that TCP port to start receiving data back and forth.Hope this is enough for you to start experimenting.
I'm a little confused by 2 uses of the word "it", and also the word "connect".
It will connect with the underlying port and start to listen to a localhost TCP port. The listening port it's sent back in the json response and the IDE should connect to that TCP port to start receiving data back and forth.
If we use the BSD sockets API terminology (socket, listen, bind, connect) and avoid pronouns, I believe this might mean:
The pluggable serial utility will bind() to a localhost TCP port and listen() for incoming connections from the IDE. The IDE transmits CONNECT <address>
to the pluggable serial utility's stdin stream, which responds on stdout with a JSON message to inform the IDE of the port number where it is listening for incoming connections. Then the IDE will connect(). The two start can communicate data back and forth as long as the connection is active.
But I don't understand why the IDE would transmit <address>
during this dialogue?
The pluggable serial utility will bind() to a localhost TCP port and listen() for incoming connections from the IDE. The IDE transmits CONNECT
to the pluggable serial utility's stdin stream, which responds on stdout with a JSON message to inform the IDE of the port number where it is listening for incoming connections. Then the IDE will connect(). The two start can communicate data back and forth as long as the connection is active.
Correct
But I don't understand why the IDE would transmit
during this dialogue?
I mean the port address, like CONNECT /dev/ttyACM0
, otherwise, the pluggable serial tool does not know which port to open.
A complete example may be:
CONNECT /dev/ttyACM0
via stdout
ttyACM0
and start to listen to a random TCP port (let suppose it listens to 127.0.0.1:40123
)stdin
: { "event":"connected", "data_stream_at":"127.0.0.1:40123" }
127.0.0.1:40123
to actually get the stream data from the serial portNow that I'm writing it I don't know if the pluggable serial tool should open ttyACM0
before TCP-listening. Maybe a better strategy would be to first establish the TCP connection and open ttyACM0
just after, like:
CONNECT /dev/ttyACM0
via stdout
127.0.0.1:40123
)stdin
: { "event":"connected", "data_stream_at":"127.0.0.1:40123", "port_address": "/dev/ttyACM0" }
127.0.0.1:40123
to actually get the stream data from the serial portttyACM0
and starts streaming the serial data.(the JSON responses are just examples, they will be surely some small changes in the final specification)
Ah yes, that's much clearer.
Before I dive into this, I'd like to ask if you're sure you want the pluggable utility to act as a server (calls listen) and the IDE acting as a client (calls connect)?
If the roles were reversed, we could avoid the scenario where a TCP connection is established but no hardware access is working, because the pluggable utility would be responsible for starting the connection only when the hardware access is confirmed working. And in the less likely scenario where hardware works but the TCP connection isn't possible, the hardware could be promptly closed before any data is read from it.
so what you propose is:
127.0.0.1:40123
)CONNECT 127.0.0.1:40123 /dev/ttyACM0
via stdout
(that means "please open ttyACM0 and send me data over the TCP port 40123")ttyACM0
127.0.0.1:40123
and start streaming serial data{ "event":"connected", "message":"OK" }
if the serial tool could not open the port:
stdin
: { "event":"connected", "error":true, "message":"port not found" }
Sounds good to me, let's do it this way.
Here the first draft: https://github.com/arduino/tooling-rfcs/pull/6
@cmaglie - Any chance this was meant to be "eventType" rather than "event"? Also, any idea what I should send if my monitor tool has no configurable settings?
Well, I'm stuck. Pretty sure it's supposed to be "eventType" looking at arduino/monitor/monitor.go, and because I get an out of sync error if I use event rather than eventType.
But I must be missing something else. I'm sending this in response to the DESCRIBE command.
{
"eventType": "describe",
"message": "ok",
"port_description": {
"protocol": "Teensy",
"configuration_parameters": {
"baudrate": {
"label": "Baudrate",
"type": "enum",
"value": [
"1200", "2400", "4800", "9600",
"19200", "38400", "57600", "115200"
],
"selected": "9600"
}
}
}
}
But I always seem to get "Error getting port settings details".
INFO[0000] Starting monitor process monitor=teensy-monitor
INFO[0000] Monitor process started successfully! monitor=teensy-monitor
INFO[0000] sending command command="HELLO 1 \"arduino-cli nightly-20220120\"" monitor=teensy-monitor
INFO[0000] received message error=false event_type=hello message=OK monitor=teensy-monitor
INFO[0000] sending command command=DESCRIBE monitor=teensy-monitor
INFO[0000] received message error=false event_type=describe message=ok monitor=teensy-monitor
INFO[0000] sending command command=QUIT monitor=teensy-monitor
ERRO[0000] stopped decode loop monitor=teensy-monitor
Error getting port settings details: Port monitor error: command 'describe' failed: ok
ERRO[0000] Error getting port settings details: Port monitor error: command 'describe' failed: ok
Tried reading the Go code. I see a called to EnumerateMonitorPortSettings() in cli/monitor/monitor.go at line 84 seems to be failing. Go is an unfamiliar language, but my best guess is the call to m.Describe in commands/monitor/settings.go at line 44 might be failing? Maybe? Perhaps this is calling Describe in arduino/monitor/monitor.go at line 313? But it seems like the only 2 errors are not being able to send or not getting a reply.
I see a bunch of json fields in data structures near the top of arduino/monitor/monitor.go. Most of it makes some sense, even though I don't know Go language, but I don't quite understand this line.
ConfigurationParameters map[string]*PortParameterDescriptor `json:"configuration_parameters,omitempty"`
Sadly, with so little info in the verbose messages about why my JSON reply wasn't accepted, and my lack of familiarity with Go and trying to unravel how this code works, I'm afraid I'm completely stuck. Something is wrong with my reply to DESCRIBE, causing arduino-cli to immediately send QUIT and print that error, but I just can't figure out what it's really expecting.
@cmaglie - Any chance this was meant to be "eventType" rather than "event"?
yes it must be eventType
, I'm about to push this fix to the docs (together with some other that you reported)
Also, any idea what I should send if my monitor tool has no configurable settings?
This should work:
{
"eventType": "describe",
"message": "OK",
"port_description": {
"protocol": "serial",
"configuration_parameters": {
}
}
}
But I must be missing something else. I'm sending this in response to the DESCRIBE command.
well, also the response that you posted should work, let me try it to see if I can reproduce it.
I think the problem lies here https://github.com/arduino/arduino-cli/blob/master/arduino/monitor/monitor.go#L148-L150
if msg.Message != "OK" || msg.Error {
return msg, fmt.Errorf(tr("command '%[1]s' failed: %[2]s"), expectedEvt, msg.Message)
}
the monitor checks that the message
field is OK
case sensitive. I think this check is too strict and must tolerate also ok
, but at least we now know where the problem is.
Also, you're right, the error message is very cryptic and not useful at all, this will be fixed soon... sorry for the hassles, this code is very new, we still need to smooth out some edges.
Yes, that's it. Just changed to OK capitalized and suddenly arduino-cli is sends OPEN instead of QUIT.
But something strange is still happening. Looks like arduino-cli runs my monitor tool twice. The first time it gives up and immediately sends QUIT. Then it runs the same monitor tool again and after getting the same response to DESCRIBE on the 2nd run, it sends OPEN. Is this expected behavior?
INFO[0000] Starting monitor process monitor=teensy-monitor
INFO[0000] Monitor process started successfully! monitor=teensy-monitor
INFO[0000] sending command command="HELLO 1 \"arduino-cli nightly-20220120\"" monitor=teensy-monitor
INFO[0000] received message error=false event_type=hello message=OK monitor=teensy-monitor
INFO[0000] sending command command=DESCRIBE monitor=teensy-monitor
INFO[0000] received message error=false event_type=describe message=OK monitor=teensy-monitor
INFO[0000] sending command command=QUIT monitor=teensy-monitor
ERRO[0000] stopped decode loop monitor=teensy-monitor
INFO[0000] Starting monitor process monitor=teensy-monitor
INFO[0000] Monitor process started successfully! monitor=teensy-monitor
INFO[0000] sending command command="HELLO 1 \"arduino-cli nightly-20220120\"" monitor=teensy-monitor
INFO[0000] received message error=false event_type=hello message=OK monitor=teensy-monitor
INFO[0000] sending command command=DESCRIBE monitor=teensy-monitor
INFO[0000] received message error=false event_type=describe message=OK monitor=teensy-monitor
INFO[0000] sending command command="OPEN 127.0.0.1:40125 usb16/16-1/16-1.2" monitor=teensy-monitor
This is the JSON reply I'm sending to DESCRIBE.
{
"eventType": "describe",
"message": "OK",
"port_description": {
"protocol": "Teensy",
"configuration_parameters": {
}
}
}
Yes, this one is expected, it's a missed optimization opportunity in the arduino-cli monitor
command implementation (as it is now it first asks for the monitor capabilities and after that it will open the monitor: https://github.com/arduino/arduino-cli/blob/master/cli/monitor/monitor.go#L84).
Anyway, this is normal, we will fix this eventually, it's not the fault of your monitor.
Yeah, in terms of optimization, less than ideal. My monitor tool has to complete an enumeration of devices before it can respond to OPEN. I'm starting that work in a thread as soon as the program begins running. Would be much faster response if arduino-cli could launch the process as early as possible and just leave it running, even without sending HELLO.
Would be much faster response if arduino-cli could launch the process as early as possible and just leave it running, even without sending HELLO.
There is a trade-off here, running all the monitors at CLI startup adds up as the number of monitors available increases, that's why we choose to run all the tools lazily when the CLI is not running as a daemon.
The information you get from the enumeration, maybe could be encoded in some way in the "address"? the idea is that the monitor may get it from the port address instead of re-enumerating.
I'll try to use that approach. But one issue is the "address" field can be extremely long for Linux sysfs path and Windows device instances. In the Java IDE, this is shown in the space limited status bar on the bottom of every editor window. With arduino-cli, it's show with commands users are likely to use, such as "arduino-cli board list", which also are designed with formatting meant to show only very short address names.
More progress here, and another strange problem. Normally I would do more to investigate before writing, but since this is so new, maybe hearing about the challenges I'm finding along the way will help for improving the documentation?
When I reply to the open command and reply with this:
{
"eventType": "open",
"message": "OK"
}
arduino-cli does start waiting for a connection. If I do nothing else, it seems to be willing to wait forever (or at least my attention span).
But when I connect, arduino-cli seems to be very unhappy. It quickly terminates my monitor tool's process. The verbose output says it sent a CLOSE and QUIT, but those never arrive.
INFO[0000] Starting monitor process monitor=teensy-monitor
INFO[0000] Monitor process started successfully! monitor=teensy-monitor
INFO[0000] sending command command="HELLO 1 \"arduino-cli nightly-20220120\"" monitor=teensy-monitor
INFO[0000] received message error=false event_type=hello message=OK monitor=teensy-monitor
INFO[0000] sending command command=DESCRIBE monitor=teensy-monitor
INFO[0000] received message error=false event_type=describe message=OK monitor=teensy-monitor
INFO[0000] sending command command=QUIT monitor=teensy-monitor
ERRO[0000] stopped decode loop monitor=teensy-monitor
INFO[0000] Starting monitor process monitor=teensy-monitor
INFO[0000] Monitor process started successfully! monitor=teensy-monitor
INFO[0000] sending command command="HELLO 1 \"arduino-cli nightly-20220120\"" monitor=teensy-monitor
INFO[0000] received message error=false event_type=hello message=OK monitor=teensy-monitor
INFO[0000] sending command command=DESCRIBE monitor=teensy-monitor
INFO[0000] received message error=false event_type=describe message=OK monitor=teensy-monitor
INFO[0000] sending command command="OPEN 127.0.0.1:41659 usb16/16-1/16-1.2" monitor=teensy-monitor
INFO[0000] received message error=false event_type=open message=OK monitor=teensy-monitor
Connected to usb16/16-1/16-1.2! Press CTRL-C to exit.
ERRO[0000] stopped decode loop monitor=teensy-monitor
INFO[0000] sending command command=CLOSE monitor=teensy-monitor
INFO[0000] sending command command=QUIT monitor=teensy-monitor
However, if I don't call connect(), while arduino-cli is sitting and waiting patiently for the incoming connection, I can open a terminal window and run "telnet 127.0.0.1 35087" (or whatever port number arduino-cli chose). It does seem quite happy hearing a connection from telnet.
INFO[0000] Starting monitor process monitor=teensy-monitor
INFO[0000] Monitor process started successfully! monitor=teensy-monitor
INFO[0000] sending command command="HELLO 1 \"arduino-cli nightly-20220120\"" monitor=teensy-monitor
INFO[0000] received message error=false event_type=hello message=OK monitor=teensy-monitor
INFO[0000] sending command command=DESCRIBE monitor=teensy-monitor
INFO[0000] received message error=false event_type=describe message=OK monitor=teensy-monitor
INFO[0000] sending command command=QUIT monitor=teensy-monitor
ERRO[0000] stopped decode loop monitor=teensy-monitor
INFO[0000] Starting monitor process monitor=teensy-monitor
INFO[0000] Monitor process started successfully! monitor=teensy-monitor
INFO[0000] sending command command="HELLO 1 \"arduino-cli nightly-20220120\"" monitor=teensy-monitor
INFO[0000] received message error=false event_type=hello message=OK monitor=teensy-monitor
INFO[0000] sending command command=DESCRIBE monitor=teensy-monitor
INFO[0000] received message error=false event_type=describe message=OK monitor=teensy-monitor
INFO[0000] sending command command="OPEN 127.0.0.1:39251 usb16/16-1/16-1.2" monitor=teensy-monitor
INFO[0000] received message error=false event_type=open message=OK monitor=teensy-monitor
ERRO[0000] stopped decode loop monitor=teensy-monitor
Connected to usb16/16-1/16-1.2! Press CTRL-C to exit.
hello world from typing in telnet
This is the C code I'm using to connect.
int socket_fd, localhost_fd;
struct sockaddr_in addr;
socket_fd = socket(AF_INET, SOCK_STREAM, 0);
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
addr.sin_port = htons(portnum);
localhost_fd = connect(socket_fd, (struct sockaddr *)&addr, sizeof(addr));
Still trying to figure out what telnet is doing differently that makes arduino-cli happy.
@cmaglie - Looks like arduino-cli might have a race condition with the OPEN command.
After trying so many variations and even capturing IP packets with tcpdump, it seems the difference between my program and running telnet is a matter of timing. My program calls connect() immediately after sending the JSON reply. But with the telnet program, there is a long lag while I casually move my mouse over to a terminal window, press up arrow to repeat, and type the new port number.
If I add a delay in my program, it works!
printf("{\n\"eventType\": \"open\",\n\"message\": \"OK\"\n}\n");
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
addr.sin_port = htons(portnum);
usleep(15000);
localhost_fd = connect(socket_fd, (struct sockaddr *)&addr, sizeof(addr));
At least on my desktop Linux machine, delay less than 15 ms causes arduino-cli to immediately terminate my discovery tool's process.
My best guess is perhaps it may be spending about 15 ms to receive and process the JSON reply, and if it hears a TCP connection arrive before it's processed the JSON, arduino-cli gets upset?
But the protocol doesn't seem to give me any way to know when arduino-cli is really ready to receive a connection. I found 15 ms just by experimenting, so far only on Linux and only on 1 desktop PC.
Ugh... maybe it's not only a timing problem. Not sure. Something might be really messed up on my end. Still working on this....
Sorry for so much noise. There is a race condition here, but it might be completely on my side! Looks like my thread for port enumeration might be crashing. If my program stops, very difficult to see if arduino-cli decided to terminate the process or if it crashed because of something I did wrong.
Still working on this. Slow progress. Difficult to troubleshoot....
No worries, better one more notification if it let you save one hour or two. Pluggable monitors/discoveries look simple from the outside, but they are very tricky to make reliable and high quality. We learned this lesson the hard way doing the built-in discoveries//monitors for serial and network-mdns...
Looks like my thread for port enumeration might be crashing
Thinking a bit more about this: is threading really worth it here? when the cli launches the monitor tool it will run the OPEN
command almost immediately (after HELLO
and DESCRIBE
maybe, but those commands should execute very quickly).
Honestly, I'm not sure if threads are the best solution here. It is sort of an experiment. My old pluggable serial monitor, which uses command line args for config, stdin / stdout for the main data channel, and stderr for very limited status updates is build around a single-threaded event dispatch loop which uses each platform's native async I/O APIs.
MacOS is the main problem. Apple has kernel-level bugs which can bring down the entire system, and the problem appears to get worse with faster Macs and their latest Monetery release. I've spent many unsuccessful hours on it. My main opinion is using the MacOS runloop to service a combination of IOServiceAddMatchingNotification for device discovery and CFFileDescriptorEnableCallBacks for data flow is just too risky. So a big part of my motivation to move device enumeration to its own thread is not based on how I want to structure my code, but instead to avoid these terrible MacOS bugs by using the runloop and Core Foundation APIs only on 1 thread and only for tracking device connect & disconnect events.
Windows is also problematic, in that stdin & stdout are implemented with WIN32 anonymous pipes which doesn't support any async APIs like "overlapped" IO. So on Windows 2 extra threads are always required anyway.
On arduino-cli, a minor feature that would really help me and maybe others who later try to create pluggable monitors would be verbose messages about whether the monitor tool process was terminated by arduino-cli or crashed on its own and arduino-cli notices it is no longer alive.
Seeing only "stopped decode loop" without any idea of why it stopped makes troubleshooting difficult.
Let me see if I can improve the error message
PR here: https://github.com/arduino/arduino-cli/pull/1638 test builds here: https://github.com/arduino/arduino-cli/actions/runs/1745117422
I now have pluggable monitor working on Linux, MacOS and Windows.
Here's a couple simple things that might help others in the future.
Adding an eventType which doesn't do anything, but causes its message to print to arduino-cli's stdout might help people who use printf-style troubleshooting. You can't just use printf() because arduino-cli parses everything as JSON. I ended up writing a function to append to a log file which I watched with "tail -f" in another window. It worked well enough, though matching up the lines from 2 separate windows isn't as intuitive as seeing the printed message at the point it occurred in the stream of other messages.
A reminder that backslash needs special handling in JSON might be nice in the documentation. At one point I had the port label showing a Windows device instance, which begins with "\\?\". The error messages arduino-cli prints with malformed JSON aren't always intuitive.
Anyway, happy to report I finally got it working on Windows.
@cmaglie - Does IDE 2.0 nightly build use the latest arduino-cli nightly? Initial testing looks like it's using pluggable discovery but not yet using pluggable monitor. Should I wait for an update, or open an issue now?
Does IDE 2.0 nightly build use the latest arduino-cli nightly?
I don't think it's using directly the latest nightly, AFAIR the version of the cli is pinned and updated from time to time to ensure that IDE and CLI talk the same gRPC API.
Initial testing looks like it's using pluggable discovery but not yet using pluggable monitor. Should I wait for an update, or open an issue now?
This is because the IDE is still using the old (non-pluggable) monitor gRPC API. We are tracking this issue here: https://github.com/arduino/arduino-ide/issues/769.
BTW, if the pluggable monitor works fine on the arduino-cli
in text mode, there is a high probability that it will work on the ArduinoIDE 2.0 too once implemented.
Ok, sounds like I should wait.
I can confirm the latest IDE nightly build never runs my pluggable monitor (which logs stuff to a file every time it runs... nothing gets written to the file).
Pluggable monitor is definitely working with arduino-cli running in a terminal. Well, at least on Linux. I'm also seeing a strange intermittent problem on Windows. Always works if I run in verbose mode with "-v". Works about 50/50 without verbose mode. Still uncertain if it's a bug in my code. Needs more investigation...
Now that Pluggable Discovery is wrapped at at version 1, let's talk of a Pluggable Serial Monitor, and perhaps a Pluggable Visualization Monitor.
With Pluggable Serial Monitor, the IDE would run program which communicates with the target hardware using its own protocol, and then in turn communicates a stream of bytes to the IDE as if the IDE was using serial or network protocol.
With Pluggable Visualization Monitor, the IDE would run a program which communicates with the target hardware using its own protocol, where that program opens a window or other GUI and manages all of its update to the GUI.