aseba-community / aseba

Aseba is a set of tools which allow beginners to program robots easily and efficiently. To contact us, please open an issue.
http://aseba.wikidot.com
GNU Lesser General Public License v3.0
48 stars 62 forks source link

Aseba Studio quietly aborts upon bad message #638

Open ypiguet-epfl opened 7 years ago

ypiguet-epfl commented 7 years ago

If the payload of an Aseba message is smaller than what's specified in its header, the application terminates with a not-so-friendly error message written to stderr. This also happens with other errors; search terminate() in aseba/common/msg/msg.cpp. In Aseba Studio, the application quits unexpectedly without visible error message.

stephanemagnenat commented 7 years ago

Yes, this should never happen. A basic assumption of the Aseba protocol is that packets are valid.

ypiguet-epfl commented 7 years ago

Unsure if you mean the current situation is fine or should be improved. In my case, during development, I had invalid packets where more obvious error messages would have helped. Just displaying the error message in an alert before quitting would be great.

stephanemagnenat commented 7 years ago

All terminate() calls in msg.cpp first display an error. See lines 167, 218, 278, 295, and 347 of https://github.com/aseba-community/aseba/blob/master/common/msg/msg.cpp.

ypiguet-epfl commented 7 years ago

Yes, that's what I meant. "display" is a bit optimistic for something written to stderr, especially in an application like Aseba Studio with a GUI. Maybe it's written in a log file somewhere, and in a terminal if it's launched manually in case the user really expects something there. I suggest that msg.cpp throws exceptions to allow proper error recovery or notifications by callers; and in Aseba Studio, at least an alert and the opportunity for the user to save her files.

stephanemagnenat commented 7 years ago

These terminate() should never be called in a deployed application, because they mean the Aseba protocol is violated. Deployed application should in all cases, including disconnection, follow the protocol. During development, I think that it is ok to redirect stderr to a file or a terminal.

But I also see your point. The reason I am not very motivated to use exceptions there is that I expect exceptions to be handled in the caller code. In general, I aim at keeping the code as simplest as possible, and so try to avoid adding more code paths. As these errors should never happen in correct code, I want to notify the developer and then stop the program.

Note that one can install a custom handler for std::terminate, therefore allowing to print a dialog box if relevant. One could imagine the program to be launched with cerr redirected to a file, and the handler to display the content of this file in the case of a GUI program.

I am interested in which context you encountered these assertions?

ypiguet-epfl commented 7 years ago

I found this when I developed my own target in node.js (I had a string whose length prefix was wrong, hence a mismatch between the message length and the decoded payload). My target violated the Aseba protocol, but it was with the current release of Aseba Studio.

stephanemagnenat commented 7 years ago

I see. For the aforementioned reasons, I am not very motivated to replace std::terminate with exceptions, but what we could do is to redirect stderr to a log file in a temporary directory with reopen, and set a handler with std::set_terminate that will show a message and the content of this file. What do you think?

ypiguet-epfl commented 7 years ago

I think exceptions help to make code simpler. Exceptions are already used elsewhere (e.g. in the compiler).

Here, just throwing an unhandled exception would cause a crash which would be more obvious than a message to stderr and a clean abort unnoticed by the operating system.

I don't know much about the application architecture and whether qt adds constraints; but catching exceptions at the appropriate level and displaying their what() message seems much simpler than redirecting stderr and managing an external file, probably with OS-dependent code.

davidjsherman commented 7 years ago

Could you provide a test program that we could include in the e2e testing?

stephanemagnenat commented 7 years ago

It is true that the compiler uses exceptions for internal compiler errors. I consider the message library as lower-level than the compiler, and to ease integration for third-parties, decided against using exceptions for fatal errors there. I agree that it is somewhat an arbitrary choice.

davidjsherman commented 7 years ago

Do you mean a node.js application to trigger the Aseba Studio termination? I've trimmed a little bit my current version which isn't ready for public consumption yet, but it's still 220 lines.

It depends on what we’re testing. I was interpreting issue #638 as an end-to-end interoperability problem, since one wouldn’t want a node.js client to be able to crash the other Aseba programs, and certainly not to do so and then not notice. We would want the client code to be able to detect when its peer has crashed. Your issue is that the current mechanism is hard to detect. To implement an improved mechanism, we need a test that provokes the problem.

If the goal is only to violate the low-level protocol and observe that programs terminate, then a four-line C++ program to unit test every application is enough.

But I understood that the goal is to improve client detection of peer termination. The business about raising an exception is just one way for the peer program to detect the protocol violation and do someting more friendly to inform the client. As Stéphane said, the contract for the Aseba protocol is strict about valid packets, so the responsibility is more on the client end.

So I would adapt your node.js program to 1) generate an invalid packet, then 2) fail to detect the crash. Then we would have a basis, a use case, for defining what a better mechanism would be (like watching log files). It could be shorter than 220 lines.

ypiguet-epfl commented 7 years ago

David, what triggered #638 was Aseba Studio quiet abortion upon communication error. I suspect communication errors are not totally uncommon; see for instance #619. So user-visible messages would be useful and point more clearly to the real issue. Here is a reduced node.js program terminate-aseba-studio.js which is fatal to Aseba Studio (tested in Ubuntu 16.04):

var net = require('net');
net.createServer(function (socket) {
        socket.on("data", function (data) {
                var msg = new ArrayBuffer(26);
                var dataView = new DataView(msg);
                dataView.setUint16(0, 20, true);
                dataView.setUint16(2, 12345, true);
                dataView.setUint16(4, 0x9000, true);
                socket.write(Buffer(msg));
        });
        socket.on("close", function () {});
        socket.on("error", function () {});
}).listen(33333);

Launch it with node terminate-aseba-studio.js, then start Aseba Studio and connect to tcp port 33333. Boom. My server continues running. If you start asebastudio from a terminal, you'll see an error message. The problem comes from an invalid description (0x9000) message where the payload size in the header (20) is larger than the decoded payload.

Crash detection of other nodes is a separate question, which could probably be solved with timeouts.

davidjsherman commented 7 years ago

What would you wish the behavior to be? I am guessing:

This behavior is testable.

Aseba doesn't have any framing, if I remember correctly, so once the stream in out of synch, it is corrupted forever. The only thing to do is drop the connection. Of course one could try to reopen the connection and thus restore sync, but that is a policy question for clients.

If you agree, then perhaps this issue could be renamed Aseba connection should be dropped if stream is corrupted and you could add an issue to your code Detect dropped Aseba connection.

(Thank you for the example, I need to figure out where e2e testing should go since tests/e2e-http/ is gone.)

stephanemagnenat commented 7 years ago

If the basic contract of the Aseba protocol (the messages are sane and in order) is violated, Studio must quit because the state of the network is unknown and Studio's internals depend in many places on a sane network, even if the connection is reconnected (re-connection assumes implicitly that the same network is reconnected, and will not work if Studio is connected to a different network).

Studio might be allowed to do an emergency auto-save though, voting in favor using exceptions, although that could also be implemented with std::set_terminate.

ypiguet-epfl commented 7 years ago

David: I'd wish (i) and (ii). The server could eventually detect that the tcp connection has been closed, but that really wasn't my point.

Stéphane: Studio remains open reliably (I think) upon other types of broken connections; I fail to see a major difference here. I'd rather have c++ exceptions for that because they can be more descriptive and caught in a more flexible way, which is especially important in applications not written yet, not only in Studio. msg.cpp is located in aseba/common, not aseba/clients/studio: that's an invitation to reuse it. But it's an implementation issue. My main concern is about Studio quitting unexpectedly, which to most people looks like a crash, hence a bug.

stephanemagnenat commented 7 years ago

Studio remains open reliably (I think) upon other types of broken connections;

When there is a broken connection, Studio assumes that it is a killed switch or a disconnected Thymio, hence that if this target becomes available again (same Dashel target), the corresponding network will be the same. This is why it does not show the connection dialogue again. It is actually quite fishy, and one can confuse/crash Studio by changing the network behind its back through a disconnection/re-connection. Making Studio robust towards this behavior would require a significant rewrite.

Again, the reason of not using exceptions in msg is that the common library of Aseba terminates if the contract is broken. I see your point regarding the exception and re-usability. Regarding users, if no node is broken on the network (the current released Thymio firmware is, but will be fixed in the next release which should come soon), they should not be affected by the termination.

davidjsherman commented 7 years ago

Studio must quit because the state of the network is unknown

I think I have missed something quite important, then. I thought that the Aseba message bus was a snowflake topology with point-to-point packet-forwarding links, where loss of sync would only invalidate the future stream of packets on that one link.

stephanemagnenat commented 7 years ago

@davidjsherman you are correct. But in Studio, if the Dashel target is disconnected, Studio will attempt reconnecting. If re-connection to this target succeeds, Studio assumes that nodes with similar Id represent the same physical nodes as before (with same native variables and functions, etc.). This is a somewhat dangerous assumption, because in case of re-connection to a TCP target, a totally different network could be present (for instance a different set of robots in Playground), so that Studio can be "tricked" into producing bytecodes incompatible with the target nodes. This is bad but was a pragmatic choice in order to support the disconnection/re-connection of wired Thymios with a limited amount of work. I'm not very happy about it, but it's the current state.

ypiguet-epfl commented 7 years ago

Just got an unexpected crash (an unhandled exception) in Aseba Studio/Ubuntu 16.04 while editing an Aseba program. /var/log/syslog contains this: Aug 24 11:19:01 mobots-ThinkPad-T440p asebastudio.desktop[16746]: Qt has caught an exception thrown from an event handler. Throwing Aug 24 11:19:01 mobots-ThinkPad-T440p asebastudio.desktop[16746]: exceptions from an event handler is not supported in Qt. You must Aug 24 11:19:01 mobots-ThinkPad-T440p asebastudio.desktop[16746]: reimplement QApplication::notify() and catch all exceptions there. Aug 24 11:19:01 mobots-ThinkPad-T440p asebastudio.desktop[16746]: terminate called after throwing an instance of 'QtConcurrent::UnhandledException' Aug 24 11:19:01 mobots-ThinkPad-T440p asebastudio.desktop[16746]: what(): std::exception A message describing the original exception would be more useful.