cicwi / SliceRecon

This repository has moved to:
https://cicwi.github.io/RECAST3D
GNU General Public License v3.0
6 stars 4 forks source link

ZeroMQ throws EFSM #2

Closed adriaangraas closed 5 years ago

adriaangraas commented 5 years ago

ZeroMQ occasionally throws EFSM error on the REQ/REP plugin socket, this is in the Visualization Server file.

The zmq_send() operation cannot be performed on this socket at the moment due to the socket not being in the appropriate state. This error may occur with socket types that switch between several states, such as ZMQ_REP. See the messaging patterns section of zmq_socket(3) for more information.

This probably means that either there was no REP on a REQ, or there is some kind of asynchronous access to the socket. I'm afraid it is the latter.

I will try to find a fix for this, but in case you have suggestions, please let me know.

jwbuurlage commented 5 years ago

For each request (SliceDataPacket), the plugin should send a reply (just a message containing of a single integer '1') indicating that it has received the packet. I am not sure what the default timeout period is, but since we observed that the plugin was overwhelmed with work (and/or sleeping) it is possible that it triggers a timeout on the recv which might lead to that error.

There is no 'asynchronous access' (or rather, simultaneous access by multiple threads) to the plugin socket, as its only use after construction has a lock guard:

    void send(const tomop::Packet& packet, bool try_plugin = false) {
        std::lock_guard<std::mutex> guard(socket_mutex_);
        ...
        if (try_plugin && plugin_socket_) {
            packet.send(plugin_socket_.value());
            plugin_socket_.value().recv(&reply);

        } 
        ...

We could consider using PUSH/PULL, for which it is okay if messages drop because of a busy client. The reason for using REP/REQ is that this is what was used in the pre-plugin system with RECAST3D. We could also make it configurable.

If you want, you can investigate further. Keep me up to date!

Thanks.

adriaangraas commented 5 years ago

This bit is a bypassing the lock (or not listening for a reply), and appears to be the culprit.

https://github.com/cicwi/SliceRecon/blob/ee0942394e51d0c5ae5d08be7ae4466f5f5898ee/include/slicerecon/servers/visualization_server.hpp#L176-L179

When the packet is send using the the local send method, the problem is gone. Is this the preferred method, or do we want the RemoveSlicePacket to be send to the plugin only?

adriaangraas commented 5 years ago

Could do something like


    void send_plugin(const tomop::Packet& packet) {
        std::lock_guard<std::mutex> guard(socket_mutex_);

        zmq::message_t reply;
        packet.send(plugin_socket_.value());
        plugin_socket_.value().recv(&reply);
    }
jwbuurlage commented 5 years ago

That's a bug :bug:! It should indeed only be sent to a possible plugin. I will fix it now.

jwbuurlage commented 5 years ago

Can you test if c621e29 solves the issue?

adriaangraas commented 5 years ago

I think you are missing a commit, the first argument of send should be a const tomop::Packet& and you are giving a zmq::socket_t&.

jwbuurlage commented 5 years ago

I think you are missing a commit, the first argument of send should be a const tomop::Packet& and you are giving a zmq::socket_t&.

Yes. I replaced c621e29 with edf12c5 in develop.

adriaangraas commented 5 years ago

Yup that works, but it will send the packet to RECAST3D if the plugin is not present. So, we can modify send or use send_plugin?

jwbuurlage commented 5 years ago

It will not, there is an if statement around it:

https://github.com/cicwi/SliceRecon/blob/edf12c5e7b5beffeeb97d20d8d6b506927e6f8ae/include/slicerecon/servers/visualization_server.hpp#L175-L177

I don't think it is necessary to introduce an extra function at this point, maybe if we introduce more packets that only get sent to plugins and not RECAST3D (although even in that case I would rather also send it along to RECAST3D as a ways of confirming that the entire pipeline has seen the packet).

Closing :)