gazebosim / gz-transport

Transport library for component communication based on publication/subscription and service calls.
https://gazebosim.org
Apache License 2.0

Random ports and Dockerized Ignition #166

Open joxoby opened 4 years ago

joxoby commented 4 years ago

There's a use case in which we want to have an Ignition Gazebo instance running inside a Docker container and communicating with the host machine through the ignition-transport layer.

One of the problems is that the ports used by NodeShared are chosen randomly during construction. This makes it difficult to expose the ports to the host machine during container startup, since they aren't known a priori.
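For illustration, here's a minimal cppzmq sketch (not gz-transport's actual code, and it assumes a recent cppzmq for the sockopt API) of why the port can't be known a priori: when a socket binds with a wildcard port, the kernel picks an ephemeral port only at bind time.

#include <iostream>

#include <zmq.hpp>

int main()
{
  zmq::context_t ctx;
  zmq::socket_t pub(ctx, zmq::socket_type::pub);

  // '*' asks the OS for any free port; it's chosen only at bind time.
  pub.bind("tcp://127.0.0.1:*");

  // Prints the endpoint actually chosen, e.g. "tcp://127.0.0.1:38751".
  std::cout << pub.get(zmq::sockopt::last_endpoint) << std::endl;
}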

Questions

Note: docker run has the option --network=host, which mounts the host network stack in the container. While this approach solves the problem above, we would prefer to avoid it (it also introduces other problems when running the GUI).

diegoferigo commented 4 years ago

The network drivers used by Docker have unfortunately never gotten along well with any robotic middleware (ROS, YARP, etc.), and I'm not surprised that Ignition Gazebo shares the same limitations. Without removing the network isolation, the only workaround I'm aware of is opening a large enough range of ports and hoping that none are allocated outside that range.

That being said, my team has been running Ignition Gazebo in containers based on Ubuntu 18.04 for a while now (and, for the past few days, also on 20.04), and with the host network we haven't experienced any GUI problems. If host networking is not a strict blocker for you, I'd suggest going along with it and perhaps solving the GUI problems you faced. I know that a few groups use a VPN as a workaround, which also allows scaling to bigger clusters, e.g. in Kubernetes. If you want to maintain network isolation, I fear this is the only workaround, but it centralizes all the traffic between the nodes in a single point that could become a bottleneck.

joxoby commented 4 years ago

The network drivers used by Docker have unfortunately never gotten along well with any robotic middleware (ROS, YARP, etc.), and I'm not surprised that Ignition Gazebo shares the same limitations.

Since ignition-transport is based on ZMQ over TCP, I don't think there should be any fundamental problem with Docker's network drivers.

Without removing the network isolation, the only workaround I'm aware of is opening a large enough range of ports and hoping that none are allocated outside that range.

We could also try restricting the range of permitted ports to, let's say, 100. That's a more manageable number and would make exposing them easier.

That being said, my team has been running Ignition Gazebo in containers based on Ubuntu 18.04 for a while now (and, for the past few days, also on 20.04), and with the host network we haven't experienced any GUI problems.

That's interesting to hear. Adding the --network=host here will cause this error when trying to run with the GUI:

dbus[10]: The last reference on a connection was dropped without closing the connection. This is a bug in an application. See dbus_connection_unref() documentation for details.
Most likely, the application was supposed to call dbus_connection_close(), since this is a private connection.
  D-Bus not built with -rdynamic so unable to print a backtrace

If you want to maintain network isolation, I fear this is the only workaround, but it centralizes all the traffic between the nodes in a single point that could become a bottleneck.

For each ignition-transport instance, there's a NodeShared singleton that opens a total of 4 ports. Every Node instance is basically a wrapper around that singleton and uses the same 4 ports as the others. What I'm trying to say is that the traffic is already somewhat centralized.
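To illustrate (a sketch assuming the stock ign-transport API; topic names are arbitrary): creating more Nodes in a process does not open more ports, since they all route through the one NodeShared singleton.

#include <ignition/msgs/stringmsg.pb.h>
#include <ignition/transport/Node.hh>

int main()
{
  // Two independent Nodes in the same process...
  ignition::transport::Node node1;
  ignition::transport::Node node2;

  // ...but both publishers share the sockets (and thus the same few
  // ports) owned by the single NodeShared instance behind them.
  auto pub1 = node1.Advertise<ignition::msgs::StringMsg>("/demo1");
  auto pub2 = node2.Advertise<ignition::msgs::StringMsg>("/demo2");
}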

I'm curious to see what your Docker configuration is; if it's no different from the official one, our problem might be specific to our system.

joxoby commented 4 years ago

An implementation to restrict the range would look something like this:

#include <stdexcept>
#include <string>

#include <zmq.hpp>

// Try to bind _socket to each port in [_minPort, _maxPort] on _ip
// (e.g. "tcp://127.0.0.1"), stopping at the first port that is free.
void bindSocketToPortInRange(zmq::socket_t &_socket, const std::string &_ip,
                             int _minPort, int _maxPort)
{
  for (int port = _minPort; port <= _maxPort; ++port)
  {
    try
    {
      const auto fullAddress = _ip + ":" + std::to_string(port);
      _socket.bind(fullAddress.c_str());
      return;
    }
    catch (const zmq::error_t &)
    {
      // Port taken (or otherwise unavailable); try the next one.
    }
  }
  throw std::runtime_error("No available ports in specified range.");
}

We can add this feature via an environment variable, IGN_PORT_RANGE; when it's set, the transport will restrict the ports to the provided range, e.g. IGN_PORT_RANGE=40000:40100.
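To sketch how the variable might be consumed (a hypothetical helper; IGN_PORT_RANGE and its MIN:MAX format are the proposal above, not an existing ign-transport feature):

#include <cstdlib>
#include <stdexcept>
#include <string>
#include <utility>

// Parse "MIN:MAX" from IGN_PORT_RANGE; fall back to the full
// non-privileged range when the variable is unset.
std::pair<int, int> portRangeFromEnv()
{
  const char *env = std::getenv("IGN_PORT_RANGE");
  if (!env)
    return {1024, 65535};

  const std::string value(env);
  const auto sep = value.find(':');
  if (sep == std::string::npos)
    throw std::runtime_error("IGN_PORT_RANGE must look like MIN:MAX");

  const int minPort = std::stoi(value.substr(0, sep));
  const int maxPort = std::stoi(value.substr(sep + 1));
  if (minPort < 1024 || maxPort > 65535 || minPort > maxPort)
    throw std::runtime_error("IGN_PORT_RANGE values out of range");
  return {minPort, maxPort};
}

NodeShared could then call bindSocketToPortInRange() with the parsed bounds for each of its sockets, instead of binding to a random port.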

diegoferigo commented 4 years ago

Since ignition-transport is based on ZMQ over TCP, I don't think there should be any fundamental problem with Docker's network drivers.

We could also try restricting the range of permitted ports to, let's say, 100. That's a more manageable number and would make exposing them easier.

Yes, I'm not saying that they don't work. The reality is that all of them allocate ports dynamically, and that fights with Docker's default network isolation. Opening a wide range of ports is a workaround with limitations (https://github.com/moby/moby/issues/14288).

Adding the --network=host here will cause this error when trying to run with the GUI

I'm not using the official Docker images, and the error you posted rings a bell, even though I'm not sure where I stumbled upon it in the past. Could you try the --init flag? My configuration is a bit complicated because we use a big Docker image as a portable team-wide development environment with deps and IDEs (so it's quite heavy); think of it as a Docker-based VM :)

We can add this feature via an environment variable, IGN_PORT_RANGE; when it's set, the transport will restrict the ports to the provided range, e.g. IGN_PORT_RANGE=40000:40100.

I'll let the developers chime in here; I'm not an expert on the ign-transport code.

caguero commented 4 years ago

Before considering further changes in the code I'd like to verify that there's an actual issue:

  1. Did you verify that the issue is not related to having different partition names in the guest and in the host? The default partition name is created using a combination of the machine name and username. Unless you specify IGN_PARTITION manually on both sides, it's almost guaranteed that the default partition names will be different (and communication will not work). (See the sketch at the end of this comment.)

  2. Did you see this tutorial? https://ignitionrobotics.org/api/transport/9.0/relay.html

It looks like a simplified case of what you're trying to achieve, and I remember being able to communicate using Ignition Transport with a Docker container.

Do you mind verifying these two aspects?
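Regarding point 1, a minimal sketch of pinning the partition programmatically (the partition name docker_demo is arbitrary; exporting IGN_PARTITION=docker_demo on both sides is equivalent):

#include <ignition/transport/Node.hh>

int main()
{
  // Use the same explicit partition on the host and in the container so
  // the default machine-name/username derived names can't diverge.
  ignition::transport::NodeOptions opts;
  opts.SetPartition("docker_demo");

  ignition::transport::Node node(opts);
  // ... advertise/subscribe as usual ...
}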

joxoby commented 4 years ago

Did you verify that the issue is not related to having different partition names in the guest and in the host? The default partition name is created using a combination of the machine name and username. Unless you specify IGN_PARTITION manually on both sides, it's almost guaranteed that the default partition names will be different (and communication will not work).

Yes, I'm taking care of this. Just to be clear, I'm able to communicate with the container by using the option --network=host.

Did you see this tutorial? https://ignitionrobotics.org/api/transport/9.0/relay.html

I tried that tutorial without success. I'm somewhat skeptical that the tutorial should work, based on the following:

By default, when you create a container, it does not publish any of its ports to the outside world. To make a port available to services outside of Docker, or to Docker containers which are not connected to the container’s network, use the --publish or -p flag. This creates a firewall rule which maps a container port to a port on the Docker host.

https://docs.docker.com/config/containers/container-networking/#published-ports

Are you positive that it is working for you?

joxoby commented 4 years ago

Furthermore, the Docker bridge network (the default configuration) does not support multicast, so discovery won't work either. It seems to me that the only way to connect into the container while preserving network isolation (i.e., not using --network=host) is to create a macvlan network: https://docs.docker.com/network/macvlan/.

diegoferigo commented 4 years ago

That's interesting to hear. Adding the --network=host here will cause this error when trying to run with the GUI:

dbus[10]: The last reference on a connection was dropped without closing the connection. This is a bug in an application. See dbus_connection_unref() documentation for details.
Most likely, the application was supposed to call dbus_connection_close(), since this is a private connection.
  D-Bus not built with -rdynamic so unable to print a backtrace

I'm not using the official Docker images, and the error you posted rings a bell, even though I'm not sure where I stumbled upon it in the past.

FYI, I found the bell: https://github.com/moby/moby/issues/38442 :) I'm still using the workaround; I'm not sure if it's still necessary (from your error, I suppose it is).

joxoby commented 4 years ago

Thanks, Diego. I ended up arriving at the same solution. Nevertheless, I will keep this issue open so we can clarify some of the Docker networking issues.

caguero commented 3 years ago

Furthermore, the Docker bridge network (the default configuration) does not support multicast, so discovery won't work either. It seems to me that the only way to connect into the container while preserving network isolation (i.e., not using --network=host) is to create a macvlan network: https://docs.docker.com/network/macvlan/.

When IGN_RELAY is set, the discovery layer forwards all discovery information to the relays via unicast, and the communication is bidirectional: when a relay receives a unicast communication, it saves the sender's endpoint, which is used for sending future discovery updates.
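For example, a sketch of the container side (the relay address 172.17.0.1, the usual docker0 address of the host, plus the partition and topic names are all assumptions for illustration; on the host you'd run a Node in the same partition):

#include <cstdlib>

#include <ignition/msgs/stringmsg.pb.h>
#include <ignition/transport/Node.hh>

int main()
{
  // Both variables must be set before the first Node is constructed,
  // since discovery reads them once at startup.
  setenv("IGN_RELAY", "172.17.0.1", 1);      // unicast relay on the host
  setenv("IGN_PARTITION", "relay_demo", 1);  // must match on both ends

  ignition::transport::Node node;
  auto pub = node.Advertise<ignition::msgs::StringMsg>("/chatter");

  ignition::msgs::StringMsg msg;
  msg.set_data("hello from the container");
  pub.Publish(msg);
}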