ktf closed this 2 years ago.
@rbx @dennisklein, can we get a release that includes this? I really do not understand why we see a (silent) failure when many IPC channels are created on a given machine.
@ktf did you confirm that this patch fixes/reveals the source of the issue?
No, but I do not see how the message:
[331:internal-dpl-aod-spawner_t1]: [06:38:34][ERROR] failed to attach channel from_internal-dpl-aod-spawner_t1_to_eta-and-cls-histograms[0] (bind)
could otherwise be produced without any further message.
Ok. The changes lgtm. The current behavior makes no sense for IPC.
My guess is that something on that machine is already using the given address. Your changes should reveal it next time.
I suggest including something unique, such as the session ID, in the channel address string. We now do this here, for example: https://github.com/FairRootGroup/FairMQ/blob/master/test/protocols/_push_pull.cxx#L28.
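A minimal sketch of that idea, assuming a ZeroMQ-style `ipc://@...` address (on Linux, ZeroMQ's ipc transport maps a leading `@` to the abstract socket namespace). The helpers `make_ipc_address` and `bind_abstract`, and the use of a random UUID as the session ID, are hypothetical and not FairMQ API. The only point is that appending a per-session token keeps two sessions that pick the same hostname/port pair from binding the same name. Linux only.

```python
import socket
import uuid


def make_ipc_address(hostname: str, port: int, session_id: str) -> str:
    """Hypothetical helper: build an abstract-namespace IPC address.

    The scheme discussed in this thread is '@<hostname>_<port>'; appending a
    per-session token keeps two sessions that pick the same port apart.
    """
    return f"ipc://@{hostname}_{port}_{session_id}"


def bind_abstract(address: str) -> socket.socket:
    """Hypothetical helper: bind 'ipc://@name' as the Linux abstract name '\\0name'."""
    prefix = "ipc://@"
    if not address.startswith(prefix):
        raise ValueError(f"not an abstract IPC address: {address}")
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # A leading NUL byte selects the abstract namespace (no file is created).
        sock.bind(b"\0" + address[len(prefix):].encode())
    except OSError:
        sock.close()
        raise
    return sock


if __name__ == "__main__":
    # Two sessions on the same host that happen to use the same port number.
    a = make_ipc_address("myhost", 22000, uuid.uuid4().hex)
    b = make_ipc_address("myhost", 22000, uuid.uuid4().hex)
    s1 = bind_abstract(a)
    s2 = bind_abstract(b)  # succeeds: the session IDs make the names differ
    print(a)
    print(b)
    s1.close()
    s2.close()
```

Any value that is unique per session works the same way; the UUID is just a convenient stand-in.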
I suspect it's actually more subtle than that. The IPC path is formed as:
@<hostname>_<port_bound_by_the_driver>
where port_bound_by_the_driver is unique, but only within a container. I suspect that the namespace of Unix domain sockets starting with @ is actually shared across containers, hence the issues. @ironMann, does that make sense? Do you know where to look for the documentation of the @ prefix?
@ktf, it seems that the @ prefix (the abstract namespace for Unix domain sockets) has poor support in containers: https://stackoverflow.com/questions/38455283/docker-containers-share-unix-abstract-socket-or-dbus
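For reference, on Linux a name like `@<hostname>_<port_bound_by_the_driver>` lives in the abstract Unix socket namespace (see unix(7); the zmq_ipc documentation describes the `@` spelling). Abstract names have no filesystem entry, and according to network_namespaces(7) they are scoped to a network namespace, so containers that share the host's network namespace would also share these names. The sketch below, built around a hypothetical helper `try_bind_abstract`, shows what a collision looks like: the second bind of the same name fails with EADDRINUSE, and that failure is only visible if the error is reported rather than swallowed. Linux only.

```python
import errno
import os
import socket
from typing import Optional, Tuple


def try_bind_abstract(name: str) -> Tuple[Optional[socket.socket], int]:
    """Hypothetical helper: bind an AF_UNIX socket to the abstract name '\\0<name>'.

    Returns (socket, 0) on success, or (None, errno) on failure, so that the
    caller can report why the bind failed instead of failing silently.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # A leading NUL byte selects the abstract namespace (no file is created).
        sock.bind(b"\0" + name.encode())
    except OSError as exc:
        sock.close()
        return None, exc.errno
    return sock, 0


if __name__ == "__main__":
    # Stand-in for '@<hostname>_<port_bound_by_the_driver>'; the pid keeps the
    # demo from clashing with real sockets on the machine.
    name = f"demo-host_{os.getpid()}"
    first, err1 = try_bind_abstract(name)
    assert first is not None, os.strerror(err1)
    # A second process deriving the same name (e.g. in another container that
    # shares the network namespace) hits the same abstract address.
    second, err2 = try_bind_abstract(name)
    if second is None:
        print(f"bind failed: {errno.errorcode.get(err2, err2)} ({os.strerror(err2)})")
    else:
        second.close()
    first.close()
```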
Ok, that explains the issue, then.
This is released with v1.4.49.
When IPC is used, we should not get this error at all, so it is better to know about it when it does happen.