AcademySoftwareFoundation / OpenCue

A render management system you can deploy for visual effects and animation productions.
https://www.opencue.io
Apache License 2.0

RQD assumes its hostname is reachable by Cuebot #1510

Open KernAttila opened 2 months ago

KernAttila commented 2 months ago

Describe the bug

Hi all! When launched, RQD sends a host report to Cuebot stating its hostname, but Cuebot may be unable to reach the RQD machine back at that hostname (and has no way of knowing). In most setups this works, but in mine the machines can only communicate via a host nickname; they cannot see each other via their local hostnames. (I'm using NordVPN's Meshnet feature to emulate a local network; I'd guess other VPN solutions behave the same.)

To Reproduce (with all machines on NordVPN Meshnet, or any VPN, I guess)

  1. Clear host list on cuebot
  2. Launch RQD -> host appears in cuecommander
  3. Lock the machine via cuecommander -> error, cannot communicate
  4. Host cannot receive jobs.

Under the hood

  1. RQD sends a host report with its hostname.
  2. Cuebot saves/updates the host and its stats in the database.
  3. RQD continues to send reports saying it's alive.
  4. Cuebot thinks the machine is available but never tests whether it can actually reach it.
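To illustrate the missing check in step 4, here is a minimal sketch of a reachability probe. It is plain illustrative Python, not Cuebot code. The function name `is_reachable` is hypothetical, and the default port 8444 is an assumption about the RQD gRPC port that should be checked against your own configuration.

```python
import socket


def is_reachable(hostname: str, port: int = 8444, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to hostname:port succeeds within timeout.

    The default port is an assumption, not a confirmed OpenCue value.
    """
    try:
        # create_connection resolves the name and opens a TCP connection.
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

A successful TCP connect only shows that something is listening at that address; it does not prove the listener is the expected RQD instance.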

Expected behavior

Send a hostname that Cuebot can reach. Suggestion: do a handshake

  1. On launch, RQD sends all known hostnames to Cuebot.
  2. Cuebot pings back each one and uses the first that responds.
  3. Cuebot sends a "gotcha, here's your reachable hostname" to RQD
  4. RQD saves that value internally and uses it for its next reports.
  5. Cuebot can now reach the RQD host.
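The handshake steps above could be sketched roughly as follows. This is only an illustration of the proposed logic, not existing OpenCue code: `pick_reachable_hostname`, `RqdIdentity`, and the example hostnames are all hypothetical, and the reachability check is passed in as a callable so the sketch stays independent of any transport.

```python
from typing import Callable, Iterable, List, Optional


def pick_reachable_hostname(
    candidates: Iterable[str], probe: Callable[[str], bool]
) -> Optional[str]:
    """Cuebot side: return the first candidate that probe() reports reachable."""
    for name in candidates:
        if probe(name):
            return name
    return None


class RqdIdentity:
    """RQD side: remembers which hostname to put in host reports."""

    def __init__(self, candidates: List[str]) -> None:
        if not candidates:
            raise ValueError("at least one candidate hostname is required")
        self.candidates = list(candidates)
        # Until the handshake succeeds, report the first known name.
        self.report_hostname = self.candidates[0]

    def handshake(self, probe: Callable[[str], bool]) -> Optional[str]:
        """Simulate steps 1-4: offer all names, store the one that was reachable."""
        chosen = pick_reachable_hostname(self.candidates, probe)
        if chosen is not None:
            self.report_hostname = chosen
        return chosen


# Usage with a fake probe: only the mesh nickname is reachable.
reachable = {"render01.nord"}
identity = RqdIdentity(["render01", "render01.nord", "10.5.0.2"])
identity.handshake(lambda host: host in reachable)
print(identity.report_hostname)
```

If no candidate responds, the sketch keeps the original hostname, which matches the current behavior described in the issue.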

Version Number

Dev

lithorus commented 2 months ago

Have you tried using the RQD_USE_IP_AS_HOSTNAME in rqd.conf?
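For readers unfamiliar with the setting, here is a small sketch of what enabling it might look like, parsed with Python's standard `configparser`. The `[Override]` section name is an assumption about the rqd.conf layout, so check it against the rqd.conf shipped with your RQD version.

```python
import configparser

# Hypothetical rqd.conf content; the section name is an assumption.
SAMPLE_RQD_CONF = """
[Override]
RQD_USE_IP_AS_HOSTNAME = True
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_RQD_CONF)

# Fall back to False when the option is absent.
use_ip = parser.getboolean("Override", "RQD_USE_IP_AS_HOSTNAME", fallback=False)
print(use_ip)
```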

KernAttila commented 2 months ago

Thanks @lithorus, this worked. Cuebot can now identify the proper host running RQD using its IP address. Still, it would be nice to be able to keep the identifier tied to the hostname in such a scenario. An IP is OK, but hostnames often convey meaning in an infrastructure. Do you think this feature would be worth the small compute overhead?

Maybe the handshake could run only if there are no deliberate overrides in rqd.conf, i.e. RQD_USE_IP_AS_HOSTNAME=False and OVERRIDE_HOSTNAME is not set.
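That condition could be expressed as a small guard. This is a sketch under the assumption that both settings have already been read into plain Python values; the function name `should_probe_hostnames` is hypothetical.

```python
from typing import Optional


def should_probe_hostnames(
    use_ip_as_hostname: bool, override_hostname: Optional[str]
) -> bool:
    """Run the handshake only when neither explicit override is in effect."""
    # An empty or missing OVERRIDE_HOSTNAME counts as "not set".
    return not use_ip_as_hostname and not override_hostname
```

With this guard, an admin who sets either option keeps full control, and the extra probing happens only for hosts left on default settings.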

The full story is that I stumbled on this issue while working on a tray icon for OpenCue. I needed to make sure the server could properly reach the machine, so I had to implement this logic myself and thought it could be a useful addition to the RQD codebase.

DiegoTavares commented 2 months ago

I think using the RQD_USE_IP_AS_HOSTNAME feature is a workaround for this issue, but I'm not against a new feature with the proposed handshake mechanism. That being said, I'm changing the status of this issue to feature request.

When implementing this, please keep in mind the possibility of a future design where RQD will not talk to Cuebot directly over gRPC but will use a queueing mechanism (e.g. Kafka), while Cuebot will continue to talk to RQD directly over gRPC.

lithorus commented 2 months ago

Are there any plans to move to a single-direction connection instead of bi-directional communication between Cuebot and RQD? That would also solve this (and make network admins happy).