agent registering with wrong IP behind LoadBalancer/ with multiple networks

BiggA94 commented 3 years ago

If I have a node with two networks attached, one private and one public, I can't tell the agent which IP address it should listen on. The problem: the agent registers with the private IP. If I change the IP in the plugin in Blender, I can connect to it, but the IP is automatically reset after a few seconds.

Proposal: add an env var that lets me override the IP address the agent registers with.
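To make the proposal concrete, here is a minimal Python sketch of how such an override could behave. The variable name `CR_REPORT_IP` and the function `ip_to_report` are hypothetical, chosen only for illustration; the agent does not read any such variable today.

```python
import ipaddress
import os

# Hypothetical variable name; nothing in the agent reads this today.
OVERRIDE_VAR = "CR_REPORT_IP"


def ip_to_report(detected_ip, env=None):
    """Return the override address if one is set, else the detected address."""
    env = os.environ if env is None else env
    override = env.get(OVERRIDE_VAR, "").strip()
    if override:
        # ip_address() raises ValueError on a malformed value,
        # so a typo in the override fails fast instead of being registered.
        return str(ipaddress.ip_address(override))
    return detected_ip


if __name__ == "__main__":
    # 10.0.0.5 stands in for whatever address the agent detects on its own.
    print(ip_to_report("10.0.0.5"))
```

With the variable unset, the detected address is used unchanged, so existing setups would behave as before.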

zocker-160 commented 3 years ago

AFAIK this is a known issue that is not exclusive to Docker; it also occurs when running directly on a machine with multiple network interfaces.

That being said, have you tried running the container in bridged mode instead of host mode? That way you can force a specific IP using the --ip flag, but keep in mind that you need to forward ports 9000 - 9025 into the container.
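As a rough sketch of that suggestion, the Python helper below assembles a `docker run` command line with a fixed container IP and a forwarded port range. The image name `example/cr-headless`, the network name `crnet`, and the address are placeholders, not names from this project. Note that Docker only accepts `--ip` on a user-defined network, so the network is assumed to have been created beforehand with a matching subnet.

```python
import shlex


def docker_run_args(image, network, ip, first_port, last_port):
    """Build a `docker run` argument list with a fixed IP and a port range."""
    if first_port > last_port:
        raise ValueError("first_port must not exceed last_port")
    port_range = f"{first_port}-{last_port}"
    return [
        "docker", "run",
        "--network", network,               # assumed user-defined network
        "--ip", ip,                         # static address on that network
        "-p", f"{port_range}:{port_range}",  # host range mapped to same range
        image,
    ]


if __name__ == "__main__":
    # All values are placeholders; the port range is the one named above.
    args = docker_run_args("example/cr-headless", "crnet", "172.28.0.10", 9000, 9025)
    print(shlex.join(args))
```

The helper only builds the command; it does not run Docker, so it can be adapted and inspected safely before use.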

Jeducious commented 3 years ago

Hi @BiggA94 :)

Ok, as of V0.2.0 we actually bind and listen to all network interfaces. So it will actually be listening on both. Also take heed of what @zocker-160 has mentioned about forwarding. The ports have also been changed as of V0.2.8 to 9669 - 9694.

hope this helps?

BiggA94 commented 3 years ago

> That being said, have you tried running the container in bridged mode instead of host mode?

Yeah, that is essentially what Kubernetes does when you specify portBinding as HostPort. This currently works, but to get there I had to open the network up to the public completely, which is an unwanted security risk. (Not just a theoretical risk: there are botnets that actively try to brute-force all services on all ports.) The problem: you can't set the LoadBalancer IP as the reported IP.

@Jeducious This makes it a bit easier. The only problem left is that I can't manually tell the Crowdrender client (the one running on the server) which IP it should report, at least as far as I can see. Correct me if I'm wrong, but I also can't tell the plugin in my local Blender (the one I work with on my own PC) which IP address the server has. If I try, it is overwritten after a few seconds.

So it would be nice if I could tell the server which IP it should use, or overwrite it (regardless of the actual IP Docker is using). So this is a combination of an addition to the Docker image and to the plugin.

I hope you can understand my intention?

Jeducious commented 3 years ago

Hi @BiggA94 ok, I think I see the issue here. I could be mistaken though, so here goes....

If you are using the Docker containers with your CR auth token, they'll be posting to the Crowdrender cloud and registering there. Our web app is designed to return the public address, as seen by our web app, to your client. It's odd that this isn't the address of the load balancer though. Normally, if our web app gets a post from a node, it only sees the public IP, which I assume is the address of your load balancer, with the node actually behind that load balancer? If it's true that the load balancer IP is the publicly visible one, then we may have a problem at our end. Can you confirm?

As for the IP getting replaced: the IP you enter manually is overwritten because, every few seconds, the node list in Crowdrender is refreshed from your cloud account. You can avoid this by logging your client out, after which the list is no longer updated. However, you then won't see whether the machine is actually online or not. You'll need to verify that some other way, maybe by ssh?

There is currently no way to ask the node running the CR headless server to report any IP address other than the public one, though this would not be difficult to implement.

Another method for connecting using a different IP to the one CR reports is to create a new node and give it a nickname that doesn't match the host name of your node (or any node on your network). This gives you free rein to give the node any address you like, since CR can neither resolve such a name to a host on the local network nor update it from your cloud account. Then you can simply enter your own IP and connect. This avoids you having to log out, so you'll still see the node as being active in your account, while using the nickname node to actually connect to it.

Bear in mind that last suggestion isn't one I have personally tried, so it's untested at this point, but it is something you could try.

Let me know if this helps at all :)

BiggA94 commented 3 years ago

Okay, then I would definitely suggest two options for the headless:

1. Set the IP address on the server side with which it registers on the web app (override the public IP)
2. Allow a setting where the IP is not set automatically at all (so I can set it manually inside my plugin in Blender)

There are two reasons. At home I have an IPv6-only connection, where I don't have a public IPv4 address (actually there is one, but it is shared between many), so I can't use the nodes at home. On the other hand, there is my cloud cluster. I don't know which IP it is actually registered with, but I would like to be able to let Kubernetes decide which IP it should use.

> This gives you free rein to give the node any address you like since CR cannot resolve such a name to a host on the local network, nor update it from your cloud account. Then you can simply enter your own IP and connect. This avoids you having to log out, so you'll indeed see the node as being active in your account, but then you can use the nickname node to actually connect to it.

Wouldn't this also add the headless one that is automatically created? I just tested it the following way: start headless v28 on my local server, also using v28 locally in Blender. The local node gets registered, and if I change its IP to the correct one, it starts uploading but says "sync failed" afterwards (I assume the IP changes during the upload). But if I create a new node and enter the local IP of the server, it just says "connection failed".

> If its true the loadbalancer IP is the publicly visible one, then we have a problem at our end maybe. Can you confirm?

No, the LoadBalancer only controls the ingress, not the egress, of the cluster. So I don't really know where that IP comes from.

Jeducious commented 3 years ago

Hi @BiggA94 Ok, if you are getting sync failed, then you are connecting, but there is another issue stopping the synchronisation from working. A common cause of this is using two different versions of Blender across your client/master and render nodes. Another, sneakier problem can come from port conflicts. In this scenario, if the file transfer mechanism cannot successfully transfer the file, it will fail and flag the node as sync failed.
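One rough way to look for port conflicts on a machine before starting a node is to try binding each port in the range and note the ones that are refused. The Python sketch below does that; the function name `ports_in_use` is made up for this example, and a failed bind is only a hint (it can also be caused by permissions or firewall software), not proof of a conflicting service. Run it while the node itself is stopped, otherwise its own ports will be reported.

```python
import socket


def ports_in_use(first_port, last_port, host="0.0.0.0"):
    """Return the TCP ports in [first_port, last_port] that cannot be bound."""
    busy = []
    for port in range(first_port, last_port + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                # Binding fails if another socket already holds the port.
                probe.bind((host, port))
            except OSError:
                busy.append(port)
    return busy


if __name__ == "__main__":
    # 9669 - 9694 is the range this thread gives for V0.2.8.
    print(ports_in_use(9669, 9694))
```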

Ok, I am a little confused about your network configuration though. Is it that your home network is configured to only use IPv6, or that your ISP assigns you an IPv6 address for your modem on their network? This may be the cause of the pain you're suffering, depending on where the IPv6 address is used. Since our code only uses IPv4 at the moment, this could be the reason you're having connection troubles. Having an idea of the topology of the network between you and the cluster would be a real help here.
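When describing the topology, it can help to classify each address involved. The Python sketch below uses the standard `ipaddress` module to tell IPv6 from IPv4, and to flag IPv4 addresses in the shared address space 100.64.0.0/10 (RFC 6598), which ISPs commonly use for carrier-grade NAT. Whether the shared IPv4 address mentioned earlier in this thread is of that kind is an assumption; the function name `describe` and its labels are illustrative only.

```python
import ipaddress

# RFC 6598 shared address space, typically used for carrier-grade NAT.
SHARED_SPACE = ipaddress.ip_network("100.64.0.0/10")


def describe(address):
    """Classify an address string into a coarse category."""
    ip = ipaddress.ip_address(address)
    if ip.version == 6:
        return "ipv6"          # not usable by an IPv4-only client
    if ip in SHARED_SPACE:
        return "ipv4-shared"   # likely behind carrier-grade NAT
    if ip.is_private:
        return "ipv4-private"  # only reachable inside the local network
    return "ipv4-public"


if __name__ == "__main__":
    for addr in ["2001:db8::1", "100.64.1.2", "10.0.0.5", "8.8.8.8"]:
        print(addr, describe(addr))
```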

Jeducious commented 3 years ago

Also forgot to mention, we recently changed the port numbers for V0.2.8 to avoid port conflicts with other services. The old 9000 - 9025 range is now 9669 - 9694. We observed that IANA's database shows other services registered on ports in the original range, so we moved to an unregistered range to avoid conflicts with known services.

So if you're using V0.2.8 and you've forwarded ports, please check they are the ones current for that version. I updated the readme for the cr_docker repo on GH yesterday when I found out we'd not yet updated that documentation. So apologies for any confusion there.
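For anyone scripting their port forwarding, the version rule above can be written down as a small Python helper. The two ranges and the V0.2.8 cutoff come from this thread; the function names and the simple `major.minor.patch` version parsing are assumptions made for this sketch.

```python
def port_range_for(version):
    """Return the port range used by a given Crowdrender version string."""
    # Assumes versions look like "V0.2.8" or "0.2.8".
    parts = tuple(int(p) for p in version.lstrip("vV").split("."))
    if parts >= (0, 2, 8):
        return range(9669, 9695)  # 9669 - 9694 inclusive
    return range(9000, 9026)      # 9000 - 9025 inclusive


def missing_forwards(version, forwarded_ports):
    """List the ports the version needs that are not in forwarded_ports."""
    forwarded = set(forwarded_ports)
    return [p for p in port_range_for(version) if p not in forwarded]


if __name__ == "__main__":
    # A setup still forwarding the old range while running V0.2.8.
    print(missing_forwards("V0.2.8", range(9000, 9026)))
```

An empty list means every port the version needs is covered by the forwarded set.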

Jeducious commented 3 years ago

Ok, confirming that this is indeed a bug for users that are attempting to get this to work on a local network. I was able to set up a container to test using Docker for Desktops. What occurred for me (which might be different to the OP's situation) is that once the container starts, it posts its public IP to the CR web app. This public IP is refreshed every 10 seconds or so when the addon contacts our servers to find out what nodes are alive. At this point the public IP will overwrite any IP entered manually, effectively stymying the user's efforts to change the IP to something else.

In my case, I needed to change the container-based node's IP to its private network address, but it gets overwritten with the public IP since this is what our addon and web app do by design.

Containers were always imagined to run in a cloud environment, where the public endpoint gives a usable route to the node. However, on a local network the public endpoint cannot be used this way without port mapping, since it exists on the external side of a NAT (usually anyway; setups without a NAT exist, but they are not the norm).

So, I am proposing a fix for this: allow the web app to at least temporarily store a list of candidate IPs, including the node's private network address, so that the addon can attempt to connect to each address, probably trying the private address first.

C&C on this proposal is welcome.
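As a sketch of how the client side of such a proposal might look, the Python below orders a candidate list so that private addresses come first and returns the first one that accepts a TCP connection. This is not the addon's actual code; the function names, the timeout, and the use of a plain TCP connect as the reachability test are all assumptions.

```python
import ipaddress
import socket


def order_candidates(candidates):
    """Put private addresses first, keeping the original order otherwise."""
    # sorted() is stable and False sorts before True,
    # so private addresses (key False) move to the front.
    return sorted(candidates, key=lambda a: not ipaddress.ip_address(a).is_private)


def first_reachable(candidates, port, timeout=1.0):
    """Return the first candidate accepting a TCP connection, or None."""
    for address in order_candidates(candidates):
        try:
            with socket.create_connection((address, port), timeout=timeout):
                return address
        except OSError:
            continue  # refused, unreachable, or timed out: try the next one
    return None


if __name__ == "__main__":
    # Stand-in addresses: one public, two private.
    print(order_candidates(["8.8.8.8", "10.0.0.5", "192.168.1.20"]))
```

Trying the private address first keeps LAN traffic on the LAN when both routes work, at the cost of one short timeout when the node is only reachable publicly.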

Jeducious commented 2 years ago

This issue is back on the menu :) work to begin shortly on a fix to ensure that a container on a LAN can easily be connected to.

crowdrender / cr-docker

agent registering with wrong IP behind LoadBalancer/ with multiple networks #3