Open lqf96 opened 2 years ago
Hello @lqf96, thanks for reaching out. The branch looks good to us. Could you please submit it as a pull request?
Hi @ioga, yes, I can do it over the weekend... However, I'm not sure whether any tests are required, or where I should add them. Maybe I also need to add this functionality to the documentation?
Hi @ioga, sorry for the long delay, but I've just submitted a pull request for this functionality... If you're interested, feel free to review it and let me know your suggestions. Thanks!
Problem
As explained in https://github.com/determined-ai/determined/issues/906#issuecomment-664494066, Determined assumes there is no direct connectivity between the master and the container. Instead, exposed container ports are published to the agent's external IP address. This causes connectivity problems when I try to deploy Determined by putting the master, agents and containers into the same Swarm overlay network. When the container is ready, the Determined master and agent derive the wrong IP address and port for the container, and return a 502 Bad Gateway error when trying to proxy Notebook, TensorBoard or Shell services. Therefore, I'd like Determined to add support for a flat network topology, where the master, agents and containers are all assumed to be directly reachable from each other without any forwarding.
Solution
I have experimented with a possible solution in my `direct-connectivity` branch. The idea is to add an extra config item called `direct_connectivity` to the master config file. When this item is set to `true`, we assume the network topology is flat. In this case, workload containers have their ports exposed but not published, and the master and agents connect to the containers using their original IPs and ports instead of the forwarded ones. This approach seems to work at least for JupyterLab, and I can refactor and rebase my fork and make a pull request if you deem it viable.
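
To make the intended behaviour concrete, here is a minimal Python sketch of the address selection described above. It is not the code from my branch or from Determined itself: the `ContainerPort` type, the `proxy_target` function and all field names are hypothetical and only illustrate the idea. Only the `direct_connectivity` flag corresponds to the proposed config item.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ContainerPort:
    """Hypothetical description of one service port of a workload container."""
    container_ip: str                     # container IP on the shared overlay network
    container_port: int                   # port the service listens on inside the container
    agent_ip: str                         # external IP of the agent hosting the container
    published_port: Optional[int] = None  # host port; set only when the port is published


def proxy_target(port: ContainerPort, direct_connectivity: bool) -> Tuple[str, int]:
    """Return the (host, port) pair the master should proxy to."""
    if direct_connectivity:
        # Flat topology: connect to the container itself, no forwarding involved.
        return port.container_ip, port.container_port
    # Default behaviour: go through the port published on the agent's external IP.
    if port.published_port is None:
        raise ValueError("port is not published, so it cannot be reached via the agent")
    return port.agent_ip, port.published_port
```

With the flag enabled, a JupyterLab container listening on `10.0.1.7:8888` would be proxied at exactly that address. With the flag disabled, the master would instead use the agent's external IP and whichever host port was published for the container.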