Open ozdanborne opened 8 years ago
This is a task for Mesos: https://issues.apache.org/jira/browse/MESOS-4427. There's nothing we can really do to ensure it is reachable, unless you have any suggestions.
@sargun I agree that this is a problem that should be addressed in core Mesos, but Avinash does not.
I have already made two suggestions to fix this. Since the only IPs in Mesos that are unreachable are Docker IPs, I suggested in this issue that Mesos DNS should not respond with NetworkInfo for a Docker Bridge Task.
In the linked JIRA, I suggested we don't fill NetworkInfo for IP-per-container tasks at all. But again, Avinash believes that's an incorrect solution.
I know this is not a simple problem, and it may not have a simple solution, but I'd appreciate help in designing one moving forward, and would be happy to contribute to it. Otherwise, CNI or any ip-per-container solution becomes impossible to trial alongside regular networking tasks.
@djosborne Do you need this on all records? Can we do it just for the basic task A records?
@djosborne Does this work for your needs:
core@ip-10-0-5-115 /opt/mesosphere/packages/navstar--cd7f7caa227728e5ac52f8c4823e1c5b0b2b22d4 $ dig foo.marathon.agentip.dcos.thisdcos.directory
; <<>> DiG 9.10.2-P4 <<>> foo.marathon.agentip.dcos.thisdcos.directory
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39737
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;foo.marathon.agentip.dcos.thisdcos.directory. IN A
;; ANSWER SECTION:
foo.marathon.agentip.dcos.thisdcos.directory. 5 IN A 10.0.2.4
;; Query time: 2 msec
;; SERVER: 198.51.100.1#53(198.51.100.1)
;; WHEN: Fri Jun 03 04:46:03 UTC 2016
;; MSG SIZE rcvd: 89
core@ip-10-0-5-115 /opt/mesosphere/packages/navstar--cd7f7caa227728e5ac52f8c4823e1c5b0b2b22d4 $ dig foo.marathon.containerip.dcos.thisdcos.directory
; <<>> DiG 9.10.2-P4 <<>> foo.marathon.containerip.dcos.thisdcos.directory
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39314
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;foo.marathon.containerip.dcos.thisdcos.directory. IN A
;; ANSWER SECTION:
foo.marathon.containerip.dcos.thisdcos.directory. 5 IN A 172.17.0.2
;; Query time: 2 msec
;; SERVER: 198.51.100.1#53(198.51.100.1)
;; WHEN: Fri Jun 03 04:46:08 UTC 2016
;; MSG SIZE rcvd: 93
The behavior requested in this issue is that:
With one Mesos-DNS configuration setting, Docker bridge tasks return the agent IP, and ip-per-container tasks return their NetworkInfo.Ip.
@sargun I'm not sure if your dig response shows this behavior. Can you provide more info on the use case you're showing, specifically: is this using netinfo? Or have you introduced some new IPsrc?
To answer your previous question, I suppose this is only important for A records. Definitely not for SRV records. And as far as I know, Mesos-DNS doesn't support any other records, right?
foo.marathon.containerip.dcos.thisdcos.directory. - Returns IP from network infos vs. foo.marathon.agentip.dcos.thisdcos.directory. - returns agent IP.
Do we really need containerip in the name? Can't foo.marathon.dcos.thisdcos.directory attempt to use networkinfo.ipaddress and, if that is not found, just default to the same IP as used by the agentip name?
@jdef every container has networkinfo.ipaddress, regardless of whether or not it is ip-per-container.
Oh, I didn't notice the containerip/agentip in those requests.
@sargun I don't believe this is a useful solution, as the logic of which of those two addresses to use is now pushed to the running microservices / frameworks.
If we are going to solve this problem at the service layer, I'd propose adding a new IPSrc configuration option, possibly called accessible-ip, which will return the ContainerIP for ip-per-container tasks and will skip Docker bridge tasks. There is still the question of how Mesos-DNS will determine which case applies to a task. I spoke with Avinash at MesosCon, and he suggested that if container.docker.portMappings was set, it's an inaccessible task, and if ipAddress.discoveryInfo was set, it's an accessible one.
Thoughts?
Question: Who is the best person to decide whether to use the container IP or the agent IPs + portmapping?
Good question. It is arguable whether it should be decided by core Mesos or decided at the service layer as we are discussing. On one hand, Mesos is reporting enough information for Service Discovery services to tell which address the container should be reachable on (as I mentioned in my last comment). On the other hand, maybe that calculation should already be done in core Mesos.
Shall we continue / close out that discussion on the Jira issue before implementing it here at the service layer?
My guess is that getting Core Mesos to implement a good general purpose solution is going to be hard, if not impossible. I say that after some discussion, and because there are no good heuristics for the "right" answer. The other problem is that the answer can depend on the viewer. For example, if there is an overlay, and a container on the overlay is trying to contact another container, it may prefer to go over the overlay as opposed to going to the agent IP directly.
I think if we come up with a good heuristic here, we can eventually help Core Mesos implement it.
@sargun @jdef as discussed, let's add an experimental "auto" configuration to Mesos DNS which applies the following logic when returning entries:
if task.port_mappings or not task.network_info.ip_address:
    return agent_ip
else:
    return task.network_info.ip_address
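A runnable version of that logic might look like the sketch below. The Task class and its field names are hypothetical simplifications for illustration and do not reflect the actual Mesos-DNS data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    # Hypothetical, simplified view of a task; not the Mesos-DNS data model.
    agent_ip: str
    container_ip: Optional[str] = None
    port_mappings: List[dict] = field(default_factory=list)


def auto_ip(task: Task) -> str:
    """Return the address an "auto" setting would publish in the A record."""
    # Port-mapped tasks, and tasks without a container IP, use the agent IP.
    if task.port_mappings or not task.container_ip:
        return task.agent_ip
    return task.container_ip
```

For example, `auto_ip(Task("10.0.2.4", "172.17.0.2", [{"containerPort": 80}]))` yields the agent address, while the same task without port mappings yields its container address.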
Mesos DNS' own documentation describes this issue:
This unfortunately means that IP-per-container tasks can't be used alongside standard Mesos tasks without breaking DNS resolution for one or the other. This makes it very difficult to introduce any limited IP-per-container support without breaking networking for standard tasks. This will affect CNI as well, when it is introduced in Mesos 0.29.
I raised this problem in the Mesos JIRA, but it has been brought up that this is really a service discovery issue, as Mesos is reporting information as expected.
Therefore, I propose that, when Mesos DNS is configured to prefer networkinfo, it should skip that source when resolving Docker bridge tasks and continue to the next "IPSources" field instead.
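As a rough illustration of the proposed fall-through, the sketch below walks an ordered list of sources and skips the network-info source for Docker bridge tasks. The task is a plain dict, and the keys netinfo_ip, agent_ip, and docker_bridge are hypothetical names chosen for this example; the real Mesos-DNS resolver is structured differently.

```python
from typing import Callable, Dict, List, Optional

# Each source maps a task (a plain dict here) to an IP, or None if unavailable.
Source = Callable[[dict], Optional[str]]

SOURCES: Dict[str, Source] = {
    "netinfo": lambda task: task.get("netinfo_ip"),
    "host": lambda task: task.get("agent_ip"),
}


def resolve(task: dict, ip_sources: List[str]) -> Optional[str]:
    """Return the first usable IP, trying sources in priority order."""
    for name in ip_sources:
        # Proposed change: a bridge task's network-info IP is not routable,
        # so fall through to the next source instead of returning it.
        if name == "netinfo" and task.get("docker_bridge"):
            continue
        ip = SOURCES[name](task)
        if ip:
            return ip
    return None
```

With the order `["netinfo", "host"]`, a bridge task resolves to its agent IP and an ip-per-container task resolves to its network-info IP, which is the behavior requested here.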