Closed jussisallinen closed 7 years ago
The sdc-docker image version installed by sdcadm is master-20161121T212616Z-g2c4c35. The CoaL installation had a working sdc-docker with master-20161116T014617Z-gecd3409. The update channel was dev.
Note that the ipfilter-related dmesg log messages you see in the global zone are an unrelated, but known, issue: OS-4332.
@jclulow Thanks for the info!
@jussisallinen So, there's a 3-part story of woe here.
Part 1: sdc-docker/node-moray are currently using cueball without a "binder bootstrap". This means that they are leaking all their DNS lookups out into public DNS, and accepting responses from public DNS.
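To picture what "not leaking lookups" means in Part 1, here is a minimal sketch using only Node's built-in `dns.Resolver`. It is not cueball's binder bootstrap and not how sdc-docker is configured; it only shows lookups being pinned to an explicit list of internal resolvers instead of whatever the system resolver list contains. The address `192.0.2.53` and the name `moray.dc.example.com` are hypothetical placeholders.

```javascript
'use strict';

const { Resolver } = require('dns');

// Build a resolver that only ever talks to the given DNS servers.
// `servers` is a hypothetical list of internal (binder-like) resolver IPs.
function makeInternalResolver(servers) {
    const resolver = new Resolver();
    resolver.setServers(servers);
    return resolver;
}

// Look up SRV records through the pinned resolver only.
function lookupSrv(resolver, name, cb) {
    resolver.resolveSrv(name, cb);
}

// Usage (performs a real DNS query, so it is left commented out):
// const internal = makeInternalResolver(['192.0.2.53']);
// lookupSrv(internal, '_moray._tcp.moray.dc.example.com', console.log);
```

With a resolver scoped like this, a query for an internal service name never reaches a public DNS server, so a public wildcard record cannot answer it.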
Part 2: You seem to have a CNAME for *.fi-espoo-....company.com which points at 1.company.com. This is unfortunate, as it means that public DNS is going to provide a valid response for any lookups of internal SDC names (which is something we strongly recommend against).
Part 3: cueball has a bug (https://github.com/joyent/node-cueball/issues/53) which means it is not handling NODATA responses on CNAME'd names properly.
All 3 parts of this combine to create your crash -- the DNS SRV request for moray.fi-espoo-...company.com leaks into public DNS, then public DNS answers with a CNAME to a name that has no SRV records (so it gets a NODATA response), and then cueball fails to interpret this response correctly. This will happen inconsistently, though, because sometimes the local binder in the DC will respond to the query more quickly than the public DNS server you have set, and that response will be taken instead.
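The failing case described above can be sketched as a small classifier. This is not cueball's or mname-client's code, and the response shape (`rcode`, `answers` with `name`/`type`/`target`) is a hypothetical simplification. It shows the intended interpretation: a NOERROR response that contains only a CNAME and no SRV records at the end of the chain should be treated as NODATA for the CNAME target, not as an error.

```javascript
'use strict';

// Classify the response to an SRV query.
// Hypothetical response shape: { rcode: 'NOERROR', answers: [{ name, type, target }] }
function classifySrvResponse(qname, response) {
    if (response.rcode === 'NXDOMAIN') {
        return { kind: 'NXDOMAIN', name: qname };
    }
    if (response.rcode !== 'NOERROR') {
        return { kind: 'ERROR', name: qname, rcode: response.rcode };
    }

    // Follow any CNAME chain contained in the answer section.
    let name = qname.toLowerCase();
    const seen = new Set();
    for (;;) {
        if (seen.has(name)) {
            return { kind: 'ERROR', name: name, rcode: 'CNAME_LOOP' };
        }
        seen.add(name);
        const cname = response.answers.find(function (a) {
            return a.type === 'CNAME' && a.name.toLowerCase() === name;
        });
        if (!cname) {
            break;
        }
        name = cname.target.toLowerCase();
    }

    // NOERROR with no records of the requested type is NODATA.
    const srvs = response.answers.filter(function (a) {
        return a.type === 'SRV' && a.name.toLowerCase() === name;
    });
    if (srvs.length === 0) {
        return { kind: 'NODATA', name: name };
    }
    return { kind: 'ANSWER', name: name, records: srvs };
}
```

For example, a response that carries only `moray.dc.example.com CNAME host.example.com` (both names made up here) classifies as NODATA for `host.example.com`, which a caller can handle like any other empty result.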
So, part 3 is a bug, and I'll get on a fix for that ASAP. Part 1 is unfortunate, and I'll see if I can work with the other developers here to get that sorted out. For part 2, I really do recommend that you remove that CNAME if you can. If you absolutely need to put names under that suffix in public DNS, please make sure they don't collide with the names of SDC internal services and CNAME them specifically (not a * record).
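As a rough way to check the Part 2 advice, the sketch below flags internal service names that a list of public record owner names would answer for, either by exact match or through a wildcard. The matching is deliberately simplified (it ignores the DNS rule that a more specific existing name blocks a wildcard), and all names used are hypothetical examples under `example.com`.

```javascript
'use strict';

// True if a wildcard owner name such as '*.dc.example.com' would cover `name`.
// Simplified: any name strictly below the wildcard's parent domain matches.
function wildcardCovers(wildcard, name) {
    if (!wildcard.startsWith('*.')) {
        return false;
    }
    const suffix = wildcard.slice(1).toLowerCase(); // e.g. '.dc.example.com'
    const lower = name.toLowerCase();
    return lower.length > suffix.length && lower.endsWith(suffix);
}

// Return the internal names that the public records would answer for.
function findCollisions(publicRecords, internalNames) {
    return internalNames.filter(function (name) {
        return publicRecords.some(function (rec) {
            return rec.toLowerCase() === name.toLowerCase() ||
                wildcardCovers(rec, name);
        });
    });
}
```

A wildcard such as `*.dc.example.com` collides with every internal name under that suffix, while a specific record such as `www.dc.example.com` collides with none of them unless an internal service happens to use the same name.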
@arekinath Thanks for your analysis! I also suspected that the wildcard DNS entry was at least part of what is causing the havoc here.
Headnode platform: SmartOS (build: 20161123T125110Z). The core dump can be found here: core.node.90867
I ran # sdcadm post-setup docker on a fresh Triton headnode.
The global zone dmesg shows the following when running # sdcadm post-setup docker:
2016-11-24T13:12:30+00:00 headnode svc.ipfd[2997]: [ID 139457 daemon.error] smf_get_state failed for svc:/TEMP/smartdc/dockerlogger:default: entity not found
2016-11-24T13:12:30+00:00 headnode svc.ipfd[2997]: [ID 162284 daemon.error] is_correct_event failed for svc:/TEMP/smartdc/dockerlogger:default.
2016-11-24T13:12:30+00:00 headnode svc.ipfd[2997]: [ID 662829 daemon.error] Service may have incorrect IPfilter configuration
When docker tries to start in the docker0 zone, the following gets logged in the zone's dmesg:
2016-11-24T13:12:46+00:00 8d3fb6f4-d359-45af-ba8f-90acab8cdc13 nscd[16077]: [ID 131150 user.error] nss_mdns: error checking svc:/network/dns/multicast:default service timestamp
2016-11-24T13:12:46+00:00 8d3fb6f4-d359-45af-ba8f-90acab8cdc13 nscd[16077]: [ID 131150 user.error] nss_mdns: error checking svc:/network/dns/multicast:default service timestamp
STATE          STIME    FMRI
maintenance    13:13:06 svc:/smartdc/application/docker:default
[ Nov 24 09:20:34 Executing start method ("/opt/smartdc/docker/smf/method/docker start"). ]
FROM
_toss (/opt/smartdc/docker/node_modules/assert-plus/assert.js:22:5)
Function.out.(anonymous function) [as string] (/opt/smartdc/docker/node_modules/assert-plus/assert.js:122:17)
parseConstructorArguments (/opt/smartdc/docker/node_modules/verror/lib/verror.js:76:18)
new VError (/opt/smartdc/docker/node_modules/verror/lib/verror.js:153:11)
CueBallDNSResolver.state_process (/opt/smartdc/docker/node_modules/moray/node_modules/cueball/lib/resolver.js:687:13)
CueBallDNSResolver.FSM._gotoState (/opt/smartdc/docker/node_modules/moray/node_modules/cueball/node_modules/mooremachine/lib/fsm.js:273:4)
CueBallDNSResolver.FSM._gotoState (/opt/smartdc/docker/node_modules/moray/node_modules/cueball/node_modules/mooremachine/lib/fsm.js:300:8)
CueBallDNSResolver.FSM._gotoState (/opt/smartdc/docker/node_modules/moray/node_modules/cueball/node_modules/mooremachine/lib/fsm.js:300:8)
CueBallDNSResolver.FSM._gotoState (/opt/smartdc/docker/node_modules/moray/node_modules/cueball/node_modules/mooremachine/lib/fsm.js:300:8)
CueBallDNSResolver.FSM._gotoState (/opt/smartdc/docker/node_modules/moray/node_modules/cueball/node_modules/mooremachine/lib/fsm.js:300:8)
FSMStateHandle.gotoState (/opt/smartdc/docker/node_modules/moray/node_modules/cueball/node_modules/mooremachine/lib/fsm.js:52:23)
EventEmitter.<anonymous> (/opt/smartdc/docker/node_modules/moray/node_modules/cueball/lib/resolver.js:393:5)
emitTwo (events.js:87:13)
EventEmitter.emit (events.js:172:7)
onLookup (/opt/smartdc/docker/node_modules/moray/node_modules/cueball/lib/resolver.js:926:6)
/opt/smartdc/docker/node_modules/moray/node_modules/cueball/node_modules/mname-client/lib/client.js:130:5
DnsMessage.<anonymous> (/opt/smartdc/docker/node_modules/moray/node_modules/cueball/node_modules/mname-client/lib/client.js:218:3)
DnsMessage.g (events.js:260:16)
emitTwo (events.js:87:13)
DnsMessage.emit (events.js:172:7)
Socket.<anonymous> (/opt/smartdc/docker/node_modules/moray/node_modules/cueball/node_modules/mname-client/lib/sockets.js:359:7)
emitTwo (events.js:87:13)
Socket.emit (events.js:172:7)
UDP.onMessage (dgram.js:480:8)
[ Nov 24 09:20:36 Stopping because all processes in service exited. ]
[ Nov 24 09:20:36 Executing stop method (:kill). ]
[ Nov 24 09:20:36 Restarting too quickly, changing state to maintenance. ]
Here's the metadata:
{
  "uuid": "566d6bc2-327f-49ce-a64f-387321672c54",
  "name": "docker",
  "application_uuid": "81f6a36c-2871-4340-9085-85ddae0e7a3b",
  "params": {
    "billing_id": "0ae33ebc-c216-11e2-9b84-6f7e2a82bc36",
    "image_uuid": "269cfa9a-b032-11e6-a6b7-cba1698b18f4",
    "archive_on_delete": true,
    "delegate_dataset": true,
    "maintain_resolvers": true,
    "networks": [
      { "name": "admin" },
      { "name": "external", "primary": true }
    ],
    "firewall_enabled": false,
    "tags": {
      "smartdc_role": "docker",
      "smartdc_type": "core"
    }
  },
  "metadata": {
    "SERVICE_NAME": "docker",
    "SERVICE_DOMAIN": "docker.fi-espoo-.....company.com",
    "USE_TLS": true,
    "user-script": "#!/usr/bin/bash\n#\n# This Source Code Form is subject to the terms of the Mozilla Public\n# License, v. 2.0. If a copy of the MPL was not distributed with this\n# file, You can obtain one at http://mozilla.org/MPL/2.0/.\n#\n\n#\n# Copyright (c) 2014, Joyent, Inc.\n#\n\nexport PS4='[\D{%FT%TZ}] ${BASH_SOURCE}:${LINENO}: ${FUNCNAME[0]:+${FUNCNAME[0]}(): }'\n\nset -o xtrace\nset -o errexit\nset -o pipefail\n\n#\n# The presence of the /var/svc/.ran-user-script file indicates that the\n# instance has already been setup (i.e. the instance has booted previously).\n#\n# Upon first boot, run the setup.sh script if present. On all boots including\n# the first one, run the configure.sh script if present.\n#\n\nSENTINEL=/var/svc/.ran-user-script\n\nDIR=/opt/smartdc/boot\n\nif [[ ! -e ${SENTINEL} ]]; then\n if [[ -f ${DIR}/setup.sh ]]; then\n ${DIR}/setup.sh 2>&1 | tee /var/svc/setup.log\n fi\n\n touch ${SENTINEL}\nfi\n\nif [[ ! -f ${DIR}/configure.sh ]]; then\n echo \"Missing ${DIR}/configure.sh cannot configure.\"\n exit 1\nfi\n\nexec ${DIR}/configure.sh\n",
    "sapi-url": "http://10.65.0.27",
    "ENABLED_LOG_DRIVERS": "json-file,syslog,none"
  },
  "type": "vm"
}