Resolvers value not being used in ansible scripts #5

pkaramol commented 5 years ago

Here is my dcos.yml file:

dcos:
  download: "https://downloads.dcos.io/dcos/stable/dcos_generate_config.sh"
  version: "1.12.1"
  # version_to_upgrade_from: "1.12.0"
  # image_commit: "acc9fe548aea5b1b5b5858a4b9d2c96e07eeb9de"
  enterprise_dcos: false
  selinux_mode: permissive

  config:
    # This is a direct yaml representation of the DC/OS config.yaml
    # Please see https://docs.mesosphere.com/1.12/installing/production/advanced-configuration/configuration-reference/
    # for parameter reference.
    cluster_name: "dcos-dev-mycompany"
    security: strict
    bootstrap_url: http://bootstrap-dcos.mycompany.local:8080
    exhibitor_storage_backend: static
    master_discovery: static
    ip_detect: ens33
    dns_search: mycompany.local

However, whenever the bootstrap script runs, the resolvers stay unchanged and the value I configured is not picked up:

[root@master01 default]# cat /etc/resolv.conf
# Generated by gen_resolvconf.py. Do not edit.
# Change configuration options by changing DC/OS cluster configuration.
# This file must be overwritten regularly for proper cluster operation around
# master failure.

options timeout:1
options attempts:3

pkaramol commented 5 years ago

It seems this is caused by the gen_resolvconf.py script, which keeps running and regenerating the file (on my first deployment I had forgotten to set the resolvers variable properly).

How can I remedy this?

MrMarvin commented 5 years ago

Hi @pkaramol, thanks a lot for your report!

From your description of the issue and the linked ansible group_vars dcos.yml, it seems that your configuration does not allow a Mesosphere DC/OS config upgrade to run. Did you see an error message when running Ansible the second time?

I suspect there was a message like

TASK [DCOS.bootstrap : generate DC/OS upgrade files] ***************************
skipping: []

and then later, more prominently

TASK [DCOS.master : Upgrade: Download dcos_node_upgrade.sh] ********************
fatal: []: FAILED! => {"changed": false, "dest": "/tmp/dcos/1.12.1/upgrade_from_1.12.1/", "gid": 0, "group": "root", "mode": "0755", "msg": "Request failed", "owner": "root", "response": "HTTP Error 404: Not Found", "secontext": "unconfined_u:object_r:user_tmp_t:s0", "size": 6, "state": "directory", "status_code": 404, "uid": 0, "url": ""}

which indicates that the new config could not be rolled out. If this is the case, uncommenting the version_to_upgrade_from variable will fix that. In your case it should be safe to set it equal to version: 1.12.1.

pkaramol commented 5 years ago

Hi @MrMarvin, thanks. However, I tried your suggestion and it did not work.

What I did instead was to manually edit the file from which dcos-gen-resolvconf.service takes its env vars (i.e. EnvironmentFile=/opt/mesosphere/etc/dns_config) and set RESOLVERS to the desired value. I did this on all master/agent nodes.
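
To see which resolvers each node actually ends up with after a run, a small check playbook along these lines might help. This is only a sketch: the playbook is hypothetical and not part of the DC/OS roles, and it assumes the inventory covers all master/agent nodes.

```yaml
# Hypothetical check playbook, not part of the DC/OS ansible roles.
- name: Show the resolvers currently written to /etc/resolv.conf
  hosts: all            # assumption: the inventory contains all DC/OS nodes
  become: true
  gather_facts: false
  tasks:
    - name: Read the generated resolv.conf
      command: cat /etc/resolv.conf
      register: resolv_conf
      changed_when: false   # read-only, never report a change

    - name: Print only the nameserver lines
      debug:
        msg: "{{ resolv_conf.stdout_lines | select('match', '^nameserver') | list }}"
```

An empty list for a node would suggest that gen_resolvconf.py has not picked up any resolvers on that node.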

lloesche commented 5 years ago

Files in /opt/mesosphere should not be modified as any changes will get overwritten on upgrades. The best way would be to do an "upgrade" from 1.12.1 to 1.12.1 (version: "1.12.1", version_to_upgrade_from: "1.12.1") where the only change is the updated DNS.
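
As a rough sketch of that same-version "upgrade", the group_vars dcos.yml could look something like the following. The dcos: and config: nesting is assumed from the indentation of the file posted above, and the resolver addresses are placeholders from the documentation range, not values from this thread.

```yaml
dcos:
  download: "https://downloads.dcos.io/dcos/stable/dcos_generate_config.sh"
  version: "1.12.1"
  # Same as `version`, so the run is treated as a config-only upgrade.
  version_to_upgrade_from: "1.12.1"
  enterprise_dcos: false
  selinux_mode: permissive

  config:
    cluster_name: "dcos-dev-mycompany"
    # ... other config.yaml parameters unchanged ...
    dns_search: mycompany.local
    # Placeholder addresses; replace with your own DNS servers.
    resolvers:
      - 192.0.2.10
      - 192.0.2.11
```

With this in place, re-running the playbook should regenerate the node configuration, including the DNS settings, without hand-editing anything under /opt/mesosphere.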