ReSearchITEng / kubeadm-playbook

Fully fledged (HA) Kubernetes Cluster using official kubeadm, ansible and helm. Tested on RHEL/CentOS/Ubuntu with support of http_proxy, dashboard installed, ingress controller, heapster - using official helm charts
https://researchiteng.github.io/kubeadm-playbook/
The Unlicense

Node join ip replace regex error #92

Closed ghost closed 3 years ago

ghost commented 3 years ago

Hello, I get this error when trying to run the playbook:

fatal: [192.168.1.xxx]: FAILED! => {"changed": false, "module_stderr": "Shared connection to 192.168.1.xxx closed.
", "module_stdout": "
Traceback (most recent call last):
  File \"/home/anatol/.ansible/tmp/ansible-tmp-1606683277.6747334-186173704336525/AnsiballZ_replace.py\", line 102, in <module>
    _ansiballz_main()
  File \"/home/anatol/.ansible/tmp/ansible-tmp-1606683277.6747334-186173704336525/AnsiballZ_replace.py\", line 94, in _ansiballz_main
    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
  File \"/home/anatol/.ansible/tmp/ansible-tmp-1606683277.6747334-186173704336525/AnsiballZ_replace.py\", line 40, in invoke_module
    runpy.run_module(mod_name='ansible.modules.files.replace', init_globals=None, run_name='__main__', alter_sys=True)
  File \"/usr/lib/python3.8/runpy.py\", line 207, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File \"/usr/lib/python3.8/runpy.py\", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File \"/usr/lib/python3.8/runpy.py\", line 87, in _run_code
    exec(code, run_globals)
  File \"/tmp/ansible_replace_payload_mh52eo44/ansible_replace_payload.zip/ansible/modules/files/replace.py\", line 302, in <module>
  File \"/tmp/ansible_replace_payload_mh52eo44/ansible_replace_payload.zip/ansible/modules/files/replace.py\", line 272, in main
  File \"/usr/lib/python3.8/re.py\", line 221, in subn
    return _compile(pattern, flags).subn(repl, string, count)
  File \"/usr/lib/python3.8/re.py\", line 327, in _subx
    template = _compile_repl(template, pattern)
  File \"/usr/lib/python3.8/re.py\", line 318, in _compile_repl
    return sre_parse.parse_template(repl, pattern)
  File \"/usr/lib/python3.8/sre_parse.py\", line 1036, in parse_template
    addgroup(int(this[1:]), len(this) - 1)
  File \"/usr/lib/python3.8/sre_parse.py\", line 980, in addgroup
    raise s.error(\"invalid group reference %d\" % index, pos)
re.error: invalid group reference 21 at position 3
", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}

System, software, and tool versions:

Ubuntu: 20.04 Focal
Python: 3.8.5
Ansible: 2.9.6 (installed with pip)
Kubeadm-playbook: 1.9.4

Python code throwing the error (/usr/lib/python3.8/sre_parse.py, line 980):

def parse_template(source, state):
    # parse 're' replacement string into list of literals and
    # group references
    s = Tokenizer(source)
    sget = s.get
    groups = []
    literals = []
    literal = []
    lappend = literal.append
    def addgroup(index, pos):
        if index > state.groups:
            raise s.error("invalid group reference %d" % index, pos)
        if literal:
            literals.append(''.join(literal))
            del literal[:]
        groups.append((len(literals), index))
        literals.append(None)
    groupindex = state.groupindex
...

Playbook task and regex code

- name: replace master api server address to {{ master_name }} in the /etc/kubernetes/kubelet.conf
  replace:
    dest: /etc/kubernetes/kubelet.conf
    regexp: '(\s+)(server: https:\/\/)[A-Za-z0-9\-\.]+:'
    replace: '\1\2{{ master_name }}:'
    #backup: yes
  #when: proxy_env is defined and master is defined with fqdn in the inventory file (e.g. master.example.com)
  tags:
  - init
  notify:
  - Restart kubelet
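To see what this task effectively passes to Python's `re.sub`, the `regexp` and `replace` values can be tried outside Ansible. The following is only a minimal sketch: the `render_replacement` helper stands in for the Jinja2 substitution of `{{ master_name }}`, and both the sample `kubelet.conf` fragment and the hostname `master.example.com` are hypothetical values, not taken from the issue.

```python
import re

# Pattern copied verbatim from the task's `regexp` field.
PATTERN = r'(\s+)(server: https:\/\/)[A-Za-z0-9\-\.]+:'


def render_replacement(master_name: str) -> str:
    """Mimic the templated `replace` field: '\\1\\2{{ master_name }}:'.

    Hypothetical helper; Ansible does this substitution through Jinja2.
    """
    return r'\1\2' + master_name + ':'


# Hypothetical kubelet.conf fragment, used only for illustration.
SAMPLE = (
    "clusters:\n"
    "- cluster:\n"
    "    server: https://10.0.0.5:6443\n"
    "  name: default\n"
)

# With a hostname, '\2' is followed by a letter, so the template parses cleanly.
result = re.sub(PATTERN, render_replacement("master.example.com"), SAMPLE)
print(result)
```

With a hostname that starts with a letter, the group references `\1` and `\2` are unambiguous and the server address is rewritten while the port is kept.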

Unfortunately, my Python regex skills aren't good enough to fix it in the short time I have right now, but I will update this issue with a solution if I find one.

Thank you for your work and for making it open source!

github-actions[bot] commented 3 years ago

Your constructive feedback makes this project stronger! Thank you!

ReSearchITEng commented 3 years ago

Strange...

  1. Could it be that your /etc/kubernetes/kubelet.conf does not have the server section populated due to errors before this step?
  2. Indeed, most of my deployments are on older versions of Ansible and Python.
  3. A quick fix is to simply comment out this replace task if you don't use a proxy (i.e. you are not in a corporate environment that needs a proxy to reach the internet).

ghost commented 3 years ago

It actually does replace the server IP value, or at least the value is already correct before the step errors out. I replaced the task with a simple Ansible pause, changed/verified the value manually, and then proceeded with the rest of the process.

The rest of the playbook worked without any errors, except for some in addons.yml. I had to comment out everything except the calico CNI and ingress-nginx and install the rest manually, but that is mostly because of the continuous updates and the constantly changing ecosystem of both helm and the charts that depend on it.

Anyway, I got it to work with minor effort, and it works great.

Thank you very much, it helped me a lot (also, maybe consider having a donate/buy a coffee/donate cryptocurrency section, some people might use it ;) )

If you have any questions or clarifications please write, otherwise I will close this issue.

ghost commented 3 years ago

(for completeness)

Solution (temporary fix)

  1. Comment out the task's action:

    replace:
      dest: /etc/kubernetes/kubelet.conf
      regexp: '(\s+)(server: https:\/\/)[A-Za-z0-9\-\.]+:'
      replace: '\1\2{{ master_name }}:'

  2. Add a new task to pause the playbook:

    pause:
      prompt: Please confirm ...!  Press enter/return to continue. Press Ctrl+c and then "a" to abort

  3. From a different terminal, do the task manually (change the server IP in `/etc/kubernetes/kubelet.conf`).
  4. Press `Enter` to resume the playbook.

ReSearchITEng commented 3 years ago

> It actually does replace the server IP value, or at least the value is already correct before the step errors out. I replaced the task with a simple Ansible pause, changed/verified the value manually, and then proceeded with the rest of the process.

This is very strange... I guess it is something related to the Ansible and Python versions.

> The rest of the playbook worked without any errors, except for some in addons.yml. I had to comment out everything except the calico CNI and ingress-nginx and install the rest manually, but that is mostly because of the continuous updates and the constantly changing ecosystem of both helm and the charts that depend on it.

Did you take the latest version? I verified and updated all the charts and repos. If you have the logs, let me know which ones failed. Again: please confirm it was the latest version (master or release 1.19).

> Anyway, I got it to work with minor effort, and it works great.
>
> Thank you very much, it helped me a lot (also, maybe consider having a donate/buy a coffee/donate cryptocurrency section, some people might use it ;) )
>
> If you have any questions or clarifications please write, otherwise I will close this issue.

github-actions[bot] commented 3 years ago

Stale issue

dronov commented 2 years ago

JFYI, this happens when the host in the [primary-master] section of the hosts file is defined as an IP address rather than an FQDN. Maybe some validation should be added for that.
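A likely mechanism, consistent with the traceback in the original report: once `{{ master_name }}` is rendered as an IP address, the replacement string becomes something like `\1\2192.168.1.10:`, and Python's template parser reads the digits after `\2` as part of the group number, giving group 21. The sketch below reproduces this outside Ansible and shows one possible way around it using the `\g<N>` form of group references from Python's `re` module. It is not the project's actual fix; the sample line, the address `192.168.1.10`, and all helper names are assumptions made for illustration. The `is_ip_address` helper is one way the suggested validation could look.

```python
import ipaddress
import re

# Pattern copied verbatim from the task's `regexp` field.
PATTERN = r'(\s+)(server: https:\/\/)[A-Za-z0-9\-\.]+:'

# Hypothetical kubelet.conf line, used only for illustration.
SAMPLE = "    server: https://10.0.0.5:6443\n"


def numbered_template(master_name: str) -> str:
    """Replacement as the task renders it: '\\1\\2<master_name>:'."""
    return r'\1\2' + master_name + ':'


def explicit_template(master_name: str) -> str:
    """Same replacement, but \\g<N> terminates each group number explicitly."""
    return r'\g<1>\g<2>' + master_name + ':'


def is_ip_address(value: str) -> bool:
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


ip = "192.168.1.10"  # hypothetical master address

# '\2' followed by '1' is parsed as a reference to group 21, which does not exist.
try:
    re.sub(PATTERN, numbered_template(ip), SAMPLE)
    error_message = None
except re.error as exc:
    error_message = str(exc)
print("numbered form:", error_message)

# The \g<N> form keeps the group number separate from the digits of the IP.
fixed = re.sub(PATTERN, explicit_template(ip), SAMPLE)
print("explicit form:", repr(fixed))

# A simple pre-flight check of the kind suggested above.
print("is IP:", is_ip_address(ip), is_ip_address("master.example.com"))
```

In the playbook, the equivalent change would presumably be to write the `replace` value as `'\g<1>\g<2>{{ master_name }}:'`; that variant was not tested in this thread.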