Closed grussorusso closed 3 years ago
Hi. What Kubernetes version (v1.18, v1.17, etc.) are you running with kind? Our automated testing currently covers v1.16, v1.17, and v1.18. We need to enable testing for v1.19 and v1.20 in Travis CI, but haven't gotten around to it yet...
Thanks for your reply. Indeed, kind automatically picked v1.20. Unfortunately, I get the same error with v1.18.15.
It worked for me last night using kind 0.10 on macOS Docker Desktop (aka my laptop) and Kubernetes v1.18.5. But I realized that I deployed the latest chart from git, not the 1.0.0 chart from the Helm repo. I will try that later tonight just to make sure it isn't some problem with the chart itself.
Probably not surprising, but on my macOS / Docker Desktop setup, installing the 1.0.0 Helm chart on kind 0.10 works. Here's the beginning snippet of the log from the init-couchdb job:
```
Daves-MacBook-Pro:kar dgrove$ kubectl logs jobs/owdev-init-couchdb -n openwhisk
Cloning into '/openwhisk'...
/openwhisk /
Note: checking out '1.0.0'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 2c621c07 fix start.sh to work on macos (#5019)
/
/openwhisk/ansible /
[WARNING]: Unable to parse /openwhisk/ansible/environments/local as an
inventory source
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note
that the implicit localhost does not match 'all'

PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
Wednesday 17 February 2021 01:37:08 +0000 (0:00:00.175) 0:00:00.176 ****
ok: [localhost]

TASK [gen hosts if 'local' env is used] ****************************************
Wednesday 17 February 2021 01:37:09 +0000 (0:00:01.093) 0:00:01.269 ****
changed: [localhost -> localhost]

TASK [find the ip of docker-machine] *******************************************
Wednesday 17 February 2021 01:37:09 +0000 (0:00:00.752) 0:00:02.022 ****
skipping: [localhost]

TASK [get the docker-machine ip] ***********************************************
Wednesday 17 February 2021 01:37:09 +0000 (0:00:00.053) 0:00:02.076 ****
skipping: [localhost]

TASK [gen hosts for docker-machine] ********************************************
Wednesday 17 February 2021 01:37:10 +0000 (0:00:00.068) 0:00:02.144 ****
skipping: [localhost]

TASK [gen hosts for Jenkins] ***************************************************
Wednesday 17 February 2021 01:37:10 +0000 (0:00:00.082) 0:00:02.226 ****
skipping: [localhost]

TASK [check if db_local.ini exists?] *******************************************
Wednesday 17 February 2021 01:37:10 +0000 (0:00:00.084) 0:00:02.311 ****
ok: [localhost]
```
I'm not sure exactly what Ansible does in its initial fact-gathering stage, but the thing to try is probably to run that pod interactively, execute the commands manually, and see if you can get a better error message.
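A minimal sketch of that debugging approach, assuming the init job uses the stock `couchdb:2.3.1` image (check the job spec for the exact image; note that Ansible is only present after the init script installs it):

```shell
# Hypothetical debug session: start a throwaway pod from the same image
# the init job uses (image tag is an assumption; check the job spec).
kubectl run couchdb-debug -n openwhisk --rm -it \
  --image=couchdb:2.3.1 -- /bin/bash

# Inside the pod, after installing Ansible the way initdb.sh does,
# fact gathering alone can be rerun with extra verbosity:
#   ansible localhost -m setup -vvv
# The -vvv output usually shows the exact command Ansible hangs on.
```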
Thanks again for checking. I followed your suggestion and tried executing the command on which Ansible fails (`/bin/findmnt --list --noheadings --notruncate`) in the container via `kubectl exec`, and it works without issues... However, the pod eventually enters the Failed state with the same output.
At this point, I am even more confused about the problem. It is probably related to my own environment (maybe Docker version? I am using Docker 20.10.3 on Linux). I will verify if the same thing happens on a different Linux machine, as soon as I have some time to do so. Anyway, although annoying, the issue is not blocking for me as I managed to deploy OpenWhisk on Minikube.
I confirm everything works on a different Linux machine. So the issue is caused by something in my own configuration, though I haven't figured out what exactly.
I just came across the same problem. It turns out the timeout is caused by Python rather than `/bin/findmnt`. Below are some related upstream tickets:
https://github.com/ansible/ansible/issues/24228#issuecomment-409693926
https://bugs.python.org/issue1663329
https://bugs.python.org/issue11284
My system also runs Arch Linux, and inside the container `ulimit -n` returns a very large value. My workaround is to modify `helm/openwhisk/configMapFiles/initCouchDB/initdb.sh` to apply this patch to `/usr/local/lib/python2.7/dist-packages/ansible/plugins/shell/__init__.py` before using Ansible.
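For anyone who wants to confirm they are hitting the same thing, a quick check (assuming Docker is available locally and the image tag matches what the chart uses):

```shell
# If this prints a very large number (e.g. 1073741824, common on Arch
# with recent Docker), Python 2's close_fds behavior, which attempts
# one close() per possible fd on every subprocess spawn, becomes
# extremely slow, and Ansible's fact gathering appears to hang.
docker run --rm couchdb:2.3.1 sh -c 'ulimit -n'
```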
It seems the "bug" is also a matter of using Python 2: OpenWhisk uses the CouchDB 2.3.1 Docker image, which is based on Debian Buster (slim), where the only Python variant available is Python 2...
I won't try to run OpenWhisk with CouchDB 3 because I have no idea what that would imply.
However, it means another valid workaround is to "fix" the environment where Ansible runs (i.e., the CouchDB container when initializing the DB) by adding `ulimit -n 4096` to the init script at `helm/openwhisk/configMapFiles/initCouchDB/initdb.sh`. Tested and approved on a roughly up-to-date Arch Linux.

Do you think this could be a valid fix for this problem that could be merged, as a way to deal with a wart from the obsolete Python 2? I find it cleaner, clearer, and easier to implement than patching an Ansible plugin file.
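For concreteness, a sketch of what the top of `initdb.sh` would look like with this workaround (the surrounding lines are placeholders, not the actual script contents; only the `ulimit` line is the proposed change):

```shell
#!/bin/sh
set -e

# Workaround: cap the soft fd limit so Python 2's per-fd close loop
# stays short, regardless of the huge limit inherited from the host.
ulimit -Sn 4096

# ... existing initdb.sh steps (clone OpenWhisk, run ansible-playbook) ...
```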
If it's still using Python 2, it's definitely better to move to v3 instead.
Sure, but as I said, using Python 2 comes from using the CouchDB 2.3.1 Docker image. I don't know if OpenWhisk can work with v3, which is hopefully based on a more up-to-date Debian image.
I came across the same issue on Arch Linux, and this fix also worked for me.
I am trying to deploy OpenWhisk on my Arch Linux machine using kind. I have 2 worker nodes in the cluster, and I have labelled them according to the official guide. I deploy OW using the official Helm chart.

This is the output of `kubectl get pods -n openwhisk` and of `kubectl logs -n openwhisk owdev-init-couchdb-sqqhp`:

I verified that the issue does not appear when using Minikube (with both Docker and containerd as the container runtime), so I think the issue is somehow related to kind.

My `whisk.yml` configuration is identical to that shown in the guide for deploying OW on kind (except for the apiHostName, which I set as indicated). Thanks in advance for any hint.
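For reference, the steps above roughly correspond to the following sketch (node names and the `openwhisk-role` labels are my understanding of the OpenWhisk-on-kind guide; `kind get nodes` shows the actual node names in your cluster, so treat these as assumptions):

```shell
# Label one worker for the OpenWhisk core pods and one for invokers:
kubectl label node kind-worker  openwhisk-role=core
kubectl label node kind-worker2 openwhisk-role=invoker

# Install the chart with the whisk.yml overrides mentioned above:
helm install owdev ./helm/openwhisk -n openwhisk --create-namespace -f whisk.yml
```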