alexellis / faas-containerd

containerd and CNI provider for OpenFaaS
https://blog.alexellis.io/faas-containerd-serverless-without-kubernetes/
MIT License

Removing a function with active replicas (task running) fails #28

Open carlosedp opened 4 years ago

carlosedp commented 4 years ago

Trying to remove a function that is currently running (has a RUNNING task) fails.

Trying to remove it a second time succeeds.

Expected Behaviour

The function is removed by the first command.

Current Behaviour

Running function:

❯ faas store deploy figlet --name=figlet2
WARNING! Communication is not secure, please consider using HTTPS. Letsencrypt.org offers free SSL/TLS certificates.

Deployed. 200 OK.
URL: http://localhost:8081/function/figlet2

❯ sudo ctr -n openfaas-fn container ls
CONTAINER    IMAGE                                RUNTIME
figlet2      docker.io/functions/figlet:0.13.0    io.containerd.runc.v2

❯ sudo ctr -n openfaas-fn task ls
TASK       PID     STATUS
figlet2    7252    RUNNING

Deploy logs:

Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 16:42:43 [Update] request: {"service":"figlet2","image":"functions/figlet:0.13.0","network":"","envProcess":"figlet","envVars":{},"constraints":[],"secrets":[],"labels":{},"annotations":{},"limits":null,"requests":null,"readOnlyRootFilesystem":false}
Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 16:42:43 [Update] service figlet2 not found
Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 16:42:43 [Deploy] request: {"service":"figlet2","image":"functions/figlet:0.13.0","network":"","envProcess":"figlet","envVars":{},"constraints":[],"secrets":[],"labels":{},"annotations":{},"limits":null,"requests":null,"readOnlyRootFilesystem":false}
Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 16:42:43 Deploy docker.io/functions/figlet:0.13.0 size: 5658006
Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 16:42:43 Container ID: figlet2        Task ID figlet2:        Task PID: 7252
Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 16:42:43 figlet2 has IP: 10.62.0.163.
Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 21:42:43 Version: 0.13.0        SHA: fa93655d90d1518b04e7cfca7d7548d7d133a34e
Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 21:42:43 Read/write timeout: 5s, 5s. Port: 8080
Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 21:42:43 Writing lock-file to: /tmp/.lock
Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 21:42:43 Metrics server. Port: 8081

Trying to remove:

❯ faas-cli remove figlet2
Deleting: figlet2.
Server returned unexpected status code 500 error deleting container figlet2, figlet2, cannot delete running task figlet2: failed precondition

Logs:

Jan 17 16:44:16 debian10 faas-containerd[6669]: 2020/01/17 16:44:16 [Delete] request: {"functionName":"figlet2"}
Jan 17 16:44:16 debian10 faas-containerd[6669]: 2020/01/17 16:44:16 [Delete] removing CNI network for figlet2
Jan 17 16:44:16 debian10 faas-containerd[6669]: 2020/01/17 16:44:16 [Delete] removed figlet2 with namespace /proc/7252/ns/net and ID figlet2-7252
Jan 17 16:44:16 debian10 faas-containerd[6669]: Status of figlet2 is: running
Jan 17 16:44:16 debian10 faas-containerd[6669]: 2020/01/17 16:44:16 Need to kill figlet2
Jan 17 16:44:16 debian10 faas-containerd[6669]: 2020/01/17 21:44:16 SIGTERM received.. shutting down server in 5s
Jan 17 16:44:16 debian10 faas-containerd[6669]: 2020/01/17 21:44:16 Removing lock-file : /tmp/.lock
Jan 17 16:44:21 debian10 faas-containerd[6669]: 2020/01/17 21:44:21 No new connections allowed. Exiting in: 5s
Jan 17 16:44:21 debian10 faas-containerd[6669]: 2020/01/17 16:44:21 [Delete] error removing figlet2, error deleting container figlet2, figlet2, cannot delete running task figlet2: failed precondition

The task is stopped, but the container is not removed:

❯ sudo ctr -n openfaas-fn container ls
CONTAINER    IMAGE                                RUNTIME
figlet2      docker.io/functions/figlet:0.13.0    io.containerd.runc.v2

❯ sudo ctr -n openfaas-fn task ls
TASK       PID     STATUS
figlet2    7252    STOPPED

Running the remove command again removes the function:

❯ faas-cli remove figlet2
Deleting: figlet2.
Removing old function.

Logs:

Jan 17 16:45:29 debian10 faas-containerd[6669]: 2020/01/17 16:45:29 [Delete] request: {"functionName":"figlet2"}
Jan 17 16:45:29 debian10 faas-containerd[6669]: Status of figlet2 is: stopped
Jan 17 16:45:29 debian10 faas-containerd[6669]: 2020/01/17 16:45:29 Need to kill figlet2
Jan 17 16:45:29 debian10 faas-containerd[6669]: 2020/01/17 16:45:29 [Delete] deleted figlet2

Possible Solution

Steps to Reproduce (for bugs)

1. 2. 3. 4.

Context

Your Environment

go version

containerd -version

uname -a

cat /etc/os-release

alexellis commented 4 years ago

Hi, did you try what I explained on Slack yet? The timeout for deletions is around 3s, but the watchdog keeps running for "write_timeout" seconds after it receives SIGTERM.

You need to deploy with a value lower than that, so try 1s.

carlosedp commented 4 years ago

Yes. When deploying with --env write_timeout=1s, the function is removed correctly, but with the default (no write_timeout parameter) it fails.
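
To make the timing concrete, here is a minimal Go sketch of the comparison being discussed. It assumes the deletion timeout is exactly 3s (the thread only says "around 3s"), and the helper name fitsDeleteWindow is hypothetical, not part of faas-containerd. It checks whether a watchdog that holds for write_timeout after SIGTERM would exit before the provider gives up.

```go
package main

import (
	"fmt"
	"time"
)

// fitsDeleteWindow reports whether a watchdog that keeps running for
// writeTimeout after SIGTERM would exit before deleteTimeout elapses.
// Both arguments use Go duration syntax such as "5s" or "1s".
// Hypothetical helper for illustration only.
func fitsDeleteWindow(writeTimeout, deleteTimeout string) (bool, error) {
	w, err := time.ParseDuration(writeTimeout)
	if err != nil {
		return false, fmt.Errorf("parse write_timeout: %w", err)
	}
	d, err := time.ParseDuration(deleteTimeout)
	if err != nil {
		return false, fmt.Errorf("parse delete timeout: %w", err)
	}
	return w < d, nil
}

func main() {
	// "3s" is an assumed value for the provider's deletion timeout.
	for _, wt := range []string{"5s", "1s"} {
		ok, err := fitsDeleteWindow(wt, "3s")
		if err != nil {
			panic(err)
		}
		fmt.Printf("write_timeout=%s fits in a 3s delete window: %v\n", wt, ok)
	}
}
```

With the 5s value shown in the deploy logs the check fails, and with 1s it passes, which matches the behaviour reported above.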

alexellis commented 4 years ago

Great, so it's a timing problem. We can't wait indefinitely to delete a container, so there needs to be a limit, maybe a bigger one than what's there now.