coturn / coturn

coturn TURN server project

Container goes into CrashLoopBackOff in a kubernetes cluster #994

Closed jattind closed 1 year ago

jattind commented 1 year ago

This is my first time deploying the coturn container in a kubernetes cluster. I was able to deploy and run an old container from instrumentisto/coturn, but wanted to move to an officially supported one.

The coturn container is going into a crashloop. I don't see any errors besides the following:

kubectl logs coturn-app-coturn-d7545fddc-zwqhc -n coturn

/usr/local/bin/docker-entrypoint.sh: line 8: /usr/bin/turnserver: Operation not permitted
/usr/local/bin/docker-entrypoint.sh: line 8: /usr/bin/turnserver: Success

I deployed it as a docker container on another machine to see what's in the docker-entrypoint.sh and I see less than 8 lines:

cat docker-entrypoint.sh 
#!/bin/bash

# If command starts with an option, prepend it with a `turnserver` binary.
if [ "${1:0:1}" == '-' ]; then
  set -- turnserver "$@"
fi

exec $(eval "echo $@")

Any help or pointers on how to debug and get this container running would be appreciated. Thanks in advance.
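
For context on the log lines above: line 8 of the entrypoint is the final `exec`, which suggests the message is the shell reporting that the system refused to start /usr/bin/turnserver, rather than output from turnserver itself. The sketch below mirrors the entrypoint's argument handling but prints the resolved command instead of exec'ing it. `resolve_command` is a hypothetical helper name, and the `case` test is a POSIX equivalent of the `${1:0:1}` check.

```shell
# Hypothetical helper mirroring docker-entrypoint.sh: prints the command
# that the real script would hand to `exec`.
resolve_command() {
  # POSIX equivalent of: [ "${1:0:1}" == '-' ]
  case "$1" in
    -*) set -- turnserver "$@" ;;
  esac
  # Same expansion step as the entrypoint: eval lets $(...) inside an
  # argument be expanded before the command runs.
  eval "echo $@"
}

resolve_command -n --log-file=stdout
# prints: turnserver -n --log-file=stdout
resolve_command turnserver '--external-ip=$(echo 192.0.2.10)'
# prints: turnserver --external-ip=192.0.2.10
```

Since the script does nothing else before `exec`, a failure reported at line 8 points at the container's runtime restrictions rather than at the turnserver configuration.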

ggarber commented 1 year ago

Hi @jattind, can you provide more information on the coturn Docker image you use, the parameters passed when running it, and the k8s spec you use to launch it? I can try to reproduce it and help.

jattind commented 1 year ago

I used the following docker image: 4.6.0-r0. I also used the Helm chart available here (GitHub - iits-consulting/coturn-chart: Coturn Helm Chart to provide a STUN/TURN Server inside Kubernetes), except that when deploying the container I changed the repository from instrumentisto/coturn to coturn/coturn. I have tried passing different parameters from their values, but all of them seem to result in the container going into a crash loop, whereas theirs works and I was able to get video calls working.

Here are the parameters I am changing in their Helm values:

    use-auth-secret
    static-auth-secret=north
    fingerprint
    total-quota=0
    bps-capacity=0
    stale-nonce
    no-multicast-peers

    repository: coturn/coturn
    tag: 4.6.0-r0

    tls-listening-port=443

Remove:

    lt-cred-mech

I have already created a TLS issuer and secret:

    certificate:
      enabled: true
      host:
      issuerName: ca-issuer
      secret: ca-key-pair

I am in the process of creating Kubernetes specs, instead of using their Helm chart, which I hope will help me debug the issue further.

eakraly commented 1 year ago

Hi @jattind, I see two important points to pay attention to:

  1. That Helm chart runs turnserver behind a load balancer, which means it fully proxies all the connections.
  2. The container runs without the privilege to bind ports in the privileged range (below 1024), whereas you want it to use 443:

    securityContext:
      capabilities:
        drop:
          - ALL

I think point 2 could be the reason turnserver crashes (not a good thing, I agree; it should at least log something).

Can you please remove the requirement to use 443 and try again?
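
One way to see what `drop: ALL` leaves the container with is to read the capability bounding set from /proc, which is the upper limit on the capabilities any process in the container can acquire. This is a minimal sketch, assuming a Linux container with /proc mounted. `has_capability` is a hypothetical helper, and bit 10 is the position of CAP_NET_BIND_SERVICE in the Linux capability masks.

```shell
# Bit number of CAP_NET_BIND_SERVICE in the Linux capability masks.
CAP_NET_BIND_SERVICE=10

# has_capability HEX_MASK BIT: succeeds if the given bit is set in the mask.
has_capability() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

if [ -r "/proc/$$/status" ]; then
  # CapBnd is the bounding set of the current shell, as a hex mask.
  cap_bnd=$(awk '/^CapBnd:/ {print $2}' "/proc/$$/status")
  if has_capability "$cap_bnd" "$CAP_NET_BIND_SERVICE"; then
    echo "NET_BIND_SERVICE is in the bounding set ($cap_bnd)"
  else
    echo "NET_BIND_SERVICE is NOT in the bounding set ($cap_bnd)"
  fi
else
  echo "/proc/$$/status is not readable; cannot inspect capabilities"
fi
```

Running this inside the pod (for example through a debug shell) before and after changing the securityContext shows whether the capability change actually reached the container.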

jattind commented 1 year ago

I removed the following, but I still see the same issue:

    tls-listening-port=443

I also tried changing the config from a Secret to a regular ConfigMap, but I see the same issue. Lastly, I removed all the config so that it would run with whatever the defaults are, but I still see the same issue.

I have converted the Helm chart to a set of Kubernetes specs so it's easier to understand and debug. Here is the deployment spec: deployment.yaml.txt

jattind commented 1 year ago

I continued debugging and removed the following:

    securityContext:
      capabilities:
        drop:

The coturn pod is able to start successfully! I will need to understand the implications of the above change, but I can make progress.

eakraly commented 1 year ago

Good callout. If you make the filesystem read-only, then turnserver cannot write any files. That includes logs and the turnserver.pid file.

We might want to address this in code: running with a read-only filesystem is a great way to increase security (and having a pid file in this case is useless anyway).

eakraly commented 1 year ago

@jattind could you please try to find which of the 3 options you removed from the securityContext made it work?

jattind commented 1 year ago

I actually removed all 3, but I can test removing each one-at-a-time and see what causes the crash.

jattind commented 1 year ago

Quick question: is it possible to have user/password as well as secret auth enabled at the same time? It seems that testing the ICE servers requires a username/password, whereas Nextcloud Talk requires a secret for auth.

jattind commented 1 year ago

The crash issue seems to be caused by the following:

    capabilities:
      drop:

eakraly commented 1 year ago

OK, that makes sense. You requested 443 (unlike what the Helm chart provides), which requires the NET_BIND_SERVICE capability. Can you please try the following (add NET_BIND_SERVICE if you want 443):

capabilities:
  add:
    - NET_BIND_SERVICE
  drop:
    - ALL

And to confirm - readOnlyRootFilesystem: true is not an issue?

eakraly commented 1 year ago

> Quick question: is it possible to have user/password as well as secret auth enabled at the same time? It seems that testing the ICE servers requires a username/password, whereas Nextcloud Talk requires a secret for auth.

I do not think so. A quick look at the code tells me that only one credential mechanism is used at a time.
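
On the username/password side of the question: with `use-auth-secret`, the username and password that an ICE test page asks for can be derived from the shared secret. This is not confirmed anywhere in this thread. It is a sketch assuming the usual TURN REST API scheme, in which the username is an expiry timestamp plus an optional user id, and the password is the base64-encoded HMAC-SHA1 of that username keyed with the secret. It also assumes `openssl` is installed. `turn_credentials` and `testuser` are hypothetical names, and `north` is the `static-auth-secret` value quoted earlier.

```shell
# Shared secret configured via static-auth-secret (value from this thread).
secret='north'

# turn_credentials SECRET USER TTL_SECONDS: prints "username password".
turn_credentials() {
  # Username carries the expiry time (Unix seconds) and a user id.
  expiry=$(( $(date +%s) + $3 ))
  username="${expiry}:$2"
  # Password is base64(HMAC-SHA1(secret, username)).
  password=$(printf '%s' "$username" \
    | openssl dgst -sha1 -hmac "$1" -binary \
    | openssl base64)
  printf '%s %s\n' "$username" "$password"
}

# Credentials valid for one hour, for a hypothetical user id.
turn_credentials "$secret" testuser 3600
```

The printed pair can be pasted into an ICE test page as the TURN username and password, while Nextcloud Talk keeps using the secret directly.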

eakraly commented 1 year ago

@jattind is this issue addressed? Can we close the issue then?

jattind commented 1 year ago

I will give this a try tomorrow and confirm.

jattind commented 1 year ago

readOnlyRootFilesystem: true is not an issue.

Also, adding NET_BIND_SERVICE solves the crash issue. capabilities: add:
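
Putting the outcome of the thread together, the working securityContext can be written as a patch file and applied to the Deployment. This is a sketch under assumptions: the container name `coturn` and the patch file path are hypothetical, and the Deployment name `coturn-app-coturn` is inferred from the pod name in the first comment. The `kubectl` call is left commented out because it needs cluster access.

```shell
# Assumed location for the patch file; adjust as needed.
patch_file="${TMPDIR:-/tmp}/coturn-securitycontext-patch.yaml"

cat > "$patch_file" <<'EOF'
spec:
  template:
    spec:
      containers:
        - name: coturn            # assumption: must match the container name
          securityContext:
            readOnlyRootFilesystem: true
            capabilities:
              add:
                - NET_BIND_SERVICE
              drop:
                - ALL
EOF

echo "wrote $patch_file"

# Apply it (Deployment name inferred from the pod name in the logs command):
# kubectl patch deployment coturn-app-coturn -n coturn --patch-file "$patch_file"
```

Keeping `drop: ALL` and adding back only NET_BIND_SERVICE preserves most of the hardening from the original chart while still allowing the listener on 443.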