cloudflare / cloudflared

Cloudflare Tunnel client (formerly Argo Tunnel)
https://developers.cloudflare.com/cloudflare-one/connections/connect-apps/install-and-setup/tunnel-guide
Apache License 2.0
8.75k stars 773 forks source link

Built-in Docker Healthcheck #1126

Open firecow opened 8 months ago

firecow commented 8 months ago

Describe the feature you'd like I would like the docker image to contain curl or wget so it's possible to create a docker healthcheck that docker-compose or docker swarm can use.

It would be nice the healthchecking is just native to the cloudflare/cloudflared image like so.

FROM cloudflare/cloudflared:2023.10.0 as cloudflared
FROM alpine:3.18.4

COPY --from=cloudflared /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=cloudflared /usr/local/bin/cloudflared /usr/local/bin

ENV TUNNEL_METRICS="localhost:2000"

HEALTHCHECK --interval=5s --retries=6 --timeout=3s CMD wget -q ${TUNNEL_METRICS}/ready -O -

If building from alpine is something you don't want to do, then provide a cloudflare tunnel health command that can be used like so

HEALTHCHECK --interval=5s --retries=6 --timeout=3s CMD cloudflare tunnel health --ready-endpoint localhost:2000/ready

or like so

services:
  cloudflared:
    image: cloudflare/cloudflared
    environment:
      TUNNEL_METRICS: "localhost:2000"
    healthcheck:
      command: cloudflared tunnel health --ready-endpoint ${TUNNEL_METRICS}/ready
      interval: 5s
      retries: 6
      timeout: 3s

Describe alternatives you've considered Build my own image built on-top of the cloudflare/cloudflared image or use third-party images built on top of cloudflare's

mordilloSan commented 8 months ago

I second this. Would be very helpful when using auto heal.

Letgamer commented 8 months ago

Using the following Dockerfile could be working:

FROM tarampampam/curl as curl

FROM erisamoe/cloudflared
COPY --from=curl /bin/curl /bin/

Credits: https://github.com/sidevesh/cloudflared-with-health-check

maggie44 commented 8 months ago

I think that PR is helpful, although it may also be worth reevaluating whether the current base-debian11 containers still make sense as well. If the goal is to have a minimal container, it may as well be scratch. The Dockerfile (https://github.com/cloudflare/cloudflared/blob/master/Dockerfile) mentions # use a distroless base image with glibc but I'm not sure why glibc would be required for the Golang compiled binary.

However, I would find a more helpful base image to be busybox. It contains some basic tools that would help with debugging. I am currently running Cloudflared on a Kubernetes cluster and for some reason it isn't seeing my service which it is supposed to redirect to. It would be really helpful if I could get in to that container and be able to ping my service endpoint to help with debugging my Kubernetes infrastructure. I tend to favour busybox over Alpine as it is even more minimal than Alpine, and subsequently has very few updates. As a maintainer you often find yourself constantly bumping the Alpine container version for no reason just to try and keep up.

There is a case to be made that a scratch container has additional security benefits, but it's a really hard case to make. If someone has got as far as being able to send exec commands to your busybox container, it's game over either way.

CodaBool commented 3 months ago

Using the following Dockerfile could be working COPY --from=tarampampam/curl /bin/curl /bin/

No one is going to use copy curl from a unreputable image like that. But the general method is right. It is better however, to use wget for something like this. It's much smaller in size (~1Mb when using busybox's version).

Here is an example

FROM gcr.io/distroless/java21-debian12:nonroot
COPY --from=busybox:stable-glibc /bin/wget /usr/bin/wget

If you wanted to save 500Kb-1Mb and add security the alternative is add healthcheck Go code to the repo. This could either be a separate static binary with a http call. Or add onto the CLI with a flag, like how watchtower does it (looking at this a second time, it seems watchtower is actually just checking if another watchtower process is running, which is not really a healthcheck). It would need to follow the docker spec and do exit 0 on healthy. Exit 1 on unhealthy.

I have 2 PRs I did recently with other dockerfiles that expand the app to have a healthcheck feature. So, those could be looked as general examples (but none of them are in go, they are in rust, and in node+python)

If the goal is to have a minimal container, it may as well be scratch

debian distroless is fine. No need to change off of it. Tons of Go projects work this way.

maggie44 commented 3 months ago

A good rule of thumb with Docker images, is to put in what you need and no more. Distroless image is far more than is needed. Not a deal breaker or concern, but unnecessary if it's a pure go build.

wget a good option if space saving is a concern.

Also a tiny-curl version, but would suggest building from source rather than importing packages from others: https://github.com/maggie44/docker-tiny-curl

Or, as mentioned above, make the base image busybox or Alpine which includes wget, more secure and stable than building and copying in your own, and is 4mb (busybox decompressed) instead of distroless image's 18mb even before copying in wget.

firecow commented 3 months ago

All of this is irrelevant, if the cloudflared binary could execute a command to check the /ready endpoint

https://github.com/cloudflare/cloudflared/pull/1135 We need this merged, to close this issue.

kamaradski commented 2 months ago

I'm testing the curl method:

FROM tarampampam/curl as curl
FROM erisamoe/cloudflared
COPY --from=curl /bin/curl /bin/

in AWS ECS like this:

resource "aws_ecs_task_definition" "ecs_cloudflared_task_dev" {
  family                   = "cloudflare-tunnel"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = 512
  memory                   = 1024
  execution_role_arn       = aws_iam_role.ecs_task_execution_role_dev.arn
  task_role_arn            = aws_iam_role.ecs_task_execution_role_dev.arn
  tags                     = module.meta_ecs_cloudflared_task_dev.tags

  container_definitions = jsonencode([
    {
      name      = module.meta_ecs_cloudflared_task_dev.name
      image     = "${aws_ecr_repository.ecr_cloudflare_tunnel_dev.repository_url}:latest",
      essential = true
      command = [
        "tunnel",
        "--metrics", "0.0.0.0:3333",
        "--no-autoupdate",
        "run",
        "--token", cloudflare_tunnel.cloudflare_tunnel_dev.tunnel_token
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = module.meta_cloudwatch_loggroup_dev.name
          "awslogs-region"        = local.base.region
          "awslogs-stream-prefix" = "cloudflare-tunnel"
        }
      }

      portMappings = [
        {
          containerPort = 3333
          hostPort      = 3333
          protocol      = "tcp"
        }
      ]

      # https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definition_healthcheck
      healthCheck = {
        command     = ["CMD-SHELL", "curl -f http://localhost:3333/metrics || exit 1"],
        interval    = 10 # Interval in seconds between health checks
        timeout     = 10 # Timeout in seconds for each health check
        retries     = 3  # Number of retries before marking the container as unhealthy
        startPeriod = 10 # Optional: grace period to allow the container to start
      }
    }
  ])
}

But the healthcheck is not working in ECS. (it does locally... :shrug: ). So i'm really waiting for https://github.com/cloudflare/cloudflared/pull/1135 to get released and have a proper standard solution.