ChrisThePCGeek / k3s-ansible-traefik-rancher

Full ansible playbook to deploy k3s, traefik, and rancher
Apache License 2.0

traefik-ext dashboard 404 #8

Open rpchan44 opened 1 day ago

rpchan44 commented 1 day ago

Hi, good day!

My external traefik dashboard returns a 404 while everything else is working as expected. May I ask, is this expected? Thanks in advance; this is a really wonderful repository for making your own cluster :)

[5 screenshots attached]

ChrisThePCGeek commented 1 day ago

Make sure your IngressRoute is right, with the correct annotations. It should be similar to this:


```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: ext-dashboard
  annotations:
    kubernetes.io/ingress.class: traefik-external
  namespace: kube-system
  labels:
    app.kubernetes.io/name: traefik
    app.kubernetes.io/instance: traefik-external
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`ext-dash.pcgeek.lan`)
      # middlewares:
      #   - name: authentik-auth
      #     namespace: kube-system
      kind: Rule
      services:
        - kind: TraefikService
          name: api@internal
  tls:
    secretName: ext-dash-tls
    certResolver: lab-ca
    domains:
      - main: ext-dash.pcgeek.lan
```

ChrisThePCGeek commented 1 day ago

That's my production one; the only difference is I added TLS and removed the insecure HTTP port.

Also, I use my internal DNS to point that name at the external traefik IP. I don't access the dashboard over my public IP/DNS.
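
A quick way to verify the annotation matching is in play is to check which ingress class the external instance's kubernetescrd provider is scoped to. A minimal sketch, assuming the external instance runs as a deployment named `traefik-external` in `kube-system` (the playbook's real names may differ):

```sh
# Dump the container args and look for the ingressclass filter, e.g.
# --providers.kubernetescrd.ingressclass=traefik-external
kubectl -n kube-system get deploy traefik-external \
  -o jsonpath='{.spec.template.spec.containers[0].args}' | tr ',' '\n' | grep -i ingressclass
```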

rpchan44 commented 1 day ago

This is `dashboard.yaml.j2` at `/home/user/k3s-ansible-traefik-rancher/roles/traefik_external/templates`:

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: ext-dashboard
  annotations:
    kubernetes.io/ingress.class: traefik-external
  namespace: kube-system
  labels:
    app.kubernetes.io/name: traefik
    app.kubernetes.io/instance: traefik-external
spec:
  entryPoints:
    - web
    - websecure
  routes:
    - match: Host(`{{ traefik_ext_dash_dns_name }}`)
      kind: Rule
      services:
        - name: api@internal
          kind: TraefikService
```

They seem the same. Thanks for looking into this.
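
Since this is a Jinja2 template, it can also be worth comparing against what actually got applied to the cluster, in case the rendered manifest differs from what's on disk:

```sh
# Show the IngressRoute exactly as the cluster sees it, annotation included
kubectl -n kube-system get ingressroute ext-dashboard -o yaml
```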

rpchan44 commented 1 day ago

```yaml
---
#k3s_version: v1.28.8+k3s1
k3s_version: v1.30.5+k3s1

ansible_user: adminuser
systemd_dir: /etc/systemd/system

# set your timezone
system_timezone: "Asia/Manila"

# interface which will be used for flannel
# debian is usually eth0, ubuntu could be either that or ens18, varies by OS. check with `ip a` in a terminal
flannel_iface: "ens18"

#retry count to check all nodes join cluster.  uncomment and set this to something higher than 20
#if your cluster doesn't all join up before the playbook times out
retry_count: 40

# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "192.168.99.179"

# k3s_token is required so that masters can talk together securely
k3s_token: "supersecretkey"

# The IP on which the node is reachable in the cluster.
# Here, a sensible default is provided, you can still override
# it for each of your hosts, though.
k3s_node_ip: '{{ ansible_facts[flannel_iface]["ipv4"]["address"] }}'

# Disable the taint manually by setting: k3s_master_taint = false
# switch which line is commented below to enable the taint on your masters when you have agent nodes
#k3s_master_taint: "{{ true if groups['node'] | default([]) | length >= 1 else false }}"

k3s_master_taint: false

# these arguments are recommended for servers as well as agents:
extra_args: >-
  --flannel-iface={{ flannel_iface }}
  --node-ip={{ k3s_node_ip }}

# change these to your liking, the only required ones are --no-deploy servicelb and --no-deploy traefik (this playbook deploys traefik)
# If you don't want to deploy traefik with helm afterwards and rather use the one packed with k3s, remove the --no-deploy traefik flag
# and set the var 'deploy_traefik: false' down below
# -----------------------
# 7-24-2022: added additional args for prometheus monitoring following Tim's tutorial on that, If you don't plan to do monitoring they can be removed
# "--kube-controller-manager-arg bind-address=0.0.0.0 --kube-proxy-arg metrics-bind-address=0.0.0.0 --kube-scheduler-arg bind-address=0.0.0.0 --etcd-expose-metrics true --kubelet-arg containerd=/run/k3s/containerd/containerd.sock"
extra_server_args: >-
  {{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  --disable servicelb
  --disable traefik
  --tls-san {{ apiserver_endpoint }}
  --write-kubeconfig-mode 644
  --kube-controller-manager-arg bind-address=0.0.0.0
  --kube-proxy-arg metrics-bind-address=0.0.0.0
  --kube-scheduler-arg bind-address=0.0.0.0
  --etcd-expose-metrics true
  --kubelet-arg containerd=/run/k3s/containerd/containerd.sock

extra_agent_args: >-
  {{ extra_args }}

# image tag for kube-vip
kube_vip_tag_version: "v0.6.3"

# metallb type frr or native
metal_lb_type: "native"

# metallb mode layer2 or bgp
metal_lb_mode: "layer2"

# bgp options
# metal_lb_bgp_my_asn: "64513"
# metal_lb_bgp_peer_asn: "64512"
# metal_lb_bgp_peer_address: "192.168.30.1"

# image tag for metal lb
#metal_lb_frr_tag_version: "v7.5.1"
metal_lb_speaker_tag_version: "v0.13.12"
metal_lb_controller_tag_version: "v0.13.12"

# metallb ip range for load balancer
metal_lb_ip_range: "192.168.99.180-192.168.99.195"

# Only enable if your nodes are proxmox LXC nodes, make sure to configure your proxmox nodes
# in your hosts.ini file.
# Please read https://gist.github.com/triangletodd/02f595cd4c0dc9aac5f7763ca2264185 before using this.
# Most notably, your containers must be privileged, and must not have nesting set to true.
# Please note this script disables most of the security of lxc containers, with the trade off being that lxc
# containers are significantly more resource efficient compared to full VMs.
# Mixing and matching VMs and lxc containers is not supported, ymmv if you want to do this.
# I would only really recommend using this if you have particularly low powered proxmox nodes where the overhead of
# VMs would use a significant portion of your available resources.
proxmox_lxc_configure: false
# the user that you would use to ssh into the host, for example if you run ssh some-user@my-proxmox-host,
# set this value to some-user
proxmox_lxc_ssh_user: root
# the unique proxmox ids for all of the containers in the cluster, both worker and master nodes
proxmox_lxc_ct_ids:
  - 112
  - 113
  - 114
  - 115
  - 116

#deploy traefik? this deploys both an internal (default ingress) and external instance of traefik, external = traefik-external ingressClass
deploy_traefik: true

#first IP from above metalLB range which will be used by traefik
#--IMPORTANT-- This IP NEEDS to be contained in the above pool provided to metalLB.  Usually I use the first one in that range
# internal and external param respectively - port forward from your firewall to the external instance
traefik_int_endpoint_ip: "192.168.99.180"
traefik_ext_endpoint_ip: "192.168.99.181"

#set this in your local DNS server (ie. Pihole, or pfsense, etc.) pointing to the IP from the line just above.
traefik_int_dash_dns_name: "traefik.storm.internal"
traefik_ext_dash_dns_name: "traefik-ext.storm.internal"

#number of traefik pods you want running
traefik_replicas: 1

#deploy rancher?
deploy_rancher: true
#number of replicas you want for rancher's pods
rancher_replicas: 1

#rancher dns name
rancher_dns_name: "k3s-control.storm.internal"

#version of cert-manager to deploy
cert_manager_ver: "v1.13.2"

#set this to true and put your ca cert and internal-ca issuer info in the variables below
use_internal_ca: false

issuer_email: "cert-manager@yourdomain.lan"
issuer_server_addr: "https://ca.yourdomain.lan/directory"

internal_ca_cert: |
  -----BEGIN CERTIFICATE-----
  <cert data here>
  -----END CERTIFICATE-----

# Only enable this if you have set up your own container registry to act as a mirror / pull-through cache
# (harbor / nexus / docker's official registry / etc).
# Can be beneficial for larger dev/test environments (for example if you're getting rate limited by docker hub),
# or air-gapped environments where your nodes don't have internet access after the initial setup
# (which is still needed for downloading the k3s binary and such).
# k3s's documentation about private registries here: https://docs.k3s.io/installation/private-registry
custom_registries: false
# The registries can be authenticated or anonymous, depending on your registry server configuration.
# If they allow anonymous access, simply remove the following bit from custom_registries_yaml
#   configs:
#     "registry.domain.com":
#       auth:
#         username: yourusername
#         password: yourpassword
# The following is an example that pulls all images used in this playbook through your private registries.
# It also allows you to pull your own images from your private registry, without having to use imagePullSecrets
# in your deployments.
# If all you need is your own images and you don't care about caching the docker/quay/ghcr.io images,
# you can just remove those from the mirrors: section.
custom_registries_yaml: |
  mirrors:
    docker.io:
      endpoint:
        - "https://registry.domain.com/v2/dockerhub"
    quay.io:
      endpoint:
        - "https://registry.domain.com/v2/quayio"
    ghcr.io:
      endpoint:
        - "https://registry.domain.com/v2/ghcrio"
    registry.domain.com:
      endpoint:
        - "https://registry.domain.com"

  configs:
    "registry.domain.com":
      auth:
        username: yourusername
        password: yourpassword

# Only enable and configure these if you access the internet through a proxy
# proxy_env:
#   HTTP_PROXY: "http://proxy.domain.local:3128"
#   HTTPS_PROXY: "http://proxy.domain.local:3128"
#   NO_PROXY: "*.domain.local,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
```
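
The vars do look stock. One more thing worth ruling out on a server node is that k3s actually came up with the packaged traefik disabled, so the helm-deployed instances are the only ones answering. A rough check (the commands are standard; output format may vary by k3s version):

```sh
# The install script bakes the server args into the systemd unit
systemctl cat k3s | grep -A1 -i disable
# Only the helm-deployed traefik pods should show up here
kubectl -n kube-system get pods | grep -i traefik
```
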
rpchan44 commented 1 day ago

I haven't touched anything beyond all.yml, which is why I'm fairly confused as to why this isn't working for me. Again, thanks for your time. I don't know if this matters, but all master and worker nodes are on Ubuntu 22.04.4:

```
adminuser@template:~/k3s/k3s-ansible-traefik-rancher/inventory/my-cluster/group_vars$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy
```

ChrisThePCGeek commented 1 day ago

No problem. At this level the OS really doesn't matter. To be honest I don't know why it's giving you an error. It's definitely getting to traefik, because that's a traefik 404, and a 404 generally means it isn't matching any routes. If it matched but the service was down or misconfigured, you'd get a 502 Bad Gateway. Did you try the domain name in the browser's address bar without adding /dashboard to the end? That's how I pull mine up, and it appends it itself.
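
One way to separate a routing problem from a dashboard problem is to bypass the entrypoints entirely and hit the dashboard over the pod's internal `traefik` port, which the helm chart exposes on 9000 by default. The deployment name below is an assumption; substitute whatever the playbook created:

```sh
# Port-forward straight to the internal "traefik" entrypoint and try the dashboard
kubectl -n kube-system port-forward deploy/traefik-external 9000:9000 &
curl -s http://localhost:9000/dashboard/   # note the trailing slash
```

If that returns the dashboard, the api@internal service itself is fine and the problem is in how the websecure route is being matched.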

rpchan44 commented 1 day ago

http://traefik-ext.storm.internal gave a 302 redirect to /dashboard and then displayed that 404. I even tried `curl -k -L`, same thing :)

ChrisThePCGeek commented 1 day ago

Is DNS configured properly as well? Such that traefik-ext points to the external IP for traefik and not the first one, in your case the one ending in .181 from your vars file.
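
A curl that pins the hostname to the external IP takes the local hosts/DNS setup out of the equation; the values below come straight from the vars file above:

```sh
# Force the request at the external traefik IP regardless of hosts/DNS entries
curl -vk --resolve traefik-ext.storm.internal:443:192.168.99.181 \
  https://traefik-ext.storm.internal/dashboard/
```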

ChrisThePCGeek commented 1 day ago

Just trying to see what could possibly be the cause. The other thing I can think of is enabling the debug logs and then watching them from the Rancher UI while you try to navigate to the dashboard... or spin up an nginx test deployment and see if that works as intended. It could be something stupid with traefik itself.
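
The logs can also be tailed directly, using the `app.kubernetes.io/instance` label shown in the IngressRoutes above (assuming the helm release stamps its pods with that same label):

```sh
# Tail the external instance's logs while reproducing the 404
kubectl -n kube-system logs -l app.kubernetes.io/instance=traefik-external -f --tail=50
```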

rpchan44 commented 1 day ago

Yep, these are the entries in the hosts file on the Windows machine accessing the cluster:

```
# Copyright (c) 1993-2009 Microsoft Corp.
#
# This is a sample HOSTS file used by Microsoft TCP/IP for Windows.
#
# This file contains the mappings of IP addresses to host names. Each
# entry should be kept on an individual line. The IP address should
# be placed in the first column followed by the corresponding host name.
# The IP address and the host name should be separated by at least one
# space.
#
# Additionally, comments (such as these) may be inserted on individual
# lines or following the machine name denoted by a '#' symbol.
#
# For example:
#
#      102.54.94.97     rhino.acme.com          # source server
#       38.25.63.10     x.acme.com              # x client host

# localhost name resolution is handled within DNS itself.
#   127.0.0.1       localhost
#   ::1             localhost

192.168.99.180 k3s-control.storm.internal
192.168.99.180 traefik.storm.internal
192.168.99.181 traefik-ext.storm.internal
```

rpchan44 commented 1 day ago

I already tried spinning up the nginx deployment and service from the example directory, hoping the dashboard for the external traefik would come to life. Nada, hahaha, still 404 and it's weird.
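
At this point it may also be worth confirming that MetalLB handed out the IPs as intended, i.e. that .181 really belongs to the external traefik service and the requests aren't landing on the internal instance:

```sh
# Check which LoadBalancer IP each traefik service received from MetalLB
kubectl -n kube-system get svc -o wide | grep -i traefik
```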