Open neilmfrench opened 1 year ago
So if I merge #100, will it cause issues? Can anyone confirm?
Are you using persistent volumes? Normally this issue can happen when you are mounting a config.yaml.
@LaurenceJJones These values work for me using the chart as it is with 1.4.5. I have not tested anything past that or tried manually setting the chart image.
values:
  container_runtime: containerd
  agent:
    acquisition:
      - namespace: network
        podName: ingress-nginx-controller-*
        program: nginx
    env:
      - name: COLLECTIONS
        value: "crowdsecurity/nginx"
      - name: PARSERS
        value: "crowdsecurity/cri-logs"
More to the point of the original issue, I don't have any problems with cscli commands or the lapi crashing with these almost-factory settings.
@LaurenceJJones yes, I am.
Here's my values:
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: crowdsec
spec:
  values:
    container_runtime: containerd
    image:
      tag: "v1.4.6"
    tls:
      enabled: true
    secrets:
      username: "${SECRET_CROWDSEC_AGENT_USERNAME}"
      password: "${SECRET_CROWDSEC_AGENT_PASSWORD}"
    config:
      parsers:
        s02-enrich:
          whitelist-external-ip.yaml: |
            name: crowdsecurity/whitelists
            description: "Whitelist my external ip"
            whitelist:
              reason: "My external ip"
              ip:
                - "${SECRET_EXTERNAL_IP}"
                - "${SECRET_GCP_IP}"
    lapi:
      env:
        - name: ENROLL_KEY
          value: "${SECRET_CROWDSEC_ENROLL_KEY}"
        - name: ENROLL_INSTANCE_NAME
          value: "homelab-k8s-cluster"
        - name: ENROLL_TAGS
          value: "k8s linux homelab"
        - name: BOUNCER_KEY_traefik
          value: "${SECRET_CROWDSEC_TRAEFIK_BOUNCER_KEY}"
        - name: BOUNCER_KEY_cloudflare
          value: "${SECRET_CROWDSEC_CLOUDFLARE_BOUNCER_KEY}"
        - name: LEVEL_DEBUG
          value: "true"
      dashboard:
        enabled: true
        assetURL: https://crowdsec-statics-assets.s3-eu-west-1.amazonaws.com/metabase_sqlite.zip
        ingress:
          enabled: true
          annotations:
            cert-manager.io/cluster-issuer: "letsencrypt-production"
            traefik.ingress.kubernetes.io/router.entrypoints: "websecure"
          ingressClassName: "traefik"
          host: &host "crowdsec-dash.${SECRET_DOMAIN}"
          tls:
            - hosts:
                - *host
              secretName: "crowdsec-dash-tls"
      persistentVolume:
        data:
          enabled: true
          storageClassName: ceph-block
          size: 1Gi
        config:
          enabled: true
          storageClassName: ceph-block
          size: 500Mi
    agent:
      persistentVolume:
        config:
          enabled: true
          accessModes:
            - ReadWriteMany
          storageClassName: ceph-filesystem
          size: 400Mi
      env:
        # - name: DISABLE_ONLINE_API
        #   value: "true"
        # - name: USE_FORWARDED_FOR_HEADERS
        #   value: "true"
        - name: COLLECTIONS
          value: >-
            crowdsecurity/linux
            crowdsecurity/sshd
            crowdsecurity/traefik
            crowdsecurity/base-http-scenarios
            crowdsecurity/http-cve
            crowdsecurity/whitelist-good-actors
        - name: PARSERS
          value: "crowdsecurity/cri-logs"
      acquisition:
        - namespace: networking
          podName: traefik-*
          program: traefik
      metrics:
        enabled: true
        serviceMonitor:
          enabled: true
On a completely fresh install (no existing persistent volumes), this will fail if I set tag = v1.5.2. However, with tag = v1.4.6 it works.
I did a little digging when troubleshooting. It seemed to be related to https://github.com/crowdsecurity/crowdsec/blob/master/docker/docker_start.sh#L276, so I disabled the config PVC. That let me get a little further in the startup, but it still eventually failed with the same chown error.
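For anyone wanting to repeat that experiment, disabling the config PVC might look roughly like the fragment below. This is only a sketch: the thread does not say whether the lapi or the agent config volume was disabled, so both keys from the values posted above are shown, and either can be dropped.

```yaml
# Sketch only, not the poster's exact change.
# In a Flux HelmRelease these keys sit under spec.values.
lapi:
  persistentVolume:
    config:
      enabled: false   # lapi config PVC off; the data PVC is left untouched
agent:
  persistentVolume:
    config:
      enabled: false   # agent config PVC off
```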
My next approach was to do a clean install with v1.4.6 and then upgrade to v1.5.2. This allowed me to get past the chown errors, but when I tried to perform any cscli command, even cscli --help, the lapi pod would crash. I do not have any cscli issues on v1.4.6.
I did indeed observe the same issues after uninstalling crowdsec and forcing image v1.5.2, so I did a little investigation and noticed k8s was OOM-killing the agents. I raised the default memory limit of 100Mi and I'm no longer getting crashing lapi or agent pods when using cscli commands.
Configuration used:
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: crowdsec
spec:
  interval: 30m
  chart:
    spec:
      chart: crowdsec
      version: 0.9.6
      sourceRef:
        kind: HelmRepository
        name: crowdsec
        namespace: flux-system
      interval: 30m
  values:
    container_runtime: containerd
    image:
      tag: "v1.5.2"
    agent:
      acquisition:
        - namespace: network
          podName: ingress-nginx-controller-*
          program: nginx
      env:
        - name: COLLECTIONS
          value: "crowdsecurity/nginx"
        - name: PARSERS
          value: "crowdsecurity/cri-logs"
      resources:
        limits:
          memory: 512Mi
        requests:
          cpu: 150m
          memory: 256Mi
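Since the lapi pod was the one reported as crashing on cscli commands, the same memory bump may be worth applying there as well. A minimal sketch, assuming the chart exposes a `lapi.resources` block that mirrors `agent.resources` (check the chart's values.yaml for your version), with the numbers simply copied from the agent settings above rather than measured:

```yaml
# Sketch only: assumes lapi.resources exists in the chart and mirrors agent.resources.
# In a Flux HelmRelease these keys sit under spec.values.
lapi:
  resources:
    limits:
      memory: 512Mi    # copied from the agent limit above, not tuned
    requests:
      cpu: 150m
      memory: 256Mi
```

If the pods still restart, the container's last termination reason in `kubectl describe pod` output should show whether it was OOMKilled again.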
Renaming the issue due to the release of 1.5.3.
A clean install of the chart will not work with 1.5.2. It produces errors when trying to chown the SQLite db, which does not exist yet.
Upgrading from 1.4.X will allow the app to start, but any cscli command crashes the lapi pod.