crowdsecurity / helm-charts

CrowdSec community kubernetes helm charts
MIT License

Permission denied on SQLite database when using TLS #114

Open slefol opened 1 year ago

slefol commented 1 year ago

Hi, I have noticed an issue with the dashboard when tls.enabled is set to true.

Environment

Helm chart: crowdsec
Helm chart version: 0.9.9

$ helm install \
    crowdsec crowdsec/crowdsec \
    --create-namespace \
    --namespace crowdsec \
    -f crowdsec-values.yaml

crowdsec-values.yaml:

container_runtime: containerd
tls:
  enabled: true
  bouncer:
    reflector:
      namespaces: ["traefik"]
agent:
  tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Equal
      effect: NoSchedule
  # Specify each pod whose logs you want to process
  acquisition:
    # The namespace where the pod is located
    - namespace: traefik
      # The pod name
      podName: traefik-*
      # as in crowdsec configuration, we need to specify the program name to find a matching parser
      program: traefik
  env:
    - name: PARSERS
      value: "crowdsecurity/cri-logs"
    - name: COLLECTIONS
      value: "crowdsecurity/traefik"
    # When testing, allow bans on private networks
    - name: DISABLE_PARSERS
      value: "crowdsecurity/whitelists"
  persistentVolume:
    config:
      enabled: false
lapi:
  dashboard:
    enabled: true
    ingress:
      host: dashboard.local
      enabled: true
  persistentVolume:
    config:
      enabled: false
  env:
    # For an internal test, disable the Online API
    - name: DISABLE_ONLINE_API
      value: "true"

Issue

In the dashboard, Browse data > Crowdsec > Alerts fails with: [SQLITE_CANTOPEN] Unable to open the database file (unable to open database file)

Investigation

In Admin Settings > Databases > Crowdsec, clicking Save changes fails with: /metabase-data/crowdsec.db (Permission denied)

The dashboard runs as the metabase user, which has no read access to the database file (owned by root:root with mode 0640):

$ kubectl -n crowdsec exec -it crowdsec-lapi-7c79988958-q89ln -c dashboard -- sh
/ # ps faux
PID   USER     TIME  COMMAND
    1 metabase  1:30 java -XX:+IgnoreUnrecognizedVMOptions -Dfile.encoding=UTF-8 -Dlogfile.path=target/log -XX:+CrashOnOutOfMemoryError -server -jar /app/metabase.jar

/ # readlink /metabase-data/crowdsec.db
/var/lib/crowdsec/data/crowdsec.db

/ # ls -lh /var/lib/crowdsec/data/crowdsec.db
-rw-r-----    1 root     root       84.0K Oct 13 07:19 /var/lib/crowdsec/data/crowdsec.db

Changing the group ownership of the database file to metabase fixes the issue:

/ # chown :metabase /var/lib/crowdsec/data/crowdsec.db
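To check whether a given group can read the file before or after such a change, one can compare the file's group id and its group read bit. This is a minimal sketch, assuming a `stat` that supports `-L` and `-c` (GNU coreutils and BusyBox both do); the function name `group_can_read` is hypothetical, and it only looks at group permission bits, not ACLs or the "other" bits.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if group GID can read FILE via the group bits.
# -L follows symlinks such as /metabase-data/crowdsec.db -> /var/lib/crowdsec/data/crowdsec.db
group_can_read() {
    file="$1"
    gid="$2"
    file_gid=$(stat -L -c '%g' "$file") || return 2
    perms=$(stat -L -c '%A' "$file")            # e.g. -rw-r-----
    group_read=$(printf '%s' "$perms" | cut -c5) # 5th char is the group read bit
    [ "$file_gid" = "$gid" ] && [ "$group_read" = "r" ]
}
```

For the root:root, `-rw-r-----` file listed above, `group_can_read /metabase-data/crowdsec.db 2000` would fail, and it would succeed once the file's group is the metabase group.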

slefol commented 4 months ago

Hi, I installed Helm chart version 0.11.0 and the issue is still present. Could you please look into it?

LaurenceJJones commented 4 months ago

Hey 👋🏻

I don't think it has anything to do with TLS. By default the database is owned by root:root, and its permissions are updated on each start. Could you try setting a GID variable of 2000 in the lapi environment? That is the metabase group id inside the dashboard container.

https://github.com/crowdsecurity/helm-charts/blob/6bd1d202fa705775d928a1226bcc98c38ad6288d/charts/crowdsec/values.yaml#L145-L159

So, for example:

 lapi: 
   # -- replicas for local API 
   replicas: 1 
   # -- environment variables from crowdsecurity/crowdsec docker image 
   env:
     - name: GID
       value: "2000"
     # by default disable the agent because it only needs the local API. 
     #- name: DISABLE_AGENT 
     #  value: "true" 
   # Allows you to load environment variables from kubernetes secret or config map 
   envFrom: [] 
     # - secretRef: 
     #     name: env-secret 

https://github.com/crowdsecurity/crowdsec/blob/c4bfdf19914a88671663f8caae5a5ea849c1b3a6/docker/docker_start.sh#L334-L341

/ # cat /etc/group
root:x:0:root
bin:x:1:root,bin,daemon
daemon:x:2:root,bin,daemon
sys:x:3:root,bin,adm
adm:x:4:root,adm,daemon
tty:x:5:
disk:x:6:root,adm
lp:x:7:lp
mem:x:8:
kmem:x:9:
wheel:x:10:root
floppy:x:11:root
mail:x:12:mail
news:x:13:news
uucp:x:14:uucp
man:x:15:man
cron:x:16:cron
console:x:17:
audio:x:18:
cdrom:x:19:
dialout:x:20:root
ftp:x:21:
sshd:x:22:
input:x:23:
at:x:25:at
tape:x:26:root
video:x:27:root
netdev:x:28:
readproc:x:30:
squid:x:31:squid
xfs:x:33:xfs
kvm:x:34:kvm
games:x:35:
shadow:x:42:
cdrw:x:80:
www-data:x:82:
usb:x:85:
vpopmail:x:89:
users:x:100:games
ntp:x:123:
nofiles:x:200:
smmsp:x:209:smmsp
locate:x:245:
abuild:x:300:
utmp:x:406:
ping:x:999:
nogroup:x:65533:
nobody:x:65534:
metabase:x:2000:metabase
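The metabase gid can also be extracted from a file in /etc/group format without scanning it by eye. `getent group metabase` does this where `getent` is available; the sketch below uses plain `awk` instead so it works on any such file. The function name `group_gid` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric gid of group NAME from an
# /etc/group-format file (default /etc/group). Fields: name:password:gid:members.
# Exits non-zero if the group is not found.
group_gid() {
    awk -F: -v name="$1" '
        $1 == name { print $3; found = 1; exit }
        END { exit !found }
    ' "${2:-/etc/group}"
}
```

Run inside the dashboard container, `group_gid metabase` should print 2000, matching the last line of the listing above.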

slefol commented 4 months ago

@LaurenceJJones Thank you for your interest in my request.

I defined a GID variable of 2000 in the lapi environment, but I get this error (the Helm chart is deployed by ArgoCD): ComparisonError: error calculating structured merge diff: error building typed value from config resource: .spec.template.spec.containers[name="crowdsec-lapi"].env: duplicate entries for key [name="GID"]

Indeed, the variable is already defined in the template: https://github.com/crowdsecurity/helm-charts/blob/main/charts/crowdsec/templates/lapi-deployment.yaml#L87-L90

LaurenceJJones commented 4 months ago

Can you share your full values.yaml, please?

LaurenceJJones commented 4 months ago

Because if you are using the official metabase image, it should use the same MGID:

          - name: MGID
            value: "1000"

slefol commented 4 months ago

My values.yaml:

  container_runtime: containerd
  tls:
    enabled: true
    bouncer:
      reflector:
        namespaces: ["traefik"]
  agent:
    # Specify each pod whose logs we want to process
    acquisition:
      # The namespace where the pod is located
      - namespace: traefik
        # The pod name
        podName: traefik-*
        # as in crowdsec configuration, we need to specify the program name to find a matching parser
        program: traefik
    # Those are ENV variables
    env:
      - name: PARSERS
        value: "crowdsecurity/cri-logs"
      - name: COLLECTIONS
        value: "crowdsecurity/traefik"
      - name: DISABLE_PARSERS
        value: "crowdsecurity/whitelists"
    persistentVolume:
      config:
        enabled: false
  lapi:
    dashboard:
      enabled: true
      ingress:
        host: dashboard.local
        enabled: false
    persistentVolume:
      config:
        enabled: false
    env:
      # If it's a test, we don't want to share signals with CrowdSec so disable the Online API.
      - name: DISABLE_ONLINE_API
        value: "true"
      - name: GID
        value: "2000"

LaurenceJJones commented 4 months ago

You can remove the GID setting; I didn't know we set it for both containers. Once the LAPI has started, if you exec into the container, don't you see these permissions?

Defaulted container "crowdsec-lapi" out of: crowdsec-lapi, dashboard, fetch-metabase-config (init)
# ls -la /var/lib/crowdsec/data 
total 104
drwxrwxrwx    3 root     root          4096 Jul 17 12:19 .
drwxr-xr-x    3 root     root          4096 Jun  5 14:15 ..
lrwxrwxrwx    1 root     root            48 Jul 17 12:18 GeoLite2-ASN.mmdb -> /staging/var/lib/crowdsec/data/GeoLite2-ASN.mmdb
lrwxrwxrwx    1 root     root            49 Jul 17 12:18 GeoLite2-City.mmdb -> /staging/var/lib/crowdsec/data/GeoLite2-City.mmdb
-rw-r-----    1 root     1000         94208 Jul 17 12:19 crowdsec.db
drwx------    2 root     root          4096 Jul 17 12:18 trace

LaurenceJJones commented 4 months ago

Okay, I managed to reproduce it: enabling TLS does in fact prevent the permissions from being updated, which is really odd as nothing depends on it. It must be a race condition where the database does not exist yet while the chown runs.

/var/lib/crowdsec/data # ls -la
total 104
drwxrwxrwx    1 root     root           102 Jul 17 13:54 .
drwxr-xr-x    1 root     root             8 Jun  5 14:15 ..
lrwxrwxrwx    1 root     root            48 Jul 17 13:50 GeoLite2-ASN.mmdb -> /staging/var/lib/crowdsec/data/GeoLite2-ASN.mmdb
lrwxrwxrwx    1 root     root            49 Jul 17 13:50 GeoLite2-City.mmdb -> /staging/var/lib/crowdsec/data/GeoLite2-City.mmdb
-rw-r-----    1 root     root         94208 Jul 17 13:54 crowdsec.db
drwx------    1 root     root             0 Jul 17 13:50 trace

Running kubectl delete pods -n crowdsec crowdsec-lapi-bdc4d8cff-bgxd6 after init does go back and update the permissions, but it's not a good solution 🤷🏻

Let me see if there is a way around it.

slefol commented 4 months ago

No, the permissions are:

kubectl -n crowdsec exec -it crowdsec-lapi-98f7577d6-bv4tb -- sh
Defaulted container "crowdsec-lapi" out of: crowdsec-lapi, dashboard, fetch-metabase-config (init)
/ # ls -la /var/lib/crowdsec/data
total 100
drwxr-xr-x    3 root     root            89 Jul 17 13:33 .
drwxr-xr-x    3 root     root            18 Jun  5 14:15 ..
lrwxrwxrwx    1 root     root            48 Jul 17 12:58 GeoLite2-ASN.mmdb -> /staging/var/lib/crowdsec/data/GeoLite2-ASN.mmdb
lrwxrwxrwx    1 root     root            49 Jul 17 12:58 GeoLite2-City.mmdb -> /staging/var/lib/crowdsec/data/GeoLite2-City.mmdb
-rw-r-----    1 root     root        102400 Jul 17 13:33 crowdsec.db
drwx------    2 root     root             6 Jul 17 12:58 trace

And sorry to insist, but the problem does not occur when TLS is not enabled.

New install with the same values.yaml except tls.enabled: false:

kubectl -n crowdsec exec -it crowdsec-lapi-944958666-jrw45 -- sh
Defaulted container "crowdsec-lapi" out of: crowdsec-lapi, dashboard, fetch-metabase-config (init)
/ # ls -la /var/lib/crowdsec/data
total 70580
drwxr-xr-x    2 root     root            76 Jul 17 13:48 .
drwxr-xr-x    3 root     root            18 Apr 18 13:37 ..
-rw-------    1 root     root       8404553 Jul 17 13:47 GeoLite2-ASN.mmdb
-rw-------    1 root     root      63771586 Jul 17 13:47 GeoLite2-City.mmdb
-rw-r-----    1 root     1000         94208 Jul 17 13:48 crowdsec.db

LaurenceJJones commented 4 months ago

Okay, we tracked down the issue, and it's not a race condition. When using TLS, we don't need to add the machines to the database, because when they authenticate with mTLS the database entry is created automatically. Since nothing interacts with the database at start-up, it does not exist yet while the chown command runs. (Deleting the pod works because by then the file exists.) So for now we will update crowdsec to run cscli machines list and redirect the output to /dev/null. This is a hack, but it fixes the issue; there is no real way to fix it otherwise, since it is true that there should be no DB at that point.
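The ordering problem and the workaround can be modelled in a few lines of shell. This is a simplified, hypothetical sketch, not the actual `docker_start.sh` or the code in the linked PR: the function name `fix_db_group`, its arguments, and the idea of passing the database-creating command as a parameter (defaulting to the `cscli machines list` hack described above, so it can be swapped out for testing) are all assumptions. `gid` stands in for the GID environment variable discussed earlier.

```shell
#!/bin/sh
# Hypothetical sketch of the start-up ordering discussed in this issue.
# With mTLS, no machine is registered at start-up, so the SQLite file may not
# exist yet when the group change runs; touching the database first avoids that.
fix_db_group() {
    db_path="$1"
    gid="$2"
    init_cmd="${3:-cscli machines list}"   # workaround proposed in this thread

    if [ ! -e "$db_path" ]; then
        # Per this thread, running a cscli command makes CrowdSec create the DB.
        # $init_cmd is intentionally unquoted so it splits into command + args.
        $init_cmd >/dev/null 2>&1 || true
    fi

    if [ -e "$db_path" ]; then
        chown ":$gid" "$db_path"
    else
        echo "database $db_path still missing; group not updated" >&2
        return 1
    fi
}
```

Re-checking for the file after the init command, rather than assuming it succeeded, makes the failure visible in the logs instead of silently leaving the file as root:root.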

So you were right about the TLS part; it was just hard to find where exactly the problem was. We will merge this and ship the fix in 1.6.3, as it's a very minor change.

LaurenceJJones commented 4 months ago

https://github.com/crowdsecurity/crowdsec/pull/3140