When k3s rotates the Traefik log, Crowdsec is unable to continue watching it

clintkev251 commented 1 year ago

What happened?

I've recently migrated to the official Crowdsec helm chart for my deployment of Crowdsec in k3s, and everything was working great for the first 12 hours or so. Then I noticed that, one by one, each of my agent pods stopped recording any acquisitions. Restarting the pods got them working again, but after some time they failed again. Digging deeper, this occurs when k3s rotates the container log. When that happens, the agent pod emits the following logs:

time="01-09-2023 08:55:34" level=info msg="Re-opening moved/deleted file /var/log/containers/traefik-78cdb8f86-p9xfp_traefik_traefik-c4f03f0102f7226db163fb623df1a9288a8ba561908f1aaa388ca7f38134021a.log ..."
time="01-09-2023 08:55:34" level=info msg="Waiting for /var/log/containers/traefik-78cdb8f86-p9xfp_traefik_traefik-c4f03f0102f7226db163fb623df1a9288a8ba561908f1aaa388ca7f38134021a.log to appear..."

I can still exec into the pod and manually tail that log and see all the new lines coming in, but Crowdsec is never able to pick back up. The logs contained in /var/log/containers are symlinks to the actual log files which are in /var/log/pods/// so it's possible this is part of the issue.

What did you expect to happen?

Crowdsec should be able to reopen the log file after the log rotation has completed.

How can we reproduce it (as minimally and precisely as possible)?

Use the official helm chart and add a pod to the acquisition config, then wait until the log reaches the maximum size configured for your cluster (10 MB by default) and k3s rotates it. After the log has been rotated, check whether the Crowdsec pod on that node is still picking up acquisitions.
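
To avoid waiting for a full-size log, the rotation threshold could be lowered on a test node. The fragment below is only a sketch and was not part of the original report: it assumes k3s reads /etc/rancher/k3s/config.yaml and forwards `kubelet-arg` entries to the kubelet, and the 1Mi value is an arbitrary choice for testing.

```yaml
# /etc/rancher/k3s/config.yaml (assumed location; restart k3s after editing)
# Assumption: a smaller container-log-max-size makes the kubelet rotate
# container logs sooner, so the failure shows up within minutes, not hours.
kubelet-arg:
  - "container-log-max-size=1Mi"
```

With a busy Traefik pod, the "Re-opening moved/deleted file" message should then appear shortly after the log crosses the lowered limit.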

Anything else we need to know?

No response

Crowdsec version

```console
$ cscli version
crowdsec-agent-5z5fp:/# cscli version
2023/09/01 14:14:43 version: v1.5.2-4fbc3402fba932c8bd34b671527dcf7909d264c0
2023/09/01 14:14:43 Codename: alphaga
2023/09/01 14:14:43 BuildDate: 2023-05-26_16:18:45
2023/09/01 14:14:43 GoVersion: 1.20.4
2023/09/01 14:14:43 Platform: docker
2023/09/01 14:14:43 Constraint_parser: >= 1.0, <= 2.0
2023/09/01 14:14:43 Constraint_scenario: >= 1.0, < 3.0
2023/09/01 14:14:43 Constraint_api: v1
2023/09/01 14:14:43 Constraint_acquis: >= 1.0, < 2.0
```

OS version

```console
# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ uname -a
Linux k3s-03 5.15.0-79-generic #86-Ubuntu SMP Mon Jul 10 16:07:21 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
```

Enabled collections and parsers

```console
$ cscli hub list -o raw
LePresidente/authelia,enabled,0.2,Authelia Support : parser and brute-force detection,collections
LePresidente/grafana,enabled,0.1,Grafana Support : parser and brute-force detection,collections
crowdsecurity/base-http-scenarios,enabled,0.6,http common : scanners detection,collections
crowdsecurity/home-assistant,enabled,0.1,Home assistant support : logs and brute-force scenario,collections
crowdsecurity/http-cve,enabled,2.1,,collections
crowdsecurity/linux,enabled,0.2,core linux support : syslog+geoip+ssh,collections
crowdsecurity/sshd,"enabled,update-available",0.2,sshd support : parser and brute-force detection,collections
crowdsecurity/traefik,enabled,0.1,traefik support: parser and generic http scenarios,collections
LePresidente/authelia-logs,enabled,0.3,Parse Authelia logs,parsers
LePresidente/grafana-logs,enabled,0.1,Parse grafana logs,parsers
crowdsecurity/cri-logs,enabled,0.1,CRI logging format parser,parsers
crowdsecurity/dateparse-enrich,enabled,0.2,,parsers
crowdsecurity/docker-logs,enabled,0.1,docker json logs parser,parsers
crowdsecurity/geoip-enrich,enabled,0.2,"Populate event with geoloc info : as, country, coords, source range.",parsers
crowdsecurity/home-assistant-logs,enabled,0.5,Parse Home Assistant logs,parsers
crowdsecurity/http-logs,enabled,1.1,"Parse more Specifically HTTP logs, such as HTTP Code, HTTP path, HTTP args and if its a static ressource",parsers
crowdsecurity/sshd-logs,"enabled,update-available",2.0,Parse openSSH logs,parsers
crowdsecurity/syslog-logs,enabled,0.8,,parsers
crowdsecurity/traefik-logs,enabled,0.8,Parse Traefik access logs,parsers
crowdsecurity/whitelists,enabled,0.2,Whitelist events from private ipv4 addresses,parsers
LePresidente/authelia-bf,enabled,0.2,Detect authelia bruteforce,scenarios
LePresidente/grafana-bf,enabled,0.1,Detect grafana bruteforce,scenarios
crowdsecurity/CVE-2019-18935,enabled,0.1,Detect Telerik CVE-2019-18935 exploitation attempts,scenarios
crowdsecurity/CVE-2022-26134,enabled,0.1,Detect CVE-2022-26134 exploits,scenarios
crowdsecurity/CVE-2022-35914,enabled,0.1,Detect CVE-2022-35914 exploits,scenarios
crowdsecurity/CVE-2022-37042,enabled,0.1,Detect CVE-2022-37042 exploits,scenarios
crowdsecurity/CVE-2022-40684,enabled,0.2,Detect cve-2022-40684 exploitation attempts,scenarios
crowdsecurity/CVE-2022-41082,enabled,0.3,Detect CVE-2022-41082 exploits,scenarios
crowdsecurity/CVE-2022-41697,enabled,0.1,Detect CVE-2022-41697 enumeration,scenarios
crowdsecurity/CVE-2022-42889,enabled,0.2,Detect CVE-2022-42889 exploits (Text4Shell),scenarios
crowdsecurity/CVE-2022-44877,enabled,0.2,Detect CVE-2022-44877 exploits,scenarios
crowdsecurity/CVE-2022-46169,enabled,0.1,Detect CVE-2022-46169 brute forcing,scenarios
crowdsecurity/apache_log4j2_cve-2021-44228,enabled,0.4,Detect cve-2021-44228 exploitation attemps,scenarios
crowdsecurity/f5-big-ip-cve-2020-5902,enabled,0.1,Detect cve-2020-5902 exploitation attemps,scenarios
crowdsecurity/fortinet-cve-2018-13379,enabled,0.2,Detect cve-2018-13379 exploitation attemps,scenarios
crowdsecurity/grafana-cve-2021-43798,enabled,0.1,Detect cve-2021-43798 exploitation attemps,scenarios
crowdsecurity/home-assistant-bf,enabled,0.2,Detect Home Assistant bruteforce,scenarios
crowdsecurity/http-backdoors-attempts,enabled,0.3,Detect attempt to common backdoors,scenarios
crowdsecurity/http-bad-user-agent,enabled,0.8,Detect bad user-agents,scenarios
crowdsecurity/http-crawl-non_statics,enabled,0.3,Detect aggressive crawl from single ip,scenarios
crowdsecurity/http-cve-2021-41773,enabled,0.1,cve-2021-41773,scenarios
crowdsecurity/http-cve-2021-42013,enabled,0.1,cve-2021-42013,scenarios
crowdsecurity/http-generic-bf,enabled,0.4,Detect generic http brute force,scenarios
crowdsecurity/http-open-proxy,enabled,0.3,Detect scan for open proxy,scenarios
crowdsecurity/http-path-traversal-probing,enabled,0.2,Detect path traversal attempt,scenarios
crowdsecurity/http-probing,enabled,0.2,Detect site scanning/probing from a single ip,scenarios
crowdsecurity/http-sensitive-files,enabled,0.2,"Detect attempt to access to sensitive files (.log, .db ..) or folders (.git)",scenarios
crowdsecurity/http-sqli-probing,enabled,0.2,A scenario that detects SQL injection probing with minimal false positives,scenarios
crowdsecurity/http-xss-probing,enabled,0.2,A scenario that detects XSS probing with minimal false positives,scenarios
crowdsecurity/jira_cve-2021-26086,enabled,0.1,Detect Atlassian Jira CVE-2021-26086 exploitation attemps,scenarios
crowdsecurity/netgear_rce,enabled,0.2,Detect Netgear RCE DGN1000/DGN220 exploitation attempts,scenarios
crowdsecurity/pulse-secure-sslvpn-cve-2019-11510,enabled,0.2,Detect cve-2019-11510 exploitation attemps,scenarios
crowdsecurity/spring4shell_cve-2022-22965,enabled,0.2,Detect cve-2022-22965 probing,scenarios
crowdsecurity/ssh-bf,enabled,0.1,Detect ssh bruteforce,scenarios
crowdsecurity/ssh-slow-bf,enabled,0.2,Detect slow ssh bruteforce,scenarios
crowdsecurity/thinkphp-cve-2018-20062,enabled,0.3,Detect ThinkPHP CVE-2018-20062 exploitation attemps,scenarios
crowdsecurity/vmware-cve-2022-22954,enabled,0.2,Detect Vmware CVE-2022-22954 exploitation attempts,scenarios
crowdsecurity/vmware-vcenter-vmsa-2021-0027,enabled,0.1,Detect VMSA-2021-0027 exploitation attemps,scenarios
ltsich/http-w00tw00t,enabled,0.1,detect w00tw00t,scenarios
```

Acquisition config

```console
# On Linux:
$ cat /etc/crowdsec/acquis.yaml /etc/crowdsec/acquis.d/*
---
filenames:
  - /var/log/containers/traefik-*_traefik_*.log
force_inotify: true
labels:
  type: containerd
  program: traefik
---
filenames:
  - /var/log/containers/authelia-server-*_traefik_*.log
force_inotify: true
labels:
  type: containerd
  program: authelia
---
filenames:
  - /var/log/containers/homeassistant-*_homeassistant_*.log
force_inotify: true
labels:
  type: containerd
  program: home-assistant
---
filenames:
  - /var/log/containers/grafana-*_metrics_*.log
force_inotify: true
labels:
  type: containerd
  program: grafana

# On Windows:
C:\> Get-Content C:\ProgramData\CrowdSec\config\acquis.yaml
# paste output here
```

Config show

```console
$ cscli config show
Global:
   - Configuration Folder   : /etc/crowdsec
   - Configuration Folder   : /etc/crowdsec
   - Data Folder            : /var/lib/crowdsec/data
   - Hub Folder             : /etc/crowdsec/hub
   - Simulation File        : /etc/crowdsec/simulation.yaml
   - Log Folder             : /var/log/
   - Log level              : info
   - Log Media              : stdout
Crowdsec:
  - Acquisition File        : /etc/crowdsec/acquis.yaml
  - Parsers routines        : 1
  - Acquisition Folder      : /etc/crowdsec/acquis.d
cscli:
  - Output                  : human
  - Hub Branch              :
  - Hub Folder              : /etc/crowdsec/hub
API Client:
  - URL                     : http://crowdsec-service.traefik:8080/
  - Login                   : xxxx
  - Credentials File        : /etc/crowdsec/local_api_credentials.yaml
Local API Server:
  - Listen URL              : 0.0.0.0:8080
  - Profile File            : /etc/crowdsec/profiles.yaml
  - Trusted IPs:
      - 127.0.0.1
      - ::1
  - Database:
      - Type                : sqlite
      - Path                : /var/lib/crowdsec/data/crowdsec.db
      - Flush age           : 7d
      - Flush size          : 5000
```

Prometheus metrics

```console
$ cscli metrics
# paste output here
```

Related custom configs versions (if applicable) : notification plugins, custom scenarios, parsers etc.

Helm values:

```
container_runtime: containerd
lapi:
  resources:
    limits:
      memory: 200Mi
  metrics:
    enabled: "true"
    serviceMonitor:
      enabled: "true"
agent:
  metrics:
    enabled: "true"
    serviceMonitor:
      enabled: "true"
  resources:
    limits:
      memory: 200Mi
  acquisition:
    - namespace: traefik
      podName: traefik-*
      program: traefik
    - namespace: traefik
      podName: authelia-server-*
      program: authelia
    - namespace: homeassistant
      podName: homeassistant-*
      program: home-assistant
    - namespace: metrics
      podName: grafana-*
      program: grafana
  env:
    - name: COLLECTIONS
      value: crowdsecurity/traefik crowdsecurity/http-cve LePresidente/authelia crowdsecurity/home-assistant LePresidente/grafana
config:
  notifications:
    slack.yaml: |
      type: slack
      name: slack_default
      log_level: info
      format: |
        {{range . -}}
        {{$alert := . -}}
        {{range .Decisions -}}
        {{if $alert.Source.Cn -}}
        :flag-{{$alert.Source.Cn}}: will get {{.Type}} for next {{.Duration}} for triggering {{.Scenario}} on machine '{{$alert.MachineID}}'. {{end}}
        {{if not $alert.Source.Cn -}}
        :pirate_flag: will get {{.Type}} for next {{.Duration}} for triggering {{.Scenario}} on machine '{{$alert.MachineID}}'. {{end}}
        {{end -}}
        {{end -}}
      webhook: xxx
  profiles.yaml: |
    name: default_ip_remediation
    #debug: true
    filters:
      - Alert.Remediation == true && Alert.GetScope() == "Ip"
    decisions:
      - type: ban
        duration: 4h
    #duration_expr: Sprintf('%dh', (GetDecisionsCount(Alert.GetValue()) + 1) * 4)
    notifications:
      - slack_default  # Set the webhook in /etc/crowdsec/notifications/slack.yaml before enabling this.
    # - splunk_default # Set the splunk url and token in /etc/crowdsec/notifications/splunk.yaml before enabling this.
    # - http_default   # Set the required http parameters in /etc/crowdsec/notifications/http.yaml before enabling this.
    # - email_default  # Set the required email parameters in /etc/crowdsec/notifications/email.yaml before enabling this.
    on_success: break
```

github-actions[bot] commented 1 year ago

@clintkev251: Thanks for opening an issue, it is currently awaiting triage.

In the meantime, you can:

Check Crowdsec Documentation to see if your issue can be self resolved.
You can also join our Discord.
Check Releases to make sure your agent is on the latest version.


corybolar commented 1 year ago

@clintkev251 Did you ever find a solution to this problem?

clintkev251 commented 1 year ago

I found a suitable workaround, at least. I closed this issue mostly because further research led me to believe the problem has more to do with how the helm chart is set up than with how Crowdsec actually watches logs. My workaround for the moment, which has been rock solid, was to add poll_without_inotify: true to each file acquisition source. It is noted that this can increase CPU usage, but I didn't notice much of an impact after watching it for a while, so I'm happy enough with it. This appears to be a newer option that the helm chart does not currently support, so I've opened a pull request over there to add it to the schema for additionalAcquisition file types: https://github.com/crowdsecurity/helm-charts/pull/109. Ideally it can also be added as an option for the automatically configured acquisitions.
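
For readers applying the same workaround, a file acquisition entry with the option set might look roughly like the sketch below. It reuses the Traefik path and labels from the acquisition config shown earlier in this issue; the thread does not say whether force_inotify was kept alongside the new option, so it is left out here as an assumption.

```yaml
# Sketch of one acquisition source using the polling workaround.
filenames:
  - /var/log/containers/traefik-*_traefik_*.log
# Poll the file for changes instead of relying on inotify events,
# so the agent keeps reading after the container log is rotated.
poll_without_inotify: true
labels:
  type: containerd
  program: traefik
```

The same key would need to be repeated in every source (authelia, home-assistant, grafana), since it is set per acquisition entry rather than globally.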

corybolar commented 1 year ago

@clintkev251 Thanks for the response! Looking forward to your PR being merged.
