Acquis.yaml can't have multiple file datasources point to the same file with different label "type"

samthesamman commented 3 weeks ago

What happened?

I have nginx logs sent to journald, which then writes them to /var/log/syslog. In my acquis.yaml I have two datasources, each pointing to /var/log/syslog. One datasource has type 'syslog' (used by the ssh parsers) and the other has type 'nginx' (used by the nginx parser).

`filenames:
  - /var/log/auth.log
  - /var/log/syslog
  - /var/log/kern.log
labels:
  type: syslog
---
filenames:
  - /var/log/syslog
labels:
  type: nginx`

When crowdsec starts, it all works as expected. But once the syslog file is rotated, my nginx parser stops reading any new lines. This is likely because f.tails[] is keyed by the filename rather than by some other datasource identifier.
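A minimal sketch of the suspected failure mode, for illustration only. It is not crowdsec code: the `tailer` type and both `register*` functions are hypothetical, and it assumes a single map shared by both datasources, which may not match how crowdsec actually holds its tails. It only shows why keying by filename alone would let one datasource displace another, while a key that includes a datasource identifier would not.

```go
package main

import "fmt"

// tailer stands in for a per-file tail handle. Hypothetical, not a crowdsec type.
type tailer struct {
	datasource string // the "type" label of the owning datasource
	path       string // the file being tailed
}

// registerByPath keys tailers by filename only, so a second datasource
// watching the same path overwrites the first entry.
func registerByPath(tails map[string]tailer, t tailer) {
	tails[t.path] = t
}

// registerByOwner keys tailers by datasource plus path, so both entries survive.
func registerByOwner(tails map[string]tailer, t tailer) {
	tails[t.datasource+"|"+t.path] = t
}

func main() {
	a := tailer{datasource: "syslog", path: "/var/log/syslog"}
	b := tailer{datasource: "nginx", path: "/var/log/syslog"}

	byPath := map[string]tailer{}
	registerByPath(byPath, a)
	registerByPath(byPath, b)

	byOwner := map[string]tailer{}
	registerByOwner(byOwner, a)
	registerByOwner(byOwner, b)

	fmt.Println(len(byPath), len(byOwner)) // prints "1 2"
}
```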

What did you expect to happen?

I should be able to have multiple datasources pointing to one file, or alternatively be able to set multiple types for one datasource.

How can we reproduce it (as minimally and precisely as possible)?

Copy my acquis.yaml as above and let it run until log rotate.

Anything else we need to know?

No response

Crowdsec version

version: v1.6.3-4851945a
Codename: alphaga
BuildDate: 2024-09-12_09:39:08
GoVersion: 1.22.6
Platform: docker
libre2: C++
User-Agent: crowdsec/v1.6.3-4851945a-docker
Constraint_parser: >= 1.0, <= 3.0
Constraint_scenario: >= 1.0, <= 3.0
Constraint_api: v1
Constraint_acquis: >= 1.0, < 2.0

OS version

Docker

Enabled collections and parsers

name,status,version,description,type
crowdsecurity/cri-logs,enabled,0.1,CRI logging format parser,parsers
crowdsecurity/dateparse-enrich,enabled,0.2,,parsers
crowdsecurity/docker-logs,enabled,0.1,docker json logs parser,parsers
crowdsecurity/geoip-enrich,enabled,0.5,"Populate event with geoloc info : as, country, coords, source range.",parsers
crowdsecurity/http-logs,enabled,1.2,"Parse more Specifically HTTP logs, such as HTTP Code, HTTP path, HTTP args and if its a static ressource",parsers
crowdsecurity/iptables-logs,enabled,0.5,Parse iptables drop logs,parsers
crowdsecurity/nginx-logs,enabled,1.6,Parse nginx access and error logs,parsers
crowdsecurity/postfix-logs,enabled,0.7,Parse postfix logs,parsers
crowdsecurity/postscreen-logs,enabled,0.3,Parse postscreen logs,parsers
crowdsecurity/sshd-logs,enabled,2.8,Parse openSSH logs,parsers
crowdsecurity/syslog-logs,enabled,0.8,,parsers
crowdsecurity/whitelists,enabled,0.2,Whitelist events from private ipv4 addresses,parsers
crowdsecurity/cdn-whitelist,enabled,0.4,Whitelist CDN providers,postoverflows
crowdsecurity/rdns,enabled,0.3,Lookup the DNS associated to the source IP only for overflows,postoverflows
crowdsecurity/seo-bots-whitelist,enabled,0.5,Whitelist good search engine crawlers,postoverflows
crowdsecurity/apache_log4j2_cve-2021-44228,enabled,0.6,Detect cve-2021-44228 exploitation attemps,scenarios
crowdsecurity/CVE-2017-9841,enabled,0.2,Detect CVE-2017-9841 exploits,scenarios
crowdsecurity/CVE-2019-18935,enabled,0.2,Detect Telerik CVE-2019-18935 exploitation attempts,scenarios
crowdsecurity/CVE-2022-26134,enabled,0.2,Detect CVE-2022-26134 exploits,scenarios
crowdsecurity/CVE-2022-35914,enabled,0.2,Detect CVE-2022-35914 exploits,scenarios
crowdsecurity/CVE-2022-37042,enabled,0.2,Detect CVE-2022-37042 exploits,scenarios
crowdsecurity/CVE-2022-40684,enabled,0.3,Detect cve-2022-40684 exploitation attempts,scenarios
crowdsecurity/CVE-2022-41082,enabled,0.4,Detect CVE-2022-41082 exploits,scenarios
crowdsecurity/CVE-2022-41697,enabled,0.2,Detect CVE-2022-41697 enumeration,scenarios
crowdsecurity/CVE-2022-42889,enabled,0.3,Detect CVE-2022-42889 exploits (Text4Shell),scenarios
crowdsecurity/CVE-2022-44877,enabled,0.3,Detect CVE-2022-44877 exploits,scenarios
crowdsecurity/CVE-2022-46169,enabled,0.2,Detect CVE-2022-46169 brute forcing,scenarios
crowdsecurity/CVE-2023-22515,enabled,0.1,Detect CVE-2023-22515 exploitation,scenarios
crowdsecurity/CVE-2023-22518,enabled,0.2,Detect CVE-2023-22518 exploits,scenarios
crowdsecurity/CVE-2023-49103,enabled,0.3,Detect owncloud CVE-2023-49103 exploitation attempts,scenarios
crowdsecurity/CVE-2024-38475,enabled,0.1,Detect CVE-2024-38475 exploitation attempts,scenarios
crowdsecurity/f5-big-ip-cve-2020-5902,enabled,0.2,Detect cve-2020-5902 exploitation attemps,scenarios
crowdsecurity/fortinet-cve-2018-13379,enabled,0.3,Detect cve-2018-13379 exploitation attemps,scenarios
crowdsecurity/grafana-cve-2021-43798,enabled,0.2,Detect cve-2021-43798 exploitation attemps,scenarios
crowdsecurity/http-admin-interface-probing,enabled,0.4,Detect generic HTTP admin interface probing,scenarios
crowdsecurity/http-backdoors-attempts,enabled,0.6,Detect attempt to common backdoors,scenarios
crowdsecurity/http-bad-user-agent,enabled,1.2,Detect usage of bad User Agent,scenarios
crowdsecurity/http-crawl-non_statics,enabled,0.7,Detect aggressive crawl on non static resources,scenarios
crowdsecurity/http-cve-2021-41773,enabled,0.2,cve-2021-41773,scenarios
crowdsecurity/http-cve-2021-42013,enabled,0.2,cve-2021-42013,scenarios
crowdsecurity/http-cve-probing,enabled,0.2,Detect generic HTTP cve probing,scenarios
crowdsecurity/http-generic-bf,enabled,0.6,Detect generic http brute force,scenarios
crowdsecurity/http-open-proxy,enabled,0.5,Detect scan for open proxy,scenarios
crowdsecurity/http-path-traversal-probing,enabled,0.4,Detect path traversal attempt,scenarios
crowdsecurity/http-probing,enabled,0.4,Detect site scanning/probing from a single ip,scenarios
crowdsecurity/http-sensitive-files,enabled,0.4,"Detect attempt to access to sensitive files (.log, .db ..) or folders (.git)",scenarios
crowdsecurity/http-sqli-probing,enabled,0.4,A scenario that detects SQL injection probing with minimal false positives,scenarios
crowdsecurity/http-wordpress-scan,enabled,0.2,Detect WordPress scan: vuln hunting,scenarios
crowdsecurity/http-xss-probing,enabled,0.4,A scenario that detects XSS probing with minimal false positives,scenarios
crowdsecurity/iptables-scan-multi_ports,enabled,0.2,ban IPs that are scanning us,scenarios
crowdsecurity/jira_cve-2021-26086,enabled,0.3,Detect Atlassian Jira CVE-2021-26086 exploitation attemps,scenarios
crowdsecurity/netgear_rce,enabled,0.3,Detect Netgear RCE DGN1000/DGN220 exploitation attempts,scenarios
crowdsecurity/nginx-req-limit-exceeded,enabled,0.3,Detects IPs which violate nginx's user set request limit.,scenarios
crowdsecurity/postfix-helo-rejected,enabled,0.1,Detect HELO rejections,scenarios
crowdsecurity/postfix-relay-denied,enabled,0.1,Detect multiple open relay attempts,scenarios
crowdsecurity/postfix-spam,enabled,0.4,Detect spammers,scenarios
crowdsecurity/pulse-secure-sslvpn-cve-2019-11510,enabled,0.3,Detect cve-2019-11510 exploitation attemps,scenarios
crowdsecurity/spring4shell_cve-2022-22965,enabled,0.3,Detect cve-2022-22965 probing,scenarios
crowdsecurity/ssh-bf,enabled,0.3,Detect ssh bruteforce,scenarios
crowdsecurity/ssh-cve-2024-6387,enabled,0.2,Detect exploitation attempt of CVE-2024-6387,scenarios
crowdsecurity/ssh-slow-bf,enabled,0.4,Detect slow ssh bruteforce,scenarios
crowdsecurity/thinkphp-cve-2018-20062,enabled,0.6,Detect ThinkPHP CVE-2018-20062 exploitation attemps,scenarios
crowdsecurity/vmware-cve-2022-22954,enabled,0.3,Detect Vmware CVE-2022-22954 exploitation attempts,scenarios
crowdsecurity/vmware-vcenter-vmsa-2021-0027,enabled,0.2,Detect VMSA-2021-0027 exploitation attemps,scenarios
ltsich/http-w00tw00t,enabled,0.2,detect w00tw00t,scenarios
crowdsecurity/bf_base,enabled,0.1,,contexts
crowdsecurity/firewall_base,enabled,0.2,,contexts
crowdsecurity/http_base,enabled,0.2,,contexts
crowdsecurity/base-http-scenarios,enabled,1.0,http common : scanners detection,collections
crowdsecurity/http-cve,enabled,2.7,Detect CVE exploitation in http logs,collections
crowdsecurity/iptables,enabled,0.2,iptables support : logs and port-scans detection scenarios,collections
crowdsecurity/linux,enabled,0.2,core linux support : syslog+geoip+ssh,collections
crowdsecurity/nginx,enabled,0.2,nginx support : parser and generic http scenarios,collections
crowdsecurity/postfix,enabled,0.3,postfix support : parser and spammer detection,collections
crowdsecurity/sshd,enabled,0.5,sshd support : parser and brute-force detection,collections
crowdsecurity/whitelist-good-actors,enabled,0.1,Good actors whitelists,collections

Acquisition config

filenames:
  - /var/log/auth.log
  - /var/log/syslog
  - /var/log/kern.log
labels:
  type: syslog
---
filenames:
  - /var/log/syslog
labels:
  type: nginx

Config show

Global:

Configuration Folder : /etc/crowdsec
Data Folder : /var/lib/crowdsec/data
Hub Folder : /etc/crowdsec/hub
Simulation File : /etc/crowdsec/simulation.yaml
Log Folder : /var/log
Log level : info
Log Media : stdout

Crowdsec:
- Acquisition File : /etc/crowdsec/acquis.yaml
- Parsers routines : 1
- Acquisition Folder : /etc/crowdsec/acquis.d

cscli:
- Output : human
- Hub Branch :

API Client:
- URL : http://0.0.0.0:8080/
- Login : my-host
- Credentials File : /etc/crowdsec/local_api_credentials.yaml

Local API Server:
- Listen URL : 0.0.0.0:8080
- Listen Socket :
- Profile File : /etc/crowdsec/profiles.yaml
- Trusted IPs:
  - 127.0.0.1
  - ::1
- Database:
  - Type : sqlite
  - Path : /var/lib/crowdsec/data/crowdsec.db
  - Flush age : 7d
  - Flush size : 5000

Prometheus metrics

```console
$ cscli metrics
# paste output here
```

Related custom configs versions (if applicable) : notification plugins, custom scenarios, parsers etc.

github-actions[bot] commented 3 weeks ago

@samthesamman: Thanks for opening an issue, it is currently awaiting triage.

In the meantime, you can:

- Check Crowdsec Documentation to see if your issue can be self resolved.
- You can also join our Discord.
- Check Releases to make sure your agent is on the latest version.


LaurenceJJones commented 3 weeks ago

If nginx is going to the same syslog file, does it print the program as nginx just before the pid identifier?

samthesamman commented 3 weeks ago

Thank you! syslog entries did not have "nginx" tagged. However, I realized this is because I am running nginx inside of Docker and Docker is using the journald log-driver where it is tagging log entries with only the container ID. Thanks to your comment, I've updated the log driver to add the tag "nginx" and now Crowdsec is working with just a single syslog datasource inside of the acquis.yaml file.

I thought the only way to set the 'program' key was via the label "type" in acquis.yaml, but after re-reading how the s00 parser works it looks like it is able to extract this from the syslog tag, so thank you!
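As a rough illustration of why the tag matters, here is a sketch of pulling the program name out of a traditional syslog line. The regular expression is a simplified assumption, not the grok pattern the crowdsecurity/syslog-logs parser actually uses, and `programOf`, the sample lines, and the container ID are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// syslogLine is a simplified pattern for lines shaped like
// "Mon DD HH:MM:SS host program[pid]: message". Illustration only;
// it is not the pattern used by crowdsecurity/syslog-logs.
var syslogLine = regexp.MustCompile(
	`^\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} (\S+) ([^\s\[:]+)(?:\[(\d+)\])?: (.*)$`)

// programOf returns the program tag and message of a syslog line,
// or ok=false when the line does not match the simplified pattern.
func programOf(line string) (program, message string, ok bool) {
	m := syslogLine.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	return m[2], m[4], true
}

func main() {
	lines := []string{
		// Tagged by the log driver: a parser can filter on program "nginx".
		`Sep 12 09:39:08 my-host nginx[1234]: 192.0.2.10 - - "GET / HTTP/1.1" 200`,
		// Tagged with a container ID only (made-up value): nothing says "nginx".
		`Sep 12 09:39:08 my-host 3f2a9c1b7d4e[1234]: 192.0.2.10 - - "GET / HTTP/1.1" 200`,
	}
	for _, l := range lines {
		program, _, ok := programOf(l)
		fmt.Println(program, ok)
	}
}
```

With the first line the program comes out as `nginx`, so a single datasource of type syslog is enough. With the second it comes out as the container ID, which is why the nginx parser had nothing to match on before the tag was added.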

It might be good to document that each datasource needs its own log file (a log file can't be shared by multiple datasources).

LaurenceJJones commented 3 weeks ago

Exactly, this will be a little more performant too: if two data sources read the same file, the file gets read twice, which is unnecessary when the program is tagged correctly.

I have reviewed the code, and as long as the data sources are separate (which they are, because of the --- between the YAML documents), they should spin up separate contexts that shouldn't overlap. I will put this on the backlog to investigate further once we have a chance, maybe after the next version, since you have worked around the issue for now by tagging it 👍🏻
