crowdsecurity / crowdsec

CrowdSec - the open-source and participative security solution offering crowdsourced protection against malicious IPs and access to the most advanced real-world CTI.
https://crowdsec.net
MIT License

cs_active_decisions metric only exported when there are decisions in the list #2563

Closed WoutResseler closed 11 months ago

WoutResseler commented 1 year ago

What happened?

When there is an active decision in the list, for example:

╭──────────┬──────────┬───────────────┬───────────────────────────┬────────┬─────────┬────┬────────┬──────────────────┬──────────╮
│    ID    │  Source  │  Scope:Value  │          Reason           │ Action │ Country │ AS │ Events │    expiration    │ Alert ID │
├──────────┼──────────┼───────────────┼───────────────────────────┼────────┼─────────┼────┼────────┼──────────────────┼──────────┤
│ 11972985 │ crowdsec │ Ip:10.4.0.162 │ crowdsecurity/ssh-slow-bf │ ban    │         │    │ 11     │ 3h59m42.6917358s │ 2084     │
╰──────────┴──────────┴───────────────┴───────────────────────────┴────────┴─────────┴────┴────────┴──────────────────┴──────────╯

When looking at the metrics endpoint, I see:

cs_active_decisions{action="ban",origin="crowdsec",reason="crowdsecurity/ssh-slow-bf"} 1

which is correct.

When there is no active decision

# cscli decisions list
No active decisions

When I then look for cs_active_decisions{action="ban",origin="crowdsec",reason="crowdsecurity/ssh-slow-bf"} on the metrics endpoint, the series is not there at all.

What did you expect to happen?

I would expect to see

cs_active_decisions{action="ban",origin="crowdsec",reason="crowdsecurity/ssh-slow-bf"} 0

on the metrics endpoint

How can we reproduce it (as minimally and precisely as possible)?

Have something in the decision list, look at the metrics endpoint, and find the corresponding active decision metric.

Clear the decision list for that type of decision and look again at the metrics endpoint
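
The check in these two steps can be scripted. This is only a minimal sketch: it assumes the metrics text has already been fetched from the Prometheus endpoint configured for CrowdSec, the helper name `series_value` is hypothetical, and the parsing handles only simple sample lines like the one shown above (no trailing timestamps).

```python
from typing import Optional


def series_value(exposition: str, series: str) -> Optional[float]:
    """Return the value of an exact series line, or None if it is absent.

    Hypothetical helper: handles only "<name>{<labels>} <value>" lines.
    """
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Split on the last space: everything before it is the series.
        name, _, value = line.rpartition(" ")
        if name == series:
            return float(value)
    return None


SERIES = 'cs_active_decisions{action="ban",origin="crowdsec",reason="crowdsecurity/ssh-slow-bf"}'

# Stand-ins for the endpoint output before and after clearing the list.
with_decision = SERIES + " 1\n"
without_decision = "# no cs_active_decisions samples\n"

print(series_value(with_decision, SERIES))     # value present
print(series_value(without_decision, SERIES))  # series missing, not 0
```

A `None` result after clearing the list corresponds to the behavior reported here: the series is missing rather than reported as 0.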

Anything else we need to know?

Not sure if this is by design, but it causes some weird graphs when consuming the metrics, because the data suddenly stops.

For example, in this Grafana graph:

[Screenshot: 2023-10-24T17:32:24,636063038+02:00]

Crowdsec version

```console
$ cscli version
2023/10/24 17:33:27 version: v1.5.4-rpm-pragmatic-amd64-e4dcdd25728b914823525f1efabf18d5c454902b
2023/10/24 17:33:27 Codename: alphaga
2023/10/24 17:33:27 BuildDate: 2023-09-20_12:17:47
2023/10/24 17:33:27 GoVersion: 1.20.5
2023/10/24 17:33:27 Platform: linux
2023/10/24 17:33:27 libre2: C++
2023/10/24 17:33:27 Constraint_parser: >= 1.0, <= 2.0
2023/10/24 17:33:27 Constraint_scenario: >= 1.0, < 3.0
2023/10/24 17:33:27 Constraint_api: v1
2023/10/24 17:33:27 Constraint_acquis: >= 1.0, < 2.0
```

OS version

```console
# On Linux:
$ cat /etc/os-release
NAME="AlmaLinux"
VERSION="8.8 (Sapphire Caracal)"
ID="almalinux"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="AlmaLinux 8.8 (Sapphire Caracal)"
ANSI_COLOR="0;34"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:almalinux:almalinux:8::baseos"
HOME_URL="https://almalinux.org/"
DOCUMENTATION_URL="https://wiki.almalinux.org/"
BUG_REPORT_URL="https://bugs.almalinux.org/"
ALMALINUX_MANTISBT_PROJECT="AlmaLinux-8"
ALMALINUX_MANTISBT_PROJECT_VERSION="8.8"
REDHAT_SUPPORT_PRODUCT="AlmaLinux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.8"
$ uname -a
Linux hostname 4.18.0-477.27.2.el8_8.x86_64 #1 SMP Fri Sep 29 08:21:01 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
```

Enabled collections and parsers

```console
$ cscli hub list -o raw
crowdsecurity/linux,enabled,0.2,core linux support : syslog+geoip+ssh,collections
crowdsecurity/sshd,enabled,0.2,sshd support : parser and brute-force detection,collections
crowdsecurity/dateparse-enrich,enabled,0.2,,parsers
crowdsecurity/geoip-enrich,enabled,0.2,"Populate event with geoloc info : as, country, coords, source range.",parsers
crowdsecurity/sshd-logs,enabled,2.0,Parse openSSH logs,parsers
crowdsecurity/syslog-logs,enabled,0.8,,parsers
crowdsecurity/whitelists,"enabled,tainted",?,Whitelist events from private ipv4 addresses,parsers
crowdsecurity/ssh-bf,enabled,0.1,Detect ssh bruteforce,scenarios
crowdsecurity/ssh-slow-bf,enabled,0.2,Detect slow ssh bruteforce,scenarios
```

Acquisition config

```console
# On Linux:
$ cat /etc/crowdsec/acquis.yaml /etc/crowdsec/acquis.d/*
#Generated acquisition file - wizard.sh (service: sshd) / files : /var/log/secure
filenames:
  - /var/log/secure
labels:
  type: syslog
---
#Generated acquisition file - wizard.sh (service: linux) / files : /var/log/messages
filenames:
  - /var/log/messages
labels:
  type: syslog
---
cat: '/etc/crowdsec/acquis.d/*': No such file or directory

# On Windows:
C:\> Get-Content C:\ProgramData\CrowdSec\config\acquis.yaml
# paste output here
```

Config show

```console
$ cscli config show
Global:
   - Configuration Folder : /etc/crowdsec
   - Data Folder : /var/lib/crowdsec/data
   - Hub Folder : /etc/crowdsec/hub
   - Simulation File : /etc/crowdsec/simulation.yaml
   - Log Folder : /var/log/
   - Log level : info
   - Log Media : file
Crowdsec:
   - Acquisition File : /etc/crowdsec/acquis.yaml
   - Parsers routines : 1
   - Acquisition Folder : /etc/crowdsec/acquis.d
cscli:
   - Output : human
   - Hub Branch :
   - Hub Folder : /etc/crowdsec/hub
API Client:
   - URL :
   - Login :
   - Credentials File : /etc/crowdsec/local_api_credentials.yaml
Local API Server:
   - Listen URL : 127.0.0.1:8080
   - Profile File : /etc/crowdsec/profiles.yaml
   - Trusted IPs:
      - 127.0.0.1
      - ::1
   - Database:
      - Type : sqlite
      - Path : /var/lib/crowdsec/data/crowdsec.db
      - Flush age : 7d
      - Flush size : 5000
```

Prometheus metrics

```console
$ cscli metrics
Acquisition Metrics:
╭────────────────────────┬────────────┬──────────────┬────────────────┬────────────────────────╮
│         Source         │ Lines read │ Lines parsed │ Lines unparsed │ Lines poured to bucket │
├────────────────────────┼────────────┼──────────────┼────────────────┼────────────────────────┤
│ file:/var/log/messages │ 171        │ -            │ 171            │ -                      │
│ file:/var/log/secure   │ 1.48k      │ 788          │ 695            │ 1.58k                  │
╰────────────────────────┴────────────┴──────────────┴────────────────┴────────────────────────╯

Bucket Metrics:
╭─────────────────────────────────────┬───────────────┬───────────┬──────────────┬────────┬─────────╮
│               Bucket                │ Current Count │ Overflows │ Instantiated │ Poured │ Expired │
├─────────────────────────────────────┼───────────────┼───────────┼──────────────┼────────┼─────────┤
│ crowdsecurity/ssh-bf                │ -             │ 127       │ 147          │ 788    │ 20      │
│ crowdsecurity/ssh-bf_user-enum      │ -             │ -         │ 5            │ 5      │ 5       │
│ crowdsecurity/ssh-slow-bf           │ -             │ 68        │ 81           │ 788    │ 13      │
│ crowdsecurity/ssh-slow-bf_user-enum │ -             │ -         │ 3            │ 3      │ 3       │
╰─────────────────────────────────────┴───────────────┴───────────┴──────────────┴────────┴─────────╯

Parser Metrics:
╭─────────────────────────────────┬───────┬────────┬──────────╮
│             Parsers             │ Hits  │ Parsed │ Unparsed │
├─────────────────────────────────┼───────┼────────┼──────────┤
│ child-crowdsecurity/sshd-logs   │ 8.57k │ 788    │ 7.78k    │
│ child-crowdsecurity/syslog-logs │ 1.65k │ 1.65k  │ -        │
│ crowdsecurity/dateparse-enrich  │ 788   │ 788    │ -        │
│ crowdsecurity/geoip-enrich      │ 788   │ 788    │ -        │
│ crowdsecurity/sshd-logs         │ 1.21k │ 788    │ 421      │
│ crowdsecurity/syslog-logs       │ 1.65k │ 1.65k  │ -        │
│ crowdsecurity/whitelists        │ 788   │ 788    │ -        │
╰─────────────────────────────────┴───────┴────────┴──────────╯
```

Related custom configs versions (if applicable) : notification plugins, custom scenarios, parsers etc.

github-actions[bot] commented 1 year ago

@WoutResseler: Thanks for opening an issue, it is currently awaiting triage.

In the meantime, you can:

  1. Check Crowdsec Documentation to see if your issue can be self resolved.
  2. You can also join our Discord.
  3. Check Releases to make sure your agent is on the latest version.

blotus commented 11 months ago

Hello,

This is half by design / half side effect :)

Internally, crowdsec has no "memory" of its metrics from one start to another, or even from one fetch to another. In the case of the decisions metrics, everything is recomputed dynamically when Prometheus fetches the metrics: it's just a view of what is currently in the database. This means that if you delete the decisions, crowdsec will have no knowledge that there was ever a decision for a specific scenario, and the time series disappears.
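
To illustrate the "view of the database" behavior, here is a simplified model in Python. It is not crowdsec's actual code (which is written in Go): the function name `render_active_decisions` and the tuple layout are assumptions made for this sketch, and only the metric name and labels come from the report above.

```python
from collections import Counter


def render_active_decisions(decisions):
    """Build exposition lines from the decisions currently stored.

    `decisions` is a list of (action, origin, reason) tuples standing in
    for database rows. Nothing is remembered between calls.
    """
    counts = Counter(decisions)
    lines = []
    for (action, origin, reason), n in sorted(counts.items()):
        lines.append(
            f'cs_active_decisions{{action="{action}",origin="{origin}",reason="{reason}"}} {n}'
        )
    return lines


db = [("ban", "crowdsec", "crowdsecurity/ssh-slow-bf")]
print(render_active_decisions(db))  # one line, value 1

db.clear()  # the decision is deleted or expires
print(render_active_decisions(db))  # empty list: no label set left to report
```

Because the label sets are derived from the rows themselves, an empty table yields no lines at all, so there is nothing to attach a 0 to.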

Even if we were to somehow track the now empty time series between fetches, I don't think we'd be able to do it across restarts, so you'd eventually end up in the same situation.
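
A rough sketch of what such tracking could look like, and why a restart undoes it. The class `DecisionGauge` is hypothetical and keeps the seen label sets in memory only, which is the assumption that makes the restart case fail; it does not reflect anything implemented in crowdsec.

```python
from collections import Counter


class DecisionGauge:
    """Hypothetical exporter that remembers label sets in memory only."""

    def __init__(self):
        self.seen = set()  # lost whenever the process restarts

    def collect(self, decisions):
        counts = Counter(decisions)
        self.seen.update(counts)  # record every label set observed so far
        # Report each remembered label set, with 0 when no decision remains.
        return {labels: counts.get(labels, 0) for labels in sorted(self.seen)}


labels = ("ban", "crowdsec", "crowdsecurity/ssh-slow-bf")

gauge = DecisionGauge()
print(gauge.collect([labels]))  # label set reported with value 1
print(gauge.collect([]))        # same label set, now reported as 0

gauge = DecisionGauge()         # simulated restart: memory is gone
print(gauge.collect([]))        # empty: the series has disappeared again
```

Surviving a restart would require persisting the seen label sets somewhere, which is the part described above as unlikely to be feasible.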