crowdsecurity / crowdsec

CrowdSec - the open-source and participative security solution offering crowdsourced protection against malicious IPs and access to the most advanced real-world CTI.
https://crowdsec.net
MIT License
9.03k stars 467 forks source link

[cscli] Routine order needs to be reversed #2725

Open LaurenceJJones opened 10 months ago

LaurenceJJones commented 10 months ago

What would you like to be added?

Cscli explain spins up a crowdsec process with the -no-api flag, however, this can cause explain to run slow if there is acutally no LAPI running. This issue is further exasperated with the http-retry since crowdsec will be blocking until all 5 timeouts are reached.

Cscli explain should do a preflight check to see if LAPI is configured and is reachable if so the -no-api flag should be appended. If not the -no-api flag should be omitted.

/kind enhancement

Why is this needed?

Cscli has the ability to know the LAPI and if it is reachable we can use this to allow explain to run a crowdsec instance with a LAPI temporarily.

github-actions[bot] commented 10 months ago

@LaurenceJJones: Thanks for opening an issue, it is currently awaiting triage.

In the meantime, you can:

  1. Check Crowdsec Documentation to see if your issue can be self resolved.
  2. You can also join our Discord.
  3. Check Releases to make sure your agent is on the latest version.
Details I am a bot created to help the [crowdsecurity](https://github.com/crowdsecurity) developers manage community feedback and contributions. You can check out my [manifest file](https://github.com/crowdsecurity/crowdsec/blob/master/.github/governance.yml) to understand my behavior and what I can do. If you want to use this for your project, you can check out the [BirthdayResearch/oss-governance-bot](https://github.com/BirthdayResearch/oss-governance-bot) repository.
github-actions[bot] commented 10 months ago

@LaurenceJJones: There are no 'kind' label on this issue. You need a 'kind' label to start the triage process.

Details I am a bot created to help the [crowdsecurity](https://github.com/crowdsecurity) developers manage community feedback and contributions. You can check out my [manifest file](https://github.com/crowdsecurity/crowdsec/blob/master/.github/governance.yml) to understand my behavior and what I can do. If you want to use this for your project, you can check out the [BirthdayResearch/oss-governance-bot](https://github.com/BirthdayResearch/oss-governance-bot) repository.
LaurenceJJones commented 10 months ago

Attaching to stderr pipe you can see what is happening under the hood when the LAPI is not available

    output, err := crowdsecCmd.StderrPipe()
    reader := bufio.NewReader(output)
    go func() {
        for {
            line, err := reader.ReadString('\n')
            if err != nil && errors.Is(err, io.EOF) {
                break
            }
            fmt.Print(line)
        }
    }()
    crowdsecCmd.Run()
time="2024-01-11 12:50:24" level=info msg="Adding file /tmp/cscli_explain3555705569/cscli_test_tmp.log to filelist" type="file:///tmp/cscli_explain3555705569/cscli_test_tmp.log"
time="2024-01-11 12:50:24" level=info msg="Starting processing data"
time="2024-01-11 12:50:24" level=info msg="reading /tmp/cscli_explain3555705569/cscli_test_tmp.log at once" type="file:///tmp/cscli_explain3555705569/cscli_test_tmp.log"
time="2024-01-11 12:50:24" level=warning msg="Acquisition is finished, shutting down"
time="2024-01-11 12:50:24" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 4 retries left"
time="2024-01-11 12:50:24" level=info msg="retrying in 22 seconds (attempt 2 of 5)"
time="2024-01-11 12:50:25" level=info msg="Killing parser routines"
time="2024-01-11 12:50:25" level=info msg="Killing parser routines"
time="2024-01-11 12:50:26" level=info msg="Bucket routine exiting"
time="2024-01-11 12:50:26" level=info msg="Bucket routine exiting"
time="2024-01-11 12:50:46" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 3 retries left"
time="2024-01-11 12:50:46" level=info msg="retrying in 41 seconds (attempt 3 of 5)"
time="2024-01-11 12:51:27" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 2 retries left"
time="2024-01-11 12:51:27" level=info msg="retrying in 51 seconds (attempt 4 of 5)"
time="2024-01-11 12:52:18" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 1 retries left"
time="2024-01-11 12:52:18" level=info msg="retrying in 70 seconds (attempt 5 of 5)"
time="2024-01-11 12:53:28" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 1 retries left"
time="2024-01-11 12:53:28" level=info msg="retrying in 0 seconds (attempt 2 of 2)"
time="2024-01-11 12:53:28" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 4 retries left"
time="2024-01-11 12:53:28" level=info msg="retrying in 10 seconds (attempt 2 of 5)"
time="2024-01-11 12:53:38" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 3 retries left"
time="2024-01-11 12:53:38" level=info msg="retrying in 29 seconds (attempt 3 of 5)"
time="2024-01-11 12:54:07" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 2 retries left"
time="2024-01-11 12:54:07" level=info msg="retrying in 50 seconds (attempt 4 of 5)"
time="2024-01-11 12:54:57" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 1 retries left"
time="2024-01-11 12:54:57" level=info msg="retrying in 63 seconds (attempt 5 of 5)"
time="2024-01-11 12:56:00" level=fatal msg="starting outputs error : authenticate watcher (6df9e06234fa428d9ba8e78f9b557c4axNHfFR9BmLcUlSfq): Post \"http://127.0.0.1:8080/v1/watchers/login\": performing jwt auth:dial tcp 127.0.0.1:8080: connect: connection refused"
time="2024-01-11 12:56:00" level=fatal msg="starting outputs error : authenticate watcher (6df9e06234fa428d9ba8e78f9b557c4axNHfFR9BmLcUlSfq): Post \"http://127.0.0.1:8080/v1/watchers/login\": performing jwt auth:dial tcp 127.0.0.1:8080: connect: connection refused"
FATA[2024-01-11T12:56:00Z] unable to load parser dump result: open /tmp/cscli_explain3555705569/parser-dump.yaml: no such file or directory

The fatal causes no dumps to be written

LaurenceJJones commented 10 months ago

Interesting if the line is whitelisted then it does fail but aborts earlier with only a warning instead of a fatal

time="2024-01-11 13:00:51" level=info msg="Adding file /tmp/cscli_explain759007184/cscli_test_tmp.log to filelist" type="file:///tmp/cscli_explain759007184/cscli_test_tmp.log"
time="2024-01-11 13:00:51" level=info msg="Starting processing data"
time="2024-01-11 13:00:51" level=info msg="reading /tmp/cscli_explain759007184/cscli_test_tmp.log at once" type="file:///tmp/cscli_explain759007184/cscli_test_tmp.log"
time="2024-01-11 13:00:51" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 4 retries left"
time="2024-01-11 13:00:51" level=info msg="retrying in 23 seconds (attempt 2 of 5)"
time="2024-01-11 13:00:51" level=warning msg="Acquisition is finished, shutting down"
time="2024-01-11 13:00:52" level=info msg="Killing parser routines"
time="2024-01-11 13:00:52" level=info msg="Killing parser routines"
time="2024-01-11 13:00:53" level=info msg="Bucket routine exiting"
time="2024-01-11 13:00:53" level=info msg="Bucket routine exiting"
time="2024-01-11 13:00:57" level=warning msg="Outputs didn't finish in time, some events may have not been flushed"
line: 1.2.3.4 - - [22/Sep/2023:12:26:01 +0200] "\x03\x00\x00/*\xE0\x00\x00\x00\x00\x00Cookie: mstshash=Administr" 400 166 "-" "-"
    ├ s00-raw
    |   ├ 🟢 crowdsecurity/non-syslog (+5 ~8)
    ├ s01-parse
    |   ├ 🟢 crowdsecurity/nginx-logs (+19 ~2)
    ├ s02-enrich
    |   ├ 🟢 crowdsecurity/dateparse-enrich (+2 ~4 [whitelisted])
    |   ├ 🟢 crowdsecurity/geoip-enrich (+10)
    |   ├ 🟢 crowdsecurity/http-logs (+7)
    |   ├ 🟢 crowdsecurity/jellyfin-whitelist (unchanged)
    |   ├ 🟢 crowdsecurity/nextcloud-whitelist (unchanged)
    |   ├ 🟢 crowdsecurity/whitelists (unchanged)
    |   └ 🟢 crowdsecurity/zz (~2 -19)
    └-------- parser success, ignored by whitelist (private ipv4/ipv6 ip/ranges) 🟢
LaurenceJJones commented 10 months ago

The difference between a whitelisted event and non whitelisted is this for loop never returns

https://github.com/crowdsecurity/crowdsec/blob/260f5a79924f7d61fca96216bdc0cc28a2f577a0/pkg/leakybucket/bucket.go#L342-L349

time="2024-01-11 13:34:27" level=debug msg="acquisition is finished, wait for parser/bucket/ouputs."
time="2024-01-11 13:34:27" level=info msg="Killing parser routines"
time="2024-01-11 13:34:27" level=info msg="Killing parser routines"
time="2024-01-11 13:34:28" level=debug msg="parsers is done"
time="2024-01-11 13:34:29" level=info msg="Bucket routine exiting"
time="2024-01-11 13:34:29" level=info msg="Bucket routine exiting"
time="2024-01-11 13:34:29" level=debug msg="Bucket externally killed, return" bucket_id=winter-forest cfg=cool-water name=adam-ah/http-probing partition=ea964f61d6a686dbbd59206f7ac80190ff6e2873
time="2024-01-11 13:34:29" level=debug msg="Bucket externally killed, return" bucket_id=icy-darkness cfg=sparkling-sound name=crowdsecurity/http-probing partition=7dce20630da2ccc125e674bd69b8bb9af8077d3c

A killed bucket will never have an overflow because it was the first event?

LaurenceJJones commented 10 months ago

Okay issue has been fully debugged:

The outputs routine is never started if the LAPI is unavailable so the channel is blocking. We are internally discussing the pros and cons of NOT forcing a first authentication on startup due to the fact that if the LP has issues nothing will be processed from acquisition until it has been authed.