Open LaurenceJJones opened 10 months ago
@LaurenceJJones: Thanks for opening an issue, it is currently awaiting triage.
In the meantime, you can:
@LaurenceJJones: There are no 'kind' label on this issue. You need a 'kind' label to start the triage process.
/kind feature
/kind enhancement
/kind bug
/kind packaging
Attaching to stderr pipe you can see what is happening under the hood when the LAPI is not available
output, err := crowdsecCmd.StderrPipe()
reader := bufio.NewReader(output)
go func() {
for {
line, err := reader.ReadString('\n')
if err != nil && errors.Is(err, io.EOF) {
break
}
fmt.Print(line)
}
}()
crowdsecCmd.Run()
time="2024-01-11 12:50:24" level=info msg="Adding file /tmp/cscli_explain3555705569/cscli_test_tmp.log to filelist" type="file:///tmp/cscli_explain3555705569/cscli_test_tmp.log"
time="2024-01-11 12:50:24" level=info msg="Starting processing data"
time="2024-01-11 12:50:24" level=info msg="reading /tmp/cscli_explain3555705569/cscli_test_tmp.log at once" type="file:///tmp/cscli_explain3555705569/cscli_test_tmp.log"
time="2024-01-11 12:50:24" level=warning msg="Acquisition is finished, shutting down"
time="2024-01-11 12:50:24" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 4 retries left"
time="2024-01-11 12:50:24" level=info msg="retrying in 22 seconds (attempt 2 of 5)"
time="2024-01-11 12:50:25" level=info msg="Killing parser routines"
time="2024-01-11 12:50:25" level=info msg="Killing parser routines"
time="2024-01-11 12:50:26" level=info msg="Bucket routine exiting"
time="2024-01-11 12:50:26" level=info msg="Bucket routine exiting"
time="2024-01-11 12:50:46" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 3 retries left"
time="2024-01-11 12:50:46" level=info msg="retrying in 41 seconds (attempt 3 of 5)"
time="2024-01-11 12:51:27" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 2 retries left"
time="2024-01-11 12:51:27" level=info msg="retrying in 51 seconds (attempt 4 of 5)"
time="2024-01-11 12:52:18" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 1 retries left"
time="2024-01-11 12:52:18" level=info msg="retrying in 70 seconds (attempt 5 of 5)"
time="2024-01-11 12:53:28" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 1 retries left"
time="2024-01-11 12:53:28" level=info msg="retrying in 0 seconds (attempt 2 of 2)"
time="2024-01-11 12:53:28" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 4 retries left"
time="2024-01-11 12:53:28" level=info msg="retrying in 10 seconds (attempt 2 of 5)"
time="2024-01-11 12:53:38" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 3 retries left"
time="2024-01-11 12:53:38" level=info msg="retrying in 29 seconds (attempt 3 of 5)"
time="2024-01-11 12:54:07" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 2 retries left"
time="2024-01-11 12:54:07" level=info msg="retrying in 50 seconds (attempt 4 of 5)"
time="2024-01-11 12:54:57" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 1 retries left"
time="2024-01-11 12:54:57" level=info msg="retrying in 63 seconds (attempt 5 of 5)"
time="2024-01-11 12:56:00" level=fatal msg="starting outputs error : authenticate watcher (6df9e06234fa428d9ba8e78f9b557c4axNHfFR9BmLcUlSfq): Post \"http://127.0.0.1:8080/v1/watchers/login\": performing jwt auth:dial tcp 127.0.0.1:8080: connect: connection refused"
time="2024-01-11 12:56:00" level=fatal msg="starting outputs error : authenticate watcher (6df9e06234fa428d9ba8e78f9b557c4axNHfFR9BmLcUlSfq): Post \"http://127.0.0.1:8080/v1/watchers/login\": performing jwt auth:dial tcp 127.0.0.1:8080: connect: connection refused"
FATA[2024-01-11T12:56:00Z] unable to load parser dump result: open /tmp/cscli_explain3555705569/parser-dump.yaml: no such file or directory
The fatal causes no dumps to be written
Interesting if the line is whitelisted then it does fail but aborts earlier with only a warning instead of a fatal
time="2024-01-11 13:00:51" level=info msg="Adding file /tmp/cscli_explain759007184/cscli_test_tmp.log to filelist" type="file:///tmp/cscli_explain759007184/cscli_test_tmp.log"
time="2024-01-11 13:00:51" level=info msg="Starting processing data"
time="2024-01-11 13:00:51" level=info msg="reading /tmp/cscli_explain759007184/cscli_test_tmp.log at once" type="file:///tmp/cscli_explain759007184/cscli_test_tmp.log"
time="2024-01-11 13:00:51" level=error msg="error while performing request: dial tcp 127.0.0.1:8080: connect: connection refused; 4 retries left"
time="2024-01-11 13:00:51" level=info msg="retrying in 23 seconds (attempt 2 of 5)"
time="2024-01-11 13:00:51" level=warning msg="Acquisition is finished, shutting down"
time="2024-01-11 13:00:52" level=info msg="Killing parser routines"
time="2024-01-11 13:00:52" level=info msg="Killing parser routines"
time="2024-01-11 13:00:53" level=info msg="Bucket routine exiting"
time="2024-01-11 13:00:53" level=info msg="Bucket routine exiting"
time="2024-01-11 13:00:57" level=warning msg="Outputs didn't finish in time, some events may have not been flushed"
line: 1.2.3.4 - - [22/Sep/2023:12:26:01 +0200] "\x03\x00\x00/*\xE0\x00\x00\x00\x00\x00Cookie: mstshash=Administr" 400 166 "-" "-"
├ s00-raw
| ├ 🟢 crowdsecurity/non-syslog (+5 ~8)
├ s01-parse
| ├ 🟢 crowdsecurity/nginx-logs (+19 ~2)
├ s02-enrich
| ├ 🟢 crowdsecurity/dateparse-enrich (+2 ~4 [whitelisted])
| ├ 🟢 crowdsecurity/geoip-enrich (+10)
| ├ 🟢 crowdsecurity/http-logs (+7)
| ├ 🟢 crowdsecurity/jellyfin-whitelist (unchanged)
| ├ 🟢 crowdsecurity/nextcloud-whitelist (unchanged)
| ├ 🟢 crowdsecurity/whitelists (unchanged)
| └ 🟢 crowdsecurity/zz (~2 -19)
└-------- parser success, ignored by whitelist (private ipv4/ipv6 ip/ranges) 🟢
The difference between a whitelisted event and non whitelisted is this for loop never returns
time="2024-01-11 13:34:27" level=debug msg="acquisition is finished, wait for parser/bucket/ouputs."
time="2024-01-11 13:34:27" level=info msg="Killing parser routines"
time="2024-01-11 13:34:27" level=info msg="Killing parser routines"
time="2024-01-11 13:34:28" level=debug msg="parsers is done"
time="2024-01-11 13:34:29" level=info msg="Bucket routine exiting"
time="2024-01-11 13:34:29" level=info msg="Bucket routine exiting"
time="2024-01-11 13:34:29" level=debug msg="Bucket externally killed, return" bucket_id=winter-forest cfg=cool-water name=adam-ah/http-probing partition=ea964f61d6a686dbbd59206f7ac80190ff6e2873
time="2024-01-11 13:34:29" level=debug msg="Bucket externally killed, return" bucket_id=icy-darkness cfg=sparkling-sound name=crowdsecurity/http-probing partition=7dce20630da2ccc125e674bd69b8bb9af8077d3c
A killed bucket will never have an overflow because it was the first event?
Okay issue has been fully debugged:
The outputs routine is never started if the LAPI is unavailable so the channel is blocking. We are internally discussing the pros and cons of NOT forcing a first authentication on startup due to the fact that if the LP has issues nothing will be processed from acquisition until it has been authed.
What would you like to be added?
Cscli explain spins up a crowdsec process with the
-no-api
flag, however, this can cause explain to run slow if there is acutally no LAPI running. This issue is further exasperated with thehttp-retry
since crowdsec will be blocking until all 5 timeouts are reached.Cscli explain
should do a preflight check to see if LAPI is configured and is reachable if so the-no-api
flag should be appended. If not the-no-api
flag should be omitted./kind enhancement
Why is this needed?
Cscli has the ability to know the LAPI and if it is reachable we can use this to allow explain to run a crowdsec instance with a LAPI temporarily.