GCP Logging - logs are segmentated randomly

zubmic commented 1 year ago

Describe the bug

Some Falco logs are split across multiple lines when pushed to GCP Cloud Logging. The same logs are intact in the container.

How to reproduce it

Deploy Falco via GCP Market into a GKE cluster.
Create a Cloud Logging sink to gather logs from Falco.
Open Logs Explorer and check the logs.
After some time, broken logs will arrive.

Expected behaviour

Logs appear in Logs Explorer exactly as they do in the container itself.

Screenshots

[Screenshot: github-issue]

The split log is marked with red rectangles. Same log is a single line in Falco container.

Environment

Falco version:
    Fri Sep 15 07:52:33 2023: Falco version: 0.34.1 (x86_64)
    Fri Sep 15 07:52:33 2023: Falco initialized with configuration file: /etc/falco/falco.yaml
    Falco version: 0.34.1
    Libs version: 0.10.4
    Plugin API: 2.0.0
    Engine: 16
    Driver:
        API version: 3.0.0
        Schema version: 2.0.0
        Default driver: 4.0.0+driver
System info: { "machine": "x86_64", "nodename": "gke-asia-east1-001-workload-16-b191d2cc-bym8", "release": "5.15.0-1036-gke", "sysname": "Linux", "version": "41-Ubuntu SMP Wed Jun 7 04:23:11 UTC 2023" }
Cloud provider or hardware configuration: Google Cloud Platform
OS:
    PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
    NAME="Debian GNU/Linux"
    VERSION_ID="11"
    VERSION="11 (bullseye)"
    VERSION_CODENAME=bullseye
    ID=debian
    HOME_URL="https://www.debian.org/"
    SUPPORT_URL="https://www.debian.org/support"
    BUG_REPORT_URL="https://bugs.debian.org/"
Kernel: Linux gke-asia-east1-001-workload-16-b191d2cc-bym8 5.15.0-1036-gke 41-Ubuntu SMP Wed Jun 7 04:23:11 UTC 2023 x86_64 GNU/Linux
Installation method: Kubernetes (GKE), Falco deployed via GCP Market Applications.

Additional context

I tried changing the Falco config to adjust the throttling and buffering settings, but it didn't solve the issue. If this can be solved by changing the config, I would greatly appreciate it if someone could point me to the fix.

Andreagit97 commented 1 year ago

ei that's interesting, thank you for reporting! Do you know if all Falco versions are affected by this behavior? I see that you are using Falco 0.34.1; would you mind also trying Falco 0.35.1?

I'm not 100% sure this is a problem in Falco but we could investigate

zubmic commented 1 year ago

Hi @Andreagit97! 👋 Thanks for your reply. No, I haven't tried other versions. It won't hurt to try the newer release and see whether the issue occurs there as well.

zubmic commented 12 months ago

Sorry for taking so long, I've been away for two weeks.

I deployed Falco via Helm chart, just as suggested in the official documentation. The pods are running, the logs are gathered and collected in GCP Cloud Logging, but even on version 0.36 the issue is still there.

@Andreagit97 , can you investigate that please? 🙏

Andreagit97 commented 12 months ago

ok thank you for the feedback, we will try to take a look!

zubmic commented 12 months ago

Awesome! Don't hesitate to reach out via this thread. I'll be checking it daily. 🙂

poiana commented 9 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

zubmic commented 8 months ago

Hi @Andreagit97 👋 Any luck with this one?

Andreagit97 commented 8 months ago

ei @zubmic, I've tried to reproduce the issue without success...

I've deployed Falco with the latest Helm chart in the following way (the Falco version shipped in the marketplace seems outdated, we need to update it!):

helm install falco falcosecurity/falco \
    --set driver.kind=modern-bpf --namespace falco --create-namespace

and then I observed the logs for almost 2 hours using the following query:

resource.type="k8s_container"
resource.labels.project_id=...
resource.labels.location=...
resource.labels.cluster_name=...
resource.labels.namespace_name="falco"

Maybe I'm missing something... BTW I'm pretty convinced that this is not a Falco issue if the logs are not fragmented in the Falco container... In the end, the Logs Explorer should only consume the output of the container, so Falco shouldn't be involved in how logs are rendered there 🤔

@Issif have you ever seen something similar in GKE environments?

Issif commented 8 months ago

Seems more related to the way GKE scrapes or parses the logs.

I'm curious, if you enable the json format for the Falco output, do you get the same result?

zubmic commented 8 months ago

I checked again and changed the output format to JSON. The outcome is still the same: the logs get fragmented.

poiana commented 7 months ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

Andreagit97 commented 7 months ago

/remove-lifecycle rotten

poiana commented 4 months ago

/lifecycle stale

poiana commented 3 months ago

/lifecycle rotten

Andreagit97 commented 3 months ago

/remove-lifecycle rotten

To be honest, I'm not sure what we can do here on the Falco side... @Issif do you have other ideas?

Issif commented 3 months ago

This is a tricky question. There is certainly a maximum size, measured not in characters but in payload size: https://cloud.google.com/logging/quotas. I don't know which process is in charge of splitting the lines to fit within that maximum, but I hope it does so correctly and that the parts, once merged, are consistent with the initial payload.
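To illustrate the difference between counting characters and counting payload bytes, here is a minimal Python sketch. The helper names and the `limit_bytes` parameter are hypothetical; the actual limits are the ones listed on the quotas page linked above, and this does not reproduce whatever process does the splitting.

```python
def payload_size_bytes(line: str) -> int:
    """Size of the line once encoded as UTF-8, which is what a byte limit counts."""
    return len(line.encode("utf-8"))


def exceeds_limit(line: str, limit_bytes: int) -> bool:
    """True if the encoded line is larger than limit_bytes (a hypothetical limit)."""
    return payload_size_bytes(line) > limit_bytes
```

A line made of non-ASCII characters can therefore exceed a byte limit while its character count is still below the same number.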

I don't know how to solve that. If the log line were in JSON, we could:

try to unmarshal it into a struct
if that fails (because the JSON is invalid, as it was truncated), keep the string in a buffer
wait for the next unmarshal failure
concatenate the strings (the one in the buffer, then the current one)
repeat the process
drop old strings from the buffer periodically
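The steps above could be sketched roughly as follows. This is only an illustration in Python, not an existing Falco or Cloud Logging feature: the class name, the `max_age_seconds` retention window, and the injectable `clock` are all assumptions, and the sketch assumes fragments of one event arrive in order and are not interleaved with fragments of another event.

```python
import json
import time
from typing import Optional


class FragmentReassembler:
    """Rebuilds JSON log lines that arrive split into several entries."""

    def __init__(self, max_age_seconds: float = 30.0, clock=time.monotonic):
        self.max_age_seconds = max_age_seconds  # hypothetical retention window
        self._clock = clock
        self._buffer = ""        # fragments seen so far, in arrival order
        self._buffered_at = 0.0  # when the first buffered fragment arrived

    @staticmethod
    def _parse(text: str) -> Optional[dict]:
        """Return the decoded object, or None if text is not a JSON object."""
        try:
            value = json.loads(text)
        except json.JSONDecodeError:
            return None
        return value if isinstance(value, dict) else None

    def feed(self, line: str) -> Optional[dict]:
        """Accept one log entry; return a complete event once one is available."""
        now = self._clock()
        # Drop fragments that waited too long for their continuation.
        if self._buffer and now - self._buffered_at > self.max_age_seconds:
            self._buffer = ""

        event = self._parse(line)
        if event is not None:
            return event  # the entry was intact

        if not self._buffer:
            self._buffered_at = now
        # Concatenate the buffered text with the current fragment and retry.
        self._buffer += line
        event = self._parse(self._buffer)
        if event is not None:
            self._buffer = ""
        return event
```

Feeding the two halves of a split event one after the other returns `None` for the first half and the decoded event for the second. If entries can arrive out of order, a plain concatenation like this would not be enough.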

poiana commented 3 weeks ago

/lifecycle stale

falcosecurity / falco

GCP Logging - logs are segmentated randomly #2811