cilium / cilium

eBPF-based Networking, Security, and Observability
https://cilium.io
Apache License 2.0

Cilium Envoy not producing HTTP access logs #31357

Open jcrowthe opened 8 months ago

jcrowthe commented 8 months ago

Is there an existing issue for this?

What happened?

No Envoy access logs are produced. Instead, in place of each expected log line, the following message appears:

[2024-03-12 16:43:03.658][14][info][filter] [cilium/conntrack.cc:229] cilium.bpf_metadata: IPv4 conntrack map global lookup failed: Success

This line in and of itself has been dismissed as noise in other GitHub issues and Slack conversations. The issue I am highlighting here, however, is that there are no Envoy access logs at all: this line appears in place of each expected access log line. This is problematic for bringing Cilium Ingress or Cilium's Gateway API support to production.

Cilium Version

v1.15.1

Kernel Version

Tested on:

Kubernetes Version

v1.28.2 (kubeadm)
v1.29.0 (EKS)
v1.29.1 (Talos/Sidero Omni)

Regression

No response

Sysdump

cilium-sysdump-20240312-140158.zip

Relevant log output

[2024-03-12 15:37:16.793][15][info][filter] [cilium/conntrack.cc:229] cilium.bpf_metadata: IPv4 conntrack map global lookup failed: Success
[2024-03-12 15:37:37.674][15][info][filter] [cilium/conntrack.cc:229] cilium.bpf_metadata: IPv4 conntrack map global lookup failed: Success
[2024-03-12 15:37:58.848][15][info][filter] [cilium/conntrack.cc:229] cilium.bpf_metadata: IPv4 conntrack map global lookup failed: Success
[2024-03-12 15:38:20.099][14][info][filter] [cilium/conntrack.cc:229] cilium.bpf_metadata: IPv4 conntrack map global lookup failed: Success
[2024-03-12 15:38:26.064][14][info][filter] [cilium/conntrack.cc:229] cilium.bpf_metadata: IPv4 conntrack map global lookup failed: Success
[2024-03-12 15:38:42.126][14][info][filter] [cilium/conntrack.cc:229] cilium.bpf_metadata: IPv4 conntrack map global lookup failed: Success
[2024-03-12 15:39:02.399][15][info][filter] [cilium/conntrack.cc:229] cilium.bpf_metadata: IPv4 conntrack map global lookup failed: Success
[2024-03-12 15:39:05.302][15][info][filter] [cilium/conntrack.cc:229] cilium.bpf_metadata: IPv4 conntrack map global lookup failed: Success
[2024-03-12 15:39:23.660][15][info][filter] [cilium/conntrack.cc:229] cilium.bpf_metadata: IPv4 conntrack map global lookup failed: Success

Anything else?

EKS values.yaml file:

nodePort:
  enabled: true
gatewayAPI:
  enabled: true
envoy:
  enabled: true
cni:
  chainingMode: aws-cni
  exclusive: false
enableIPv4Masquerade: false
routingMode: native
endpointRoutes:
  enabled: true
ingressController:
  enabled: true
  loadbalancerMode: shared
  default: true


mhofstetter commented 8 months ago

Duplicate of https://github.com/cilium/cilium/issues/30667?

jcrowthe commented 8 months ago

@mhofstetter I believe this issue is the root cause of issue #30667. Envoy access logs are already configured to be printed to stdout. There are also Helm configuration options for both envoy.log.format and envoy.log.path, indicating that access logs were intended to be available. I believe there is a bug in this flow that causes an error to be printed instead of the intended access log.

As an indication that this is a bug, there appears to be a 1:1 correlation between when an Envoy access log should be printed and when cilium.bpf_metadata: IPv4 conntrack map global lookup failed: Success is printed. This suggests to me that if this error message were addressed (not suppressed, but addressed), the Envoy access logs would likely appear.

sayboras commented 8 months ago

Envoy access logs are already configured to be printed out to stdout

I don't think access logging is configured right now. The two params envoy.log.format and envoy.log.path apply to Envoy's own application log only, not to HTTP access logs.
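
For reference, a sketch of where those two settings live in the Helm values (the option names come from the chart; the format string shown here is only illustrative):

envoy:
  log:
    # Pattern for Envoy's own application log lines, not HTTP access logs
    format: "[%Y-%m-%d %T.%e][%t][%l][%n] [%g:%#] %v"
    # When empty, Envoy's own log stays on stdout; set a file path to redirect it
    path: ""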

youngnick commented 7 months ago

Yes, it seems like our documentation is not very clear:

These apparent error messages are really debug messages that were logged at info level by mistake; they have since been pushed back to debug in main.

So, I have to say here that I'm sorry, we obviously have done a poor job making all of this clear.

That said, @jcrowthe, have you used Hubble to look for HTTP traffic passing through Envoy?

If not, are you looking for any specific info from the Envoy access logs, or just the usual CLF-style details?

It feels like the next steps here are probably to update the docs to make the "use Hubble" direction much more clear, but I'm also curious as to whether there's something we're missing there.
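
For anyone trying this, a minimal check with the Hubble CLI (assuming Hubble is enabled and the CLI can reach the agent or relay) would look something like:

hubble observe --protocol http --last 20

Adding -o json prints the full flow objects, including the L7 HTTP fields.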

jcrowthe commented 7 months ago

Hey Nick, thanks for the response. For context, I was advised to open a support ticket after filing this GitHub issue, which you may find here.

I have two situations in mind. The first is for enterprises, which the support request above also covers. In this use case, we need Envoy access logs written to stdout so that they can be ingested by a logging mechanism and shipped off to something like Splunk. Is there a simple way to do this with Hubble?

The second use case is perhaps more for the benefit of the community. There are cases where someone installs Cilium and wants to use Cilium Ingress/Gateway API as an easy way to set up ingress without running another ingress controller. In this model, though, Hubble should be completely optional. Since viewing Envoy access logs is critical to debugging ingress flow errors, I would expect there to be a way to view these logs without having to set up Hubble as well.

In both of these situations, having an option behind a feature flag to enable access logging to stdout would be at the very least helpful, and at most critical to adoption. Let me know your thoughts!

youngnick commented 6 months ago

Sorry about the delay here @jcrowthe, I was away on leave.

It's totally possible to add support for shipping Envoy's access logs, but I think anyone who does will not be happy with the correlation effort needed to see which client accessed which Pod: the access logs won't dereference which Pod has which IP, so you would need a historical record of which Pod had which IP at what time. Maybe Splunk or other access-log sinks can do that for you, but it seems like a lot of work - which is why we recommend using Hubble instead (Hubble has all of that information at the time the access is logged).

That said, for you and any other folks watching this, we'd appreciate knowing if you would prefer the less-useful raw log switch anyway.

tommasopozzetti commented 3 months ago

@youngnick I just found this issue while looking for this same feature. I think having the ability (maybe not on by default) to write access logs to Envoy's stdout for normal log ingestion would be fantastic. While real-time debugging via Hubble is great and faster thanks to the enriched info, having a way to also collect, ingest, and ship those logs for longer retention is fundamental in many environments (which may also have ways to collect and store historical data about Pod IPs and cross-correlate when needed). This feature would also ease transitions to the Cilium ingress controller from something like nginx, where users are used to having access logs available, even if only with Pod IPs.

mrkiani98 commented 3 months ago

Hello everyone,

It seems that we can enable Hubble flow logging in the cilium-agent itself, which can show the source, destination, verdict, and time of HTTP requests. It can be enabled with the following Helm values (there is no need to enable hubble-ui or hubble-relay), and these settings persist the flow logs to a file that you can then ship elsewhere for retention (already tested and working):

(It is not a full HTTP access log, but it is better than nothing.)

hubble:
  # Enable Hubble's persistent flow log in the cilium-agent itself.
  # These values do not enable hubble-ui or hubble-relay (those are for real-time debugging).
  enabled: true
  export:
    fileMaxSizeMb: 10
    fileMaxBackups: 5
    static:
      enabled: true
      filePath: /var/run/cilium/hubble/events.log
      fieldMask:
        - time
        - source
        - destination
        - verdict
      allowList:
        - '{"verdict":["DROPPED","ERROR"]}'
      denyList:
        - '{"source_pod":["kube-system/"]}'
        - '{"destination_pod":["kube-system/"]}'
chancez commented 3 months ago

Try removing the fieldMask and allowList options. Right now you're only selecting dropped/error flows and excluding all of the L7 fields from the flows. It's worth noting that the Hubble L7 HTTP flows may not contain everything you're looking for, but you can report back if there's information missing that you feel would be valuable.
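
For reference, the export block above with those two filters dropped would look something like this (same chart options as before):

hubble:
  enabled: true
  export:
    fileMaxSizeMb: 10
    fileMaxBackups: 5
    static:
      enabled: true
      filePath: /var/run/cilium/hubble/events.log
      # No fieldMask: keep full flow objects, including the L7 HTTP fields
      # No allowList: export flows with all verdicts, not only DROPPED/ERROR
      denyList:
        - '{"source_pod":["kube-system/"]}'
        - '{"destination_pod":["kube-system/"]}'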

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.