aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[EKS/Fargate] [Logging]: EKS Fargate logging is missing logs #1450

Open andreiseceavsp opened 3 years ago

andreiseceavsp commented 3 years ago

Tell us about your request
I configured EKS Fargate logging to ship pod logs to CloudWatch (using the cloudwatch_logs output) by following the tutorials below. Although logs are being delivered, some logs are missing.
https://docs.aws.amazon.com/eks/latest/userguide/fargate-logging.html
https://aws.amazon.com/blogs/containers/fluent-bit-for-amazon-eks-on-aws-fargate-is-here/
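
(For reference, the setup in those tutorials centers on an aws-logging ConfigMap in the aws-observability namespace. The snippet below is a minimal sketch of such a cloudwatch_logs output; the region and log group name are placeholders, not the actual values used here.)

kind: ConfigMap
apiVersion: v1
metadata:
  name: aws-logging
  namespace: aws-observability
data:
  output.conf: |
    [OUTPUT]
        Name cloudwatch_logs
        Match kube.*
        region region-code
        log_group_name fluent-bit-cloudwatch
        log_stream_prefix from-fluent-bit-
        auto_create_group true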

Which service(s) is this request for? Fargate EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
I expect EKS on Fargate with Fluent Bit to log consistently to CloudWatch.

Are you currently working around this issue? No

Maxwell2022 commented 2 years ago

@andreiseceavsp Did you manage to solve or work around this problem?

andreiseceavsp commented 2 years ago

> @andreiseceavsp Did you manage to solve or work around this problem?

I managed to work around it by using the cloudwatch plugin instead of cloudwatch_logs, as per this comment.
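
(For illustration: assuming the ConfigMap from the tutorials, the workaround amounts to swapping the output plugin in output.conf from the C-based cloudwatch_logs to the Go-based cloudwatch plugin, leaving the rest of the stanza unchanged.)

  output.conf: |
    [OUTPUT]
        # Go-based plugin (aws/amazon-cloudwatch-logs-for-fluent-bit)
        Name cloudwatch
        Match kube.*
        region region-code
        log_group_name fluent-bit-cloudwatch
        log_stream_prefix from-fluent-bit-
        auto_create_group true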

radoslav-stefanov commented 2 years ago

I am having the same problem on EKS 1.21. I tried the workaround without luck.

booleanbetrayal commented 2 years ago

We are seeing the issue on EKS 1.23, and it happens intermittently with one short-lived Pod in particular. We have tried both the cloudwatch and cloudwatch_logs plugins and are seeing the log groups go missing entirely. This has become a principal concern for us regarding EKS Fargate reliability.

andreiseceavsp commented 2 years ago

Now I’m worried because we need to upgrade from 1.20 where it was working fine.

andreiseceavsp commented 2 years ago

Looks like there’s at least an option to see the Fluent Bit process logs now. Maybe it will help with troubleshooting this.

kind: ConfigMap
apiVersion: v1
metadata:
  name: aws-logging
  namespace: aws-observability
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  flb_log_cw: "true"  # ships the fluent-bit process logs to CloudWatch

  output.conf: |
    [OUTPUT]
        Name cloudwatch
        Match kube.*
        region region-code
        log_group_name fluent-bit-cloudwatch
        log_stream_prefix from-fluent-bit-
        auto_create_group true

booleanbetrayal commented 2 years ago

We believe we may have narrowed this down to Pods with shareProcessNamespace: true. We had been using this to deal with sidecar shutdown in completed Jobs, but it looks like we will have to migrate to a file-watch pattern instead. Interestingly enough, several logging frameworks (like Datadog) rely on shareProcessNamespace: true, so this is potentially a wide-impact issue if it's reproducible in this fashion.
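
(For context, a hypothetical sketch of the kind of Job spec we mean; the names, images, and commands are illustrative, not our actual manifest.)

apiVersion: batch/v1
kind: Job
metadata:
  name: example-job  # hypothetical name
spec:
  template:
    spec:
      # With a shared process namespace, the main container can see and signal
      # the sidecar's processes so the Job can complete once its work is done.
      shareProcessNamespace: true
      restartPolicy: Never
      containers:
        - name: main
          image: busybox
          command: ["sh", "-c", "echo work done"]
        - name: sidecar
          image: busybox
          command: ["sh", "-c", "sleep 3600"]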

booleanbetrayal commented 2 years ago

FWIW, for the Pods that have logging failures, we do not see any Fluent Bit logging at all after enabling the flb_log_cw parameter that @andreiseceavsp pointed out. Pods that initialize logging correctly all show up in the fluent-bit log as expected.