flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0
5.37k stars 571 forks source link

[BUG] Log Links should work with CloudWatch FluentD Out Of the Box #2635

Open EngHabu opened 2 years ago

EngHabu commented 2 years ago

Describe the bug

As it stands, users wishing to enable CloudWatch logs will need to go through the AWS Process to do so here and then manually modify the fluent-bit-config configmap to alter the generated log stream to match what Flyte expects.

kubectl edit cm -n amazon-cloudwatch fluent-bit-config 
[OUTPUT]
    Name                cloudwatch_logs
    Match               application.*
    region              ${AWS_REGION}
    log_group_name      /aws/containerinsights/${CLUSTER_NAME}/application
    log_stream_prefix   var.
    auto_create_group   true
    extra_user_agent    container-insights

And then use this configmap for flyte:

  task_logs.yaml: |
    plugins:
      logs:
        cloudwatch-template-uri: 'https://{vars.region}.console.aws.amazon.com/cloudwatch/home?region={vars.region}#logsV2:log-groups/log-group/$252Faws$252Fcontainerinsights$252F<log group name>$252Fapplication$3FlogStreamNameFilter$3Dvar.application.var.log.containers.{{ .podName }}_{{ .namespace }}_{{ .containerName }}'

We should change the default template defined here to be: https://console.aws.amazon.com/cloudwatch/home?region=%s#logsV2:log-groups/log-group/%s$3FlogStreamNameFilter=var.log.containers.{{ .podName }}_{{ .namespace }}_{{ .containerName }}

Expected behavior

  1. Follow AWS Guide to deploy FluentD: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-setup-logs-FluentBit.html
  2. Enable CloudWatch logs in Flyte:
      task_logs.yaml: |
        plugins:
          logs:
            cloudwatch-enabled: true
            cloudwatch-log-group: 'bv-ml-pipelines'
            cloudwatch-region: 'us-east-1'
            kubernetes-enabled: false
  3. SUCCESS ✔️

Additional context to reproduce

No response

Screenshots

No response

Are you sure this issue hasn't been raised already?

Have you read the Code of Conduct?

Azanul commented 1 year ago

I'd like to work on this.

Azanul commented 1 year ago

take

Azanul commented 1 year ago

self-assign

Azanul commented 1 year ago

@samhita-alla here

Azanul commented 1 year ago

self-assign

Smartmind12 commented 1 year ago

@samhita-alla Kindly review the PR and let me know if any changes are required...Thanks!

andrusha commented 1 year ago

This assumes that the cluster is going to be deployed with FluentBit by-default, while FluentD is valid and recommended option, to which the older configuration is pointing to. This change will not be applicable for the all users.

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-setup-logs-FluentBit.html

EngHabu commented 1 year ago

@andrusha do you mind elaborating on this? Are you commenting on the discussion we are having on the PR? https://github.com/flyteorg/flyteplugins/pull/293#issuecomment-1328066538

andrusha commented 1 year ago

I was answering to the premise of this ticket. The log link is not broken if you use FluentD out of the box, it’s only broken if you’re using FluentBit, which is a default for EKS cluster. FluentBit can be ran in FluentD compatibility mode as well. Overriding default like will inevitably break it for the other group of users.

I think this is not a bug and supporting multiple templates is better solution here. Supporting FluentBit is related to flyteorg/flyteplugins#293, as it requires node name.

On 26 Nov 2022, at 16:32, Haytham Abuelfutuh @.***> wrote:

@.***(https://github.com/andrusha) do you mind elaborating on this? Are you commenting on the discussion we are having on the PR? flyteorg/flyteplugins#293 (comment)

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

EngHabu commented 1 year ago

Aha, I understand, I agree changing the default can break other people, do you know for a fact that FlutentBit in compatibility mode works out of the box? I vaguely recall hostname was still a requirement though, no?

Perhaps a better approach is to:

  1. Still add support for a host name template parameter
  2. Instead of changing the default, add another CloudWatchV2Enabled flag that default to the new version format...
  3. Add a "Enabling Cloud Provider-Native logs" where we can elaborate on templating for URLs, StackDriver, CloudWatch, FluentD and FluentBit.
github-actions[bot] commented 2 months ago

Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable. Thank you for your contribution and understanding! 🙏