flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0

[Docs] Document logging inside Python tasks running on a remote cluster #4830

Open neverett opened 8 months ago

neverett commented 8 months ago

Description

Question from Slack:

Is there a convenient way of logging inside normal python tasks running on a remote cluster? Specifically just logging to stdout or somehow accessible on the Flyte web interface

Response:

To do this, use the flytekit logger and set the logging level to 20 (Python's INFO level) through the FLYTE_SDK_LOGGING_LEVEL environment variable. You can set it in ImageSpec (or add it to your Dockerfile):

ImageSpec(
    python_version="3.9.13",
    packages=[flytekit_package, flytekitplugins_spark_package],
    registry="localhost:30000",
    apt_packages=["git", "wget"],
    env={"FLYTE_SDK_LOGGING_LEVEL": "20"},
)
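
The value 20 is the numeric code that Python's logging module uses for INFO. The sketch below shows how such a value can be read from the environment and mapped to a level name. The helper `level_from_env` and its WARNING default are assumptions made for illustration; this is not flytekit's actual implementation.

```python
import logging
import os

ENV_VAR = "FLYTE_SDK_LOGGING_LEVEL"


def level_from_env(env=None, default=logging.WARNING) -> int:
    """Hypothetical helper: read the numeric level, or fall back to `default`."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR)
    # Environment variables are strings, so "20" has to be converted to an int.
    return int(raw) if raw else default


# Simulate the environment that the ImageSpec above would produce.
level = level_from_env({ENV_VAR: "20"})
print(level, logging.getLevelName(level))  # prints "20 INFO"
```

Passing the environment as a plain dict keeps the sketch easy to try without changing the real process environment.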

To log, import logger from flytekit and call it in the task, making sure the environment variable is set. For the import and an example call, see: https://github.com/flyteorg/flytekit/blob/f721eef474bb04ec2df581a690fb703e7c2b7956/flytekit/deck/deck.py#L6 https://github.com/flyteorg/flytekit/blob/f721eef474bb04ec2df581a690fb703e7c2b7956/flytekit/deck/deck.py#L149
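
A minimal sketch of what this could look like inside a task body. The import matches the one in the linked deck.py; the function `count_rows` is hypothetical, and the fallback branch exists only so the sketch runs where flytekit is not installed. In a real project the function would also carry flytekit's `@task` decorator.

```python
import logging

try:
    # Same import as in flytekit/deck/deck.py (linked above).
    from flytekit.loggers import logger
except ImportError:
    # Assumption for illustration: fall back to a standard logger
    # so the sketch still runs without flytekit.
    logger = logging.getLogger("flytekit")


def count_rows(rows: list) -> int:
    """Hypothetical task body that logs while it works."""
    # Visible only when the logger level is INFO (20) or lower.
    logger.info("Received %d rows", len(rows))
    if not rows:
        logger.warning("Input is empty")
    return len(rows)


count_rows([1, 2, 3])
```

With FLYTE_SDK_LOGGING_LEVEL set to 20 in the container, the info line should show up in the task's logs; at a higher threshold only the warning would.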

dosubot[bot] commented 8 months ago

🤖

Hey @neverett! Great to see you back with another insightful question. Hope you're doing well!

To set up logging inside normal Python tasks running on a remote cluster in Flyte, you can use Python's built-in logging module. Anything a task writes to stdout or stderr ends up in the container logs, and the Flyte web interface links to those logs through whichever log provider is configured.

Here's an example of how to log from your tasks:

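import logging

from flytekit import task

@task
def my_task() -> None:
    # Python's root logger defaults to WARNING, so the info message
    # appears only if a lower level has been configured.
    logging.info("This is an info message")
    logging.warning("This is a warning message")
    logging.error("This is an error message")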

In your specific case, you mentioned setting the FLYTE_SDK_LOGGING_LEVEL environment variable to 20. This variable sets the minimum level that flytekit logs, using the numeric levels of Python's logging module: 10 (DEBUG), 20 (INFO), 30 (WARNING), 40 (ERROR), and 50 (CRITICAL). The higher the number, the more severe the message. For example, if FLYTE_SDK_LOGGING_LEVEL is set to 30, all messages at WARNING and above (that is, ERROR and CRITICAL) are logged, while INFO and DEBUG messages are dropped.

The names PanicLevel, FatalLevel, ErrorLevel, WarnLevel, InfoLevel, and DebugLevel belong to the Go logger used by Flyte's backend components, not to flytekit. You can find more information about them in the flytestdlib/logger/config.go file.
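
The threshold behaviour can be illustrated with Python's standard logging module alone. This is a small sketch: `ListHandler` and `emitted_at` are hypothetical names introduced here, and nothing in it touches flytekit itself.

```python
import logging


class ListHandler(logging.Handler):
    """Collects the level names of emitted records so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.levels = []

    def emit(self, record: logging.LogRecord) -> None:
        self.levels.append(record.levelname)


def emitted_at(threshold: int) -> list:
    """Log one message per level and return the level names that got through."""
    log = logging.getLogger(f"demo.threshold.{threshold}")
    log.propagate = False  # keep the demo output away from the root logger
    log.setLevel(threshold)
    handler = ListHandler()
    log.addHandler(handler)
    log.debug("debug message")
    log.info("info message")
    log.warning("warning message")
    log.error("error message")
    log.removeHandler(handler)
    return handler.levels


print(emitted_at(20))  # ['INFO', 'WARNING', 'ERROR']
print(emitted_at(30))  # ['WARNING', 'ERROR']
```

A threshold of 20 lets info messages through, which is why that value is suggested for seeing task-level info logs.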

For GCP (Google Cloud Platform) users, Flyte can be configured to use GCP Stackdriver for logging. You need to update the task_logs.yaml configuration file to enable Stackdriver logging and specify your GCP project ID. Here's how you can refine the configuration:

plugins:
  logs:
    kubernetes-enabled: false
    stackdriver-enabled: true
    gcp-project: your-gcp-project-id  # Replace with your actual GCP project ID
    stackdriver-logresourcename: k8s_container

You can find more information about this in the kustomize/overlays/eks/flyte/config/propeller/plugins/task_logs.yaml and kustomize/overlays/gcp/flyte/config/propeller/plugins/task_logs.yaml files.

I hope this helps! If you have any other questions, feel free to ask.