flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0

Obfuscate sensitive data in TaskConfig #5385

Open rambrus opened 3 months ago

rambrus commented 3 months ago

Motivation: Why do you think this is important?

Flyte dumps all TaskConfig to the UI and logs, including sensitive data (such as access tokens). This is a critical security issue in enterprise environments.

Goal: What should the final outcome look like, ideally?

Flyte admins can mark certain fields in the config as sensitive, and those fields are then obfuscated throughout the application.
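As a rough illustration of the requested behavior (not an existing Flyte feature), the sketch below masks fields that an admin has marked as sensitive before a config is displayed or logged. The names `SENSITIVE_FIELDS`, `REDACTED`, and `redact_config`, and the keys in the sample config, are all hypothetical.

```python
from typing import Any, Mapping, Set

# Hypothetical: field names an admin has marked as sensitive.
SENSITIVE_FIELDS: Set[str] = {"token", "password", "client_secret"}
REDACTED = "[REDACTED]"


def redact_config(config: Mapping[str, Any], sensitive: Set[str] = SENSITIVE_FIELDS) -> dict:
    """Return a copy of config with marked fields masked, recursing into nested dicts and lists."""

    def _redact(value: Any) -> Any:
        if isinstance(value, Mapping):
            return {
                key: REDACTED if key in sensitive else _redact(inner)
                for key, inner in value.items()
            }
        if isinstance(value, list):
            return [_redact(item) for item in value]
        return value

    return _redact(config)


# Sample config with made-up keys and values.
task_config = {
    "instance": "example.cloud.databricks.com",
    "token": "dapi-not-a-real-token",
    "spark_conf": {"spark.executor.memory": "2g", "password": "hunter2"},
}

# The redacted copy is what a UI or logger would receive; the original is untouched.
safe_view = redact_config(task_config)
print(safe_view)
```

Redacting a copy rather than mutating the config keeps the real values available to the code that needs them, while every display path only ever sees the masked version.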

Describe alternatives you've considered

If this approach does not fit the design, perhaps we should consider a more generic approach similar to the one implemented in Databricks. All the values retrieved from the secret store are treated as sensitive data and obfuscated throughout the application.
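A value-based approach along these lines could be sketched with Python's standard `logging` module: every value read from the secret store is registered, and a filter masks it wherever it appears in a log message. This is only an illustration of the idea, not how Databricks or Flyte implement it; `SecretRedactingFilter` and its `register` method are hypothetical names.

```python
import io
import logging
from typing import Set


class SecretRedactingFilter(logging.Filter):
    """Masks any registered secret value that appears in a log message."""

    def __init__(self) -> None:
        super().__init__()
        self._secrets: Set[str] = set()

    def register(self, value: str) -> None:
        # Call this whenever a value is retrieved from the secret store.
        if value:
            self._secrets.add(value)

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so secrets passed as arguments are covered too.
        message = record.getMessage()
        for secret in self._secrets:
            message = message.replace(secret, "[REDACTED]")
        record.msg = message
        record.args = ()
        return True  # keep the record, now redacted


# Demo: capture log output in memory so the effect is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
redactor = SecretRedactingFilter()
handler.addFilter(redactor)

logger = logging.getLogger("redaction_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

token = "dapi-not-a-real-token"  # stand-in for a value read from a secret store
redactor.register(token)
logger.info("Connecting with token %s", token)

captured = stream.getvalue()
print(captured)
```

Matching on values rather than field names means no one has to mark fields in advance, at the cost of scanning every message and of missing secrets that are transformed (for example, base64-encoded) before being logged.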

Propose: Link/Inline OR Additional context

https://docs.databricks.com/en/security/secrets/redaction.html

Are you sure this issue hasn't been raised already?

Have you read the Code of Conduct?

runllm[bot] commented 3 months ago

To obfuscate sensitive data in Flyte's TaskConfig, you can use Flyte's secret management feature. This allows you to securely pass sensitive information to your tasks without exposing it in the UI or logs. Here's an example of how to use Flyte's secret management:

  1. Define Secrets: Use the Secret class to define the secrets required by your task.
  2. Access Secrets: Retrieve the secrets within your task using flytekit.current_context().secrets.get.

```python
import flytekit
from flytekit import Secret, task, workflow

SECRET_NAME = "user_secret"
SECRET_GROUP = "user-info"

# Declare the secret so Flyte makes it available to this task at run time.
@task(secret_requests=[Secret(group=SECRET_GROUP, key=SECRET_NAME)])
def secret_task() -> str:
    # Read the secret inside the task instead of passing it in through TaskConfig.
    secret_val = flytekit.current_context().secrets.get(SECRET_GROUP, SECRET_NAME)
    # Returned here only for illustration: a task's return value is a task output, not a secret.
    return secret_val

@workflow
def wf() -> str:
    return secret_task()
```

Would you like more detail on this approach?

This is an AI-generated response and your feedback is appreciated! Please leave a 👍 if this is helpful and 👎 if it is not.

Mention @runllm to ask follow-up questions. See runllm.com for more!

kumare3 commented 3 months ago

We should not store sensitive information in task config. Secrets should be used for this.

rambrus commented 3 months ago

@kumare3 When Databricks TaskConfig is defined, sensitive data (auth) needs to be passed in these fields:

Do you mean these fields should not be exposed in TaskConfig, and the plugin / agent should instead be responsible for filling them from Secrets?

kumare3 commented 3 months ago

Correct
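The division of responsibility agreed on here could look roughly like the sketch below: the user-facing config carries only a reference to a secret, and the plugin or agent resolves the value when it builds the outgoing request. `PublicTaskConfig`, `build_request_headers`, the field names, and the `Bearer` header format are hypothetical, and the in-memory store stands in for a real secret backend; inside a Flyte task, the lookup could instead be `flytekit.current_context().secrets.get`, as in the example earlier in this thread.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass(frozen=True)
class PublicTaskConfig:
    """Hypothetical user-facing config: non-sensitive fields only, safe to show in a UI or log."""

    instance: str
    cluster_conf: Dict[str, str] = field(default_factory=dict)
    # A reference to the secret (group and key), never the secret value itself.
    token_secret_group: str = "databricks"
    token_secret_key: str = "access_token"


def build_request_headers(
    config: PublicTaskConfig,
    get_secret: Callable[[str, str], str],
) -> Dict[str, str]:
    """Plugin-side step: resolve the token at run time and attach it to the request."""
    token = get_secret(config.token_secret_group, config.token_secret_key)
    return {"Authorization": f"Bearer {token}"}


# Stand-in secret store for the demo; a real plugin would query its secret manager.
fake_store = {("databricks", "access_token"): "dapi-not-a-real-token"}

config = PublicTaskConfig(instance="example.cloud.databricks.com")
headers = build_request_headers(config, lambda group, key: fake_store[(group, key)])

# The config can be printed or logged freely because it never holds the token.
print(config)
```

Because the token only exists inside `build_request_headers` and the resulting headers, dumping the config to the UI or logs no longer leaks it.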

eapolinario commented 3 months ago

@pingsutw, can you list the remaining work here?