flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0

[BUG] Environment variables not propagating into copilot #5012

Closed samuel-sujith closed 8 months ago

samuel-sujith commented 8 months ago

Describe the bug

I'm trying to run the raw container example given here: https://docs.flyte.org/en/latest/user_guide/customizing_dependencies/raw_containers.html

On trying the latest version of copilot, it gives me the below error {"json":{},"level":"error","msg":"failed to get AWS credentials: NoCredentialProviders: no valid providers in chain. Deprecated.\n\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors","ts":"2024-03-06T04:41:42Z"}

I understand from this that the copilot init container is trying to read the AWS credentials from env vars, but the started init container doesn't have the env vars for AWS access.

I see that in the flyte workflow pod definition, copilot is not getting the env vars.

initContainers:
    - name: flyte-copilot-downloader
      image: cr.flyte.org/flyteorg/flytecopilot:v1.11.0-b1
      command:
        - /bin/flyte-copilot
        - '--storage.limits.maxDownloadMBs=0'
        - '--storage.container=india-flytedata'
        - '--storage.type=s3'
        - >-
          --storage.connection.secret-key=temp
        - '--storage.connection.access-key=temp'
        - '--storage.connection.auth-type=accesskey'
        - '--storage.connection.region=us-east-1'
        - '--storage.connection.endpoint=temp'
      args:
        - download
        - '--from-remote'
        - >-
          s3://india-flytedata/metadata/propeller/samuel-rawcontainer-development-agrv92bxkl9qprh6n28z/n1/data/inputs.pb
        - '--to-output-prefix'
        - >-
          s3://india-flytedata/metadata/propeller/samuel-rawcontainer-development-agrv92bxkl9qprh6n28z/n1/data/0
        - '--to-local-dir'
        - /var/inputs
        - '--format'
        - JSON
        - '--input-interface'
        - CgkKAWESBAoCCAIKCQoBYhIECgIIAg==
      workingDir: /
      resources:
        limits:
          cpu: 500m
          memory: 128Mi
        requests:
          cpu: 500m
          memory: 128Mi
      volumeMounts:
        - name: flyte-inputs
          mountPath: /var/inputs
        - name: kube-api-access-s9cmc
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount

But the container gets the env var

containers:
    - name: agrv92bxkl9qprh6n28z-n1-0
      image: ghcr.io/flyteorg/rawcontainers-julia:v2
      command:
        - julia
        - calculate-ellipse-area.jl
        - '3.4'
        - '4.2'
        - /var/outputs
      env:
        - name: FLYTE_INTERNAL_EXECUTION_WORKFLOW
          value: samuel-rawcontainer:development:rawcontainer.my_workflow.wf
        - name: FLYTE_INTERNAL_EXECUTION_ID
          value: agrv92bxkl9qprh6n28z
        - name: FLYTE_INTERNAL_EXECUTION_PROJECT
          value: samuel-rawcontainer
        - name: FLYTE_INTERNAL_EXECUTION_DOMAIN
          value: development
        - name: FLYTE_ATTEMPT_NUMBER
          value: '0'
        - name: FLYTE_INTERNAL_TASK_PROJECT
          value: samuel-rawcontainer
        - name: FLYTE_INTERNAL_TASK_DOMAIN
          value: development
        - name: FLYTE_INTERNAL_TASK_NAME
          value: ellipse-area-metadata-julia
        - name: FLYTE_INTERNAL_TASK_VERSION
          value: 20240305-190734
        - name: FLYTE_INTERNAL_PROJECT
          value: samuel-rawcontainer
        - name: FLYTE_INTERNAL_DOMAIN
          value: development
        - name: FLYTE_INTERNAL_NAME
          value: ellipse-area-metadata-julia
        - name: FLYTE_INTERNAL_VERSION
          value: 20240305-190734
        - name: AWS_SECRET_ACCESS_KEY
          value: temp
        - name: AWS_REGION
          value: us-east-1
        - name: AWS_ENDPOINT
          value: temp
        - name: AWS_S3_US_EAST_1_REGIONAL_ENDPOINT
          value: temp
        - name: FLYTE_AWS_ENDPOINT
          value: temp
        - name: FLYTE_AWS_ACCESS_KEY_ID
          value: temp
        - name: FLYTE_AWS_SECRET_ACCESS_KEY
          value: temp
        - name: AWS_ACCESS_KEY_ID
          value: temp

Even in the containers list, the copilot sidecar is not getting the env vars.

I have specified the correct settings in the copilot k8s plugin config:

  copilot.yaml: |
    plugins:
      k8s:
        FLYTE_AWS_ACCESS_KEY_ID: 11
        FLYTE_AWS_ENDPOINT: temp
        FLYTE_AWS_SECRET_ACCESS_KEY: I11
        default-cpus: 100m
        default-env-vars:
        - FLYTE_AWS_ENDPOINT: 11
        - FLYTE_AWS_ACCESS_KEY_ID: 11
        - FLYTE_AWS_SECRET_ACCESS_KEY:11
        - FLYTE_AWS_ENDPOINT: 11
        - AWS_ACCESS_KEY_ID: 11
        - AWS_SECRET_ACCESS_KEY: 11
        - AWS_REGION: us-east-1
        - AWS_ENDPOINT: 11
        - AWS_S3_US_EAST_1_REGIONAL_ENDPOINT: 11
        co-pilot:
          image: cr.flyte.org/flyteorg/flytecopilot:v1.11.0-b1
          name: flyte-copilot-
          start-timeout: 30s
          storage: 200Mi

Please assist

Expected behavior

Env vars should propagate into the copilot containers, and copilot should download the data from S3.

Additional context to reproduce

No response

Screenshots

No response

Are you sure this issue hasn't been raised already?

Have you read the Code of Conduct?

dosubot[bot] commented 8 months ago

Hey @samuel-sujith! πŸš€ I'm Dosu, your friendly neighborhood bot, here to help you tackle bugs, answer your questions, and guide you on contributing while we wait for a human maintainer to join us. If there's anything you need, just let me know. I'm here to assist!

It seems like the environment variables are not propagating into the copilot containers because the syntax used for the default-env-vars section in your plugin configuration is incorrect. Environment variables under default-env-vars should be defined as a list of objects with name and value keys, not as a dash (-) followed by the environment variable name and value separated by a colon (:).

Here is the corrected syntax for the default-env-vars section:

plugins:
  k8s:
    default-env-vars:
      - name: FLYTE_AWS_ENDPOINT
        value: "http://minio.flyte:9000"
      - name: FLYTE_AWS_ACCESS_KEY_ID
        value: "minio"
      - name: FLYTE_AWS_SECRET_ACCESS_KEY
        value: "miniostorage"
    default-cpus: 100m
    default-memory: 100Mi

By correcting the syntax to use a list of objects with name and value keys for each environment variable, the environment variables should now properly propagate into the copilot containers, allowing them to download data from S3 as expected.

Regarding your question about how the Flyte workflow handles environment variable propagation for its containers, specifically for the copilot containers, Flyte workflow does not directly handle environment variable propagation for its containers, including the copilot containers, through the Dockerfile. Environment variables in Docker are typically set using the ENV instruction within the Dockerfile. These environment variables are set at build time and are static for the image created.

For dynamic environment variable propagation at runtime, especially for Kubernetes-based workflows like Flyte, environment variables are usually passed through the workflow definitions or Kubernetes pod specifications. This means that the propagation of environment variables to Flyte copilot containers would be managed outside of the Dockerfile, likely through Flyte's workflow configuration or the Kubernetes job spec that Flyte generates to run each task.

In summary, the Dockerfile for Flyte copilot sets up the environment for the container at build time. Dynamic environment variable propagation at runtime needs to be handled through Flyte's workflow configurations or the Kubernetes specifications that Flyte generates.

I hope this helps! If you have any further questions, feel free to ask.

Sources

#### About Dosu This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a πŸ‘ if this is helpful and πŸ‘Ž if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
dosubot[bot] commented 8 months ago

Hey there, @samuel-sujith! πŸ‘‹ I'm Dosu, a friendly bot here to help you tackle bugs, answer questions, and guide you on your way to becoming a contributor. While we wait for a human maintainer to chime in, I'm here to assist you. Let me take a moment to review your issue regarding environment variables not propagating into copilot, and I'll get back to you with a full answer shortly!

samuel-sujith commented 8 months ago

I changed the env var declaration as mentioned above, but the error is still the same:

{"json":{},"level":"error","msg":"failed to get AWS credentials: NoCredentialProviders: no valid providers in chain. Deprecated.\n\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors","ts":"2024-03-06T05:45:03Z"}

Also the copilot container still seems to not get the env variables.

dosubot[bot] commented 8 months ago

Based on the error message you're seeing, it seems like the AWS SDK within the copilot containers is unable to find the AWS credentials. This is typically due to the environment variables not being correctly passed to the containers. Here are some steps you can take to troubleshoot and potentially resolve this issue:

  1. Ensure Environment Variables are Correctly Defined in the k8s.yaml: Double-check the Kubernetes YAML configuration (k8s.yaml) to ensure that the AWS credentials (typically AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and optionally AWS_SESSION_TOKEN) are correctly defined as environment variables for the relevant containers.

  2. Modify the Copilot Container to Explicitly Load Environment Variables: If the copilot containers are not recognizing the AWS credentials from environment variables, you might need to modify the copilot code to explicitly load these environment variables and pass them to the AWS SDK configuration. This can be done by modifying the session.NewSession call in the checkAWSCreds function to include explicit AWS credentials obtained from environment variables.

Here's an example of how you can modify the checkAWSCreds function:

import (
    "fmt"
    "os"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/credentials"
    "github.com/aws/aws-sdk-go/aws/session"
)

func checkAWSCreds() (*credentials.Value, error) {
    awsAccessKeyID := os.Getenv("AWS_ACCESS_KEY_ID")
    awsSecretAccessKey := os.Getenv("AWS_SECRET_ACCESS_KEY")
    awsSessionToken := os.Getenv("AWS_SESSION_TOKEN") // optional; may be empty for static keys

    sess, err := session.NewSession(&aws.Config{
        Credentials: credentials.NewStaticCredentials(awsAccessKeyID, awsSecretAccessKey, awsSessionToken),
    })
    if err != nil {
        return nil, err
    }

    creds, err := sess.Config.Credentials.Get()
    if err != nil {
        return nil, err
    }
    if creds.AccessKeyID == "" || creds.SecretAccessKey == "" {
        return nil, fmt.Errorf("invalid data in credential fetch")
    }
    return &creds, nil
}
  3. Review the Flyte Workflow Configuration: Ensure that the workflow and task definitions in Flyte are correctly configured to pass environment variables to the copilot containers. This might involve reviewing the Flyte task's container definition to ensure that environment variables are specified.

  4. Debugging and Logs: Utilize logging within the copilot application to debug and confirm that the AWS credentials are being loaded as expected. You can add additional logging around the credential loading logic to ensure that the environment variables are being read correctly.

If you've followed these steps and the issue persists, it would be helpful to have more information about your Flyte workflow configuration and the exact steps you're taking to run the raw container example. This will help in identifying any potential issues with the way the environment variables are being passed to the copilot containers.

Sources

samuel-sujith commented 8 months ago

Will this work only with AWS endpoints, or can I specify my own MinIO endpoints here? I tried adding the env variables into the Dockerfile and running, but now I get the below error:

{"json":{},"level":"error","msg":"failed to get AWS credentials: failed to load environment config, failed to load, AWS_S3_US_EAST_1_REGIONAL_ENDPOINT from env config, unable to resolve the value of S3UsEast1RegionalEndpoint for https://es-si-s3-z4.eecloud.nsn-net.net","ts":"2024-03-06T06:20:22Z"}

samuel-sujith commented 8 months ago

I set the AWS_S3_US_EAST_1_REGIONAL_ENDPOINT variable to regional and it works for me now. Thanks for your time.
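For anyone landing here with the same error against a non-AWS S3 endpoint (MinIO or another S3-compatible store), a sketch of the resulting default-env-vars block would look like the following. The endpoint and key values are placeholders; the list uses the name/value object syntax from the corrected example above:

```yaml
plugins:
  k8s:
    default-env-vars:
      - name: FLYTE_AWS_ENDPOINT
        value: "https://my-s3-endpoint.example.com"  # placeholder
      - name: FLYTE_AWS_ACCESS_KEY_ID
        value: "my-access-key"                       # placeholder
      - name: FLYTE_AWS_SECRET_ACCESS_KEY
        value: "my-secret-key"                       # placeholder
      - name: AWS_REGION
        value: us-east-1
      # Resolves the "unable to resolve the value of S3UsEast1RegionalEndpoint"
      # error when using a custom endpoint in the us-east-1 region.
      - name: AWS_S3_US_EAST_1_REGIONAL_ENDPOINT
        value: regional
```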