flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org

[BUG] Spark Log Setting Not Applied in Flyte Sandbox Helm Values YAML #4829

Open jasonlai1218 opened 9 months ago

jasonlai1218 commented 9 months ago

Describe the bug

I encountered a problem where the Spark driver pod logs cannot be displayed on the console. Logs for a general Python task pod work, but logs for a Spark task do not.

http://localhost:30082/#!/log/flytesnacks-development/....../pod?namespace=flytesnacks-development
(Screenshot 2024-02-02 at 12 36 47 PM)

Expected behavior

I manually adjusted the URL and found the correct result.
(Screenshot 2024-02-05 at 3 58 47 PM)

Additional context to reproduce

My Helm values YAML file:

docker-registry:
  enabled: false
  image:
    registry: harbor.linecorp.com/ecacda
    repository: cr.flyte.org/flyteorg/registry
    tag: 2.8.1
    pullPolicy: Always
  persistence:
    enabled: false
  service:
    type: NodePort
    nodePort: 30000

flyte-binary:
  nameOverride: flyte-sandbox
  enabled: true
  configuration:
    database:
      host: '{{ printf "%s-postgresql" .Release.Name | trunc 63 | trimSuffix "-" }}'
      password: postgres
    storage:
      metadataContainer: my-s3-bucket
      userDataContainer: my-s3-bucket
      provider: s3
      providerConfig:
        s3:
          disableSSL: true
          v2Signing: true
          endpoint: http://{{ printf "%s-minio" .Release.Name | trunc 63 | trimSuffix "-" }}.{{ .Release.Namespace }}:9000
          authType: accesskey
          accessKey: minio
          secretKey: miniostorage
    logging:
      level: 6
      plugins:
        kubernetes:
          enabled: true
          templateUri: |-
            http://10.233.112.73/#/log/{{.namespace }}/{{ .podName }}/pod?namespace={{ .namespace }}
    inline:
      storage:
        signedURL:
          stowConfigOverride:
            endpoint: http://10.227.231.9:30003
      plugins:
        k8s:
          default-env-vars:
            - FLYTE_AWS_ENDPOINT: http://{{ printf "%s-minio" .Release.Name | trunc 63 | trimSuffix "-" }}.{{ .Release.Namespace }}:9000
            - FLYTE_AWS_ACCESS_KEY_ID: minio
            - FLYTE_AWS_SECRET_ACCESS_KEY: miniostorage
        spark:
          spark-config-default:
            - spark.driver.cores: "1"
            - spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
            - spark.hadoop.fs.s3a.endpoint: http://{{ printf "%s-minio" .Release.Name | trunc 63 | trimSuffix "-" }}.{{ .Release.Namespace }}:9000
            - spark.hadoop.fs.s3a.access.key: "minio"
            - spark.hadoop.fs.s3a.secret.key: "miniostorage"
            - spark.hadoop.fs.s3a.path.style.access: "true"
            - spark.kubernetes.allocation.batch.size: "50"
            - spark.hadoop.fs.s3a.acl.default: "BucketOwnerFullControl"
            - spark.hadoop.fs.s3n.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
            - spark.hadoop.fs.AbstractFileSystem.s3n.impl: "org.apache.hadoop.fs.s3a.S3A"
            - spark.hadoop.fs.s3.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
            - spark.hadoop.fs.AbstractFileSystem.s3.impl: "org.apache.hadoop.fs.s3a.S3A"
            - spark.hadoop.fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
            - spark.hadoop.fs.AbstractFileSystem.s3a.impl: "org.apache.hadoop.fs.s3a.S3A"
          logs:
            mixed:
              kubernetes-enabled: true
              kubernetes-url: |-
                http://10.233.112.73/#/log/{{ .namespace }}/{{ .podName }}/pod?namespace={{ .namespace }}
        cluster_resources:
          refreshInterval: 5m
          customData:
            - production:
                - projectQuotaCpu:
                    value: "5"
                - projectQuotaMemory:
                    value: "4000Mi"
            - staging:
                - projectQuotaCpu:
                    value: "2"
                - projectQuotaMemory:
                    value: "3000Mi"
            - development:
                - projectQuotaCpu:
                    value: "4"
                - projectQuotaMemory:
                    value: "5000Mi"
          refresh: 5m
    inlineConfigMap: '{{ include "flyte-sandbox.configuration.inlineConfigMap" . }}'
  clusterResourceTemplates:
    inlineConfigMap: '{{ include "flyte-sandbox.clusterResourceTemplates.inlineConfigMap" . }}'
  deployment:
    image:
      repository: harbor.linecorp.com/ecacda/cr.flyte.org/flyteorg/flyte-binary
      tag: native
      pullPolicy: Always
    waitForDB:
      image:
        repository: harbor.linecorp.com/ecacda/cr.flyte.org/flyteorg/bitnami/postgresql
        tag: 15.1.0-debian-11-r20
        pullPolicy: Always
  rbac:
    # This is strictly NOT RECOMMENDED in production clusters, and is only for use
    # within local Flyte sandboxes.
    # When using cluster resource templates to create additional namespaced roles,
    # Flyte is required to have a superset of those permissions. To simplify
    # experimenting with new backend plugins that require additional roles be created
    # with cluster resource templates (e.g. Spark), we add the following:
    extraRules:
      - apiGroups:
        - '*'
        resources:
        - '*'
        verbs:
        - '*'
  enabled_plugins:
    # -- Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig)
    tasks:
      # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig)
      task-plugins:
        # -- [Enabled Plugins](https://pkg.go.dev/github.com/lyft/flyteplugins/go/tasks/config#Config).
        # Enable sagemaker*, athena if you install the backend plugins
        enabled-plugins:
          - container
          - sidecar
          - k8s-array
          - agent-service
          - spark
        default-for-task-types:
          container: container
          sidecar: sidecar
          container_array: k8s-array
          spark: spark
          # -- Uncomment to enable task type that uses Flyte Agent
          # bigquery_query_job_task: agent-service

kubernetes-dashboard:
  enabled: true
  image:
    repository: kubernetesui/dashboard
    tag: v2.7.0
    pullPolicy: Always
  extraArgs:
    - --enable-insecure-login
    - --enable-skip-login
  protocolHttp: true
  service:
    externalPort: 80
    type: LoadBalancer
  rbac:
    create: true
    clusterRoleMetrics: false
    clusterReadOnlyRole: true

minio:
  enabled: true
  image:
    registry: harbor.linecorp.com/ecacda
    repository: cr.flyte.org/flyteorg/bitnami/minio
    tag: 2023.1.25-debian-11-r0
    pullPolicy: Always
  auth:
    rootUser: minio
    rootPassword: miniostorage
  defaultBuckets: my-s3-bucket
  extraEnvVars:
    - name: MINIO_BROWSER_REDIRECT_URL
      value: http://localhost:30080/minio
  service:
    type: NodePort
    nodePorts:
      api: 30003
  persistence:
    enabled: true
    existingClaim: '{{ include "flyte-sandbox.persistence.minioVolumeName" . }}'
  volumePermissions:
    enabled: true
    image:
      registry: harbor.linecorp.com/ecacda
      repository: cr.flyte.org/flyteorg/bitnami/bitnami-shell
      tag: 11-debian-11-r76
      pullPolicy: Always

postgresql:
  enabled: true
  image:
    registry: harbor.linecorp.com/ecacda
    repository: cr.flyte.org/flyteorg/bitnami/postgresql
    tag: 15.1.0-debian-11-r20
    pullPolicy: Always
  auth:
    postgresPassword: postgres
  shmVolume:
    enabled: false
  primary:
    service:
      type: NodePort
      nodePorts:
        postgresql: 30001
    persistence:
      enabled: true
      existingClaim: '{{ include "flyte-sandbox.persistence.dbVolumeName" . }}'
  volumePermissions:
    enabled: true
    image:
      registry: harbor.linecorp.com/ecacda
      repository: cr.flyte.org/flyteorg/bitnami/bitnami-shell
      tag: 11-debian-11-r76
      pullPolicy: Always

sandbox:
  # dev Routes requests to an instance of Flyte running locally on a developer's
  # development environment. This is only usable if the flyte-binary chart is disabled.
  dev: true
  buildkit:
    enabled: true
    image:
      repository: moby/buildkit
      tag: buildx-stable-1
      pullPolicy: Always
  proxy:
    enabled: true
    image:
      repository: envoyproxy/envoy
      tag: v1.23-latest
      pullPolicy: Always

Screenshots

(Four screenshots attached: 2024-02-02 at 12 36 23 PM, 2024-02-02 at 12 36 38 PM, 2024-02-02 at 12 36 47 PM, and 2024-02-05 at 3 32 12 PM)


dosubot[bot] commented 9 months ago

🤖

The issue you're experiencing might be because the log configuration in your values file is only set up for general Kubernetes pods, not specifically for Spark driver pods.

The kubernetes-template-uri you've provided is a general template that works for most Kubernetes pods. However, Spark Driver pods might have a different log structure or location, which is why they might not be showing up.

To resolve this issue, you would need to find out the correct log structure or location for Spark Driver pods and adjust the kubernetes-template-uri accordingly.
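
As a concrete starting point, here is a minimal sketch of the Spark plugin's log block from the values file above, rewritten to use a kubernetes-template-uri key instead of kubernetes-url. The key name kubernetes-template-uri is an assumption borrowed from the generic Kubernetes log plugin's templateUri setting shown earlier; verify it against the log configuration accepted by your flyte-binary / flytepropeller version before relying on it.

plugins:
  spark:
    logs:
      mixed:
        kubernetes-enabled: true
        # Assumed key name, mirroring the templateUri used by the generic
        # kubernetes log plugin above; confirm against your Flyte version.
        kubernetes-template-uri: |-
          http://10.233.112.73/#/log/{{ .namespace }}/{{ .podName }}/pod?namespace={{ .namespace }}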

For more details, you can refer to the Flyte codebase, specifically the getEventInfoForSpark function in the flyteplugins/go/tasks/plugins/k8s/spark/spark.go file. This function initializes different log plugins based on the configuration and fetches the logs for the Spark Driver pod.

Please note that this is a general suggestion and the exact solution might vary depending on your specific setup and configuration. If you're still having trouble, I would recommend reaching out to the Spark community for more specific guidance.


MortalHappiness commented 9 months ago

self-assign