acryldata / datahub-helm

Repository of helm charts for deploying DataHub on a Kubernetes cluster
Apache License 2.0

Datahub v0.10.0 does not create datahub user as admin #273

Closed acherla closed 1 year ago

acherla commented 1 year ago

Describe the bug: When running the datahub helm chart v0.10.0, the default datahub:datahub account is not created as an admin user.

After logging in as the datahub user, datahub-frontend shows the following. Clicking Settings shows only these options:

(screenshot)

Clicking My Profile shows "Unauthorized":

(screenshot)

All components deploy successfully: currently datahub-frontend, datahub-gms, and datahub-actions, plus the pre-install/setup jobs:

(screenshot)

Datahub-frontend logs

2023-03-01 13:01:22,853 [application-akka.actor.default-dispatcher-14] ERROR controllers.TrackingController - Failed to emit product analytics event. actor: urn:li:corpuser:datahub, event: {"title":"DataHub","url":"https://edgeai-datahub.nss.vzwnet.com/user/urn:li:corpuser:datahub","path":"/user/urn:li:corpuser:datahub","hash":"","search":"","width":1280,"height":587,"prevPathname":"/","type":"PageViewEvent","actorUrn":"urn:li:corpuser:datahub","timestamp":1677675647877,"date":"Wed Mar 01 2023 08:00:47 GMT-0500 (Eastern Standard Time)","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36","browserId":"f727face-172b-4add-b2b9-e5b618f023d7"}
2023-03-01 13:56:34,337 [application-akka.actor.default-dispatcher-4] ERROR o.p.o.p.creator.OidcProfileCreator - Bad User Info response, error=null
2023-03-02 05:26:05,111 [application-akka.actor.default-dispatcher-13] ERROR o.p.o.p.creator.OidcProfileCreator - Bad User Info response, error=null

Datahub-gms logs

2023-03-02 07:01:59,544 [I/O dispatcher 1] INFO  c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 1 Took time ms: -1

2023-03-02 07:02:01,940 [ForkJoinPool.commonPool-worker-13] WARN  org.elasticsearch.client.RestClient:65 - request [HEAD https://edgeai-elastic.nss.vzwnet.com:443/datahub_usage_event?ignore_throttled=false&ignore_unavailable=false&expand_wildcards=open%2Cclosed&allow_no_indices=false] returned 1 warnings: [299 Elasticsearch-7.17.6-f65e9d338dc1d07b642e14a27f338990148ee5b6 "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]

2023-03-02 07:02:04,081 [ForkJoinPool.commonPool-worker-23] WARN  org.elasticsearch.client.RestClient:65 - request [HEAD https://edgeai-elastic.nss.vzwnet.com:443/datahub_usage_event?ignore_throttled=false&ignore_unavailable=false&expand_wildcards=open%2Cclosed&allow_no_indices=false] returned 1 warnings: [299 Elasticsearch-7.17.6-f65e9d338dc1d07b642e14a27f338990148ee5b6 "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]

2023-03-02 07:02:04,156 [ForkJoinPool.commonPool-worker-23] WARN  org.elasticsearch.client.RestClient:65 - request [POST https://edgeai-elastic.nss.vzwnet.com:443/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true] returned 1 warnings: [299 Elasticsearch-7.17.6-f65e9d338dc1d07b642e14a27f338990148ee5b6 "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]

2023-03-02 07:03:40,335 [pool-11-thread-1] WARN  org.elasticsearch.client.RestClient:65 - request [POST https://edgeai-elastic.nss.vzwnet.com:443/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true] returned 1 warnings: [299 Elasticsearch-7.17.6-f65e9d338dc1d07b642e14a27f338990148ee5b6 "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]

To Reproduce: Steps to reproduce the behavior:

  1. Install the datahub helm chart v0.10.0
  2. Log in as datahub:datahub
  3. Go to Settings (see screenshot above)
  4. Try to access My Profile and get "Unauthorized"
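The steps above can be sketched as shell commands. The repo URL is the standard datahub-helm one; the release name and resulting service name are assumptions, not taken from the report:

```shell
# Add the DataHub chart repo and install the chart that ships app v0.10.0
helm repo add datahub https://helm.datahubproject.io/
helm repo update
helm install datahub datahub/datahub -f values.yaml

# Expose the frontend locally and log in as datahub:datahub
# (service name assumes the default release naming)
kubectl port-forward svc/datahub-datahub-frontend 9002:9002
```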

Expected behavior: Going to Settings should show administrative options in the UI for managing/creating users and additional items.

Screenshots: Added above.

Desktop (please complete the following information): Browser: Google Chrome

Additional context: Installed with OIDC; however, the same issue occurs even without OIDC configured.
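Since the frontend logs show `Bad User Info response` from `OidcProfileCreator`, one hedged way to rule out the IdP side is to fetch the discovery document the frontend is configured with (the real URI is masked above; substitute it):

```shell
# Fetch the OIDC discovery document the frontend points at
DISCOVERY_URI="https://<masked-host>/dex/.well-known/openid-configuration"
curl -sf "$DISCOVERY_URI"

# The userinfo_endpoint field in that response is what OidcProfileCreator calls;
# a failing or empty response there would be consistent with
# "Bad User Info response, error=null"
```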

szalai1 commented 1 year ago

Thanks for reporting, we are looking into this

acherla commented 1 year ago

Thanks @szalai1, attached is the helm values config I am using:

datahub:
  datahub-gms:
    enabled: true
    image:
      repository: linkedin/datahub-gms
      # tag: "v0.10.0" # defaults to .global.datahub.version
    resources:
      limits:
        cpu: "4"
        memory: 16Gi
      requests:
        cpu: "4"
        memory: 16Gi
    service:
      type: ClusterIP # ClusterIP or NodePort
    # Optionally set a GMS specific SQL login (defaults to global login)
    # sql:
    #   datasource:
    #     username: "gms-login"
    #     password:
    #       secretRef: gms-secret
    #       secretKey: gms-password

  datahub-frontend:
    enabled: true
    extraVolumeMounts:
      - mountPath: /etc/datahub/plugins/frontend/auth
        name: user-properties
    extraVolumes:
      - name: user-properties
        configMap:
          name: user-properties
    extraEnvs:
      - name: AUTH_OIDC_ENABLED
        value: "true"
      - name: AUTH_OIDC_CLIENT_ID
        value: oidc-auth-client
      - name: AUTH_OIDC_CLIENT_SECRET
        value: *******
      - name: AUTH_OIDC_BASE_URL
        value: https://edgeai-datahub.nss.****.com
      - name: AUTH_OIDC_DISCOVERY_URI
        value: https://edgeai-rocklin.nss.****.com/dex/.well-known/openid-configuration
      - name: AUTH_OIDC_SCOPE
        value: openid profile email groups
    image:
      repository: linkedin/datahub-frontend-react
      # tag: "v0.10.0" # defaults to .global.datahub.version
    resources:
      limits:
        memory: 16Gi
        cpu: 4
      requests:
        cpu: 4
        memory: 16Gi
    # Set up ingress to expose react front-end
    ingress:
      enabled: false
    service:
      type: ClusterIP # ClusterIP or NodePort
      port: 9002
      targetPort: http
      protocol: TCP
      name: http

      # Annotations to add to the service, this will help in adding
      # Internal load balancer or various other annotation support in AWS
      annotations: {}
        # service.beta.kubernetes.io/aws-load-balancer-internal: "true"

  acryl-datahub-actions:
    enabled: true
    image:
      repository: acryldata/datahub-actions
      tag: "v0.0.11"
    resources:
      limits:
        memory: 2Gi
        cpu: "1"
      requests:
        cpu: "1"
        memory: 2Gi

  datahub-mae-consumer:
    image:
      repository: linkedin/datahub-mae-consumer
      # tag: "v0.10.0" # defaults to .global.datahub.version
    resources:
      limits:
        memory: 2Gi
        cpu: "1"
      requests:
        cpu: "1"
        memory: 2Gi

  datahub-mce-consumer:
    image:
      repository: linkedin/datahub-mce-consumer
      # tag: "v0.10.0" # defaults to .global.datahub.version
    resources:
      limits:
        memory: 2Gi
        cpu: "1"
      requests:
        cpu: "1"
        memory: 2Gi

  datahub-ingestion-cron:
    enabled: false
    image:
      repository: acryldata/datahub-ingestion
      # tag: "v0.10.0" # defaults to .global.datahub.version

  elasticsearchSetupJob:
    enabled: true
    image:
      repository: linkedin/datahub-elasticsearch-setup
      # tag: "v0.10.0" # defaults to .global.datahub.version
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
      requests:
        cpu: 300m
        memory: 256Mi
    podSecurityContext:
      fsGroup: 1000
    securityContext:
      runAsUser: 1000
      readOnlyRootFilesystem: false
      allowPrivilegeEscalation: false
    podAnnotations: {}

  kafkaSetupJob:
    enabled: false
    image:
      repository: linkedin/datahub-kafka-setup
      # tag: "v0.10.0" # defaults to .global.datahub.version
    resources:
      limits:
        cpu: 500m
        memory: 1024Mi
      requests:
        cpu: 300m
        memory: 768Mi
    podSecurityContext:
      fsGroup: 1000
    securityContext:
      runAsUser: 1000
      readOnlyRootFilesystem: false
      allowPrivilegeEscalation: false
    podAnnotations: {}

  mysqlSetupJob:
    enabled: false
    image:
      repository: acryldata/datahub-mysql-setup
      # tag: "v0.10.0" # defaults to .global.datahub.version
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
      requests:
        cpu: 300m
        memory: 256Mi
    podSecurityContext:
      fsGroup: 1000
    securityContext:
      runAsUser: 1000
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
    podAnnotations: {}
    # Optionally set a set-up job specific login (defaults to global login)
    # username: "mysqlSetupJob-login"
    # password:
    #   secretRef: mysqlSetupJob-secret
    #   secretKey: mysqlSetupJob-password

  postgresqlSetupJob:
    enabled: true
    image:
      repository: acryldata/datahub-postgres-setup
      # tag: "v0.10.0.2" # defaults to .global.datahub.version
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
      requests:
        cpu: 300m
        memory: 256Mi
    podSecurityContext:
      fsGroup: 1000
    securityContext:
      runAsUser: 1000
    #  readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
    podAnnotations: {}
    # Optionally set a set-up job specific login (defaults to global login)
    # username: "postgresqlSetupJob-login"
    # password:
    #   secretRef: postgresqlSetupJob-secret
    #   secretKey: postgresqlSetupJob-password

  ## No code data migration
  datahubUpgrade:
    enabled: true
    image:
      repository: acryldata/datahub-upgrade
      # tag: "v0.10.0"  # defaults to .global.datahub.version
    batchSize: 1000
    batchDelayMs: 100
    noCodeDataMigration:
      #sqlDbType: "MYSQL"
      sqlDbType: "POSTGRES"
    podSecurityContext:
      fsGroup: 1000
    securityContext:
      runAsUser: 1000
      readOnlyRootFilesystem: false
      allowPrivilegeEscalation: false
    podAnnotations: {}
    restoreIndices:
      resources:
        limits:
          cpu: 500m
          memory: 512Mi
        requests:
          cpu: 300m
          memory: 256Mi

  ## Runs system update processes
  ## Includes: Elasticsearch Indices Creation/Reindex (See global.elasticsearch.index for additional configuration)
  datahubSystemUpdate:
    image:
      repository: acryldata/datahub-upgrade
      # tag:
    podSecurityContext:
      fsGroup: 1000
    securityContext:
      runAsUser: 1000
      readOnlyRootFilesystem: false
      allowPrivilegeEscalation: false
    podAnnotations: {}
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
      requests:
        cpu: 300m
        memory: 256Mi

  global:
    graph_service_impl: elasticsearch
    datahub_analytics_enabled: true
    datahub_standalone_consumers_enabled: false

    elasticsearch:
      host: "edgeai-elastic.nss.******.com"
      port: "443"
      skipcheck: "true"
      insecure: "false"
      useSSL: "true"
      auth:
        username: SVC-EdgeAI-Datahub
        password:
          secretRef: "datahub-elastic-secret"
          secretKey: "password"

      ## The following section controls when and how reindexing of elasticsearch indices are performed
      index:
        ## Enable reindexing when mappings change based on the data model annotations
        enableMappingsReindex: true

        ## Enable reindexing when static index settings change.
        ## Dynamic settings which do not require reindexing are not affected
        ## Primarily this should be enabled when re-sharding is necessary for scaling/performance.
        enableSettingsReindex: true

        ## Index settings can be overridden for entity indices or other indices on an index by index basis
        ## Some index settings, such as # of shards, require reindexing while others, e.g. replicas, do not
        ## Non-Entity indices do not require the prefix
        # settingsOverrides: '{"graph_service_v1":{"number_of_shards":"5"},"system_metadata_service_v1":{"number_of_shards":"5"}}'
        ## Entity indices do not require the prefix or suffix
        # entitySettingsOverrides: '{"dataset":{"number_of_shards":"10"}}'

        ## The amount of delay between indexing a document and having it returned in queries
        ## Increasing this value can improve performance when ingesting large amounts of data
        # refreshIntervalSeconds: 1

        ## The following options control settings for datahub-upgrade job when creating or reindexing indices
        upgrade:
          ## When reindexing is required, this option will clone the existing index as a backup
          ## The clone indices are not currently managed.
          cloneIndices: true

          ## Typically when reindexing the document counts between the original and destination indices should match.
          ## In some cases reindexing might not be able to proceed due to incompatibilities between a document in the
          ## original index and the new index's mappings. This document could be dropped and re-ingested or restored from
          ## the SQL database.
          ##
          ## This setting allows continuing if and only if the cloneIndices setting is also enabled which
          ## ensures a complete backup of the original index is preserved.
          allowDocCountMismatch: false

    kafka:
      bootstrap:
        server: "edgeai-kafka-0.kafka.svc.cluster.local:9094,edgeai-kafka-1.kafka.svc.cluster.local:9094,edgeai-kafka-2.kafka.svc.cluster.local:9094,edgeai-kafka-3.kafka.svc.cluster.local:9094,edgeai-kafka-4.kafka.svc.cluster.local:9094"
      zookeeper:
        server: "edgeai-zookeeper-0.kafka.svc.cluster.local:2181,edgeai-zookeeper-1.kafka.svc.cluster.local:2181,edgeai-zookeeper-2.kafka.svc.cluster.local:2181"
      # This section defines the names for the kafka topics that DataHub depends on, at a global level. Do not override this config
      # at a sub-chart level.
      topics:
        metadata_change_event_name: "metadatachangeeventv4"
        failed_metadata_change_event_name: "failedmetadatachangeeventv4"
        metadata_audit_event_name: "metadataauditeventv4"
        datahub_usage_event_name: "datahubusageeventv1"
        metadata_change_proposal_topic_name: "metadatachangeproposalv1"
        failed_metadata_change_proposal_topic_name: "failedmetadatachangeproposalv1"
        metadata_change_log_versioned_topic_name: "metadatachangelogversionedv1"
        metadata_change_log_timeseries_topic_name: "metadatachangelogtimeseriesv1"
        platform_event_topic_name: "platformeventv1"
        datahub_upgrade_history_topic_name: "datahubupgradehistoryv1"
      ## For AWS MSK set this to a number larger than 1
      # partitions: 3
      # replicationFactor: 3
      schemaregistry:
        url: "http://******.nss.*****.com:8081"
        type: KAFKA
        # glue:
        #   region: us-east-1
        #   registry: datahub

    neo4j:
      host: "prerequisites-neo4j-community:7474"
      uri: "bolt://prerequisites-neo4j-community"
      username: "neo4j"
      password:
        secretRef: neo4j-secrets
        secretKey: neo4j-password
      # --------------OR----------------
      # value: password

    sql:
      datasource:
        host: "*****.nss.*****.com:5432"
        hostForpostgresqlClient: "******.nss.******.com"
        port: "5432"
        url: "jdbc:postgresql://******.nss.****.com:5432/datahub"
        driver: "org.postgresql.Driver"
        username: "datahub"
        password:
          secretRef: "datahub-postgres-secret"
          secretKey: "password"
        # --------------OR----------------
        # value: password

        ## Use below for usage of PostgreSQL instead of MySQL
        # host: "prerequisites-postgresql:5432"
        # hostForpostgresqlClient: "prerequisites-postgresql"
        # port: "5432"
        # url: "jdbc:postgresql://prerequisites-postgresql:5432/datahub"
        # driver: "org.postgresql.Driver"
        # username: "postgres"
        # password:
        #   secretRef: postgresql-secrets
        #   secretKey: postgres-password
        # --------------OR----------------
        #   value: password

    datahub:
      version: v0.10.0
      gms:
        port: "8080"
        nodePort: "30001"

      monitoring:
        enablePrometheus: true

      mae_consumer:
        port: "9091"
        nodePort: "30002"

      appVersion: "1.0"
      systemUpdate:
        ## The following options control settings for datahub-upgrade job which will
        ## manage ES indices and other update-related work
        enabled: true

      encryptionKey:
        secretRef: "datahub-encryption-secrets"
        secretKey: "encryption_key_secret"
        # Set to false if you'd like to provide your own secret.
        provisionSecret:
          enabled: true
          autoGenerate: true
        # Only specify if autoGenerate set to false
        #  secretValues:
        #    encryptionKey: <encryption key value>

      managed_ingestion:
        enabled: true
        defaultCliVersion: "0.10.0"

      metadata_service_authentication:
        enabled: false
        systemClientId: "__datahub_system"
        systemClientSecret:
          secretRef: "datahub-auth-secrets"
          secretKey: "token_service_signing_key"
        tokenService:
          signingKey:
            secretRef: "datahub-auth-secrets"
            secretKey: "token_service_signing_key"
          salt:
            secretRef: "datahub-auth-secrets"
            secretKey: "token_service_salt"
        # Set to false if you'd like to provide your own auth secrets
        provisionSecrets:
          enabled: true
          autoGenerate: true
        # Only specify if autoGenerate set to false
        #  secretValues:
        #    secret: <secret value>
        #    signingKey: <signing key value>
        #    salt: <salt value>

  #  hostAliases:
  #    - ip: "192.168.0.104"
  #      hostnames:
  #        - "broker"
  #        - "mysql"
  #        - "postgresql"
  #        - "elasticsearch"
  #        - "neo4j"

  ## Add below to enable SSL for kafka
  #  credentialsAndCertsSecrets:
  #    name: datahub-certs
  #    path: /mnt/datahub/certs
  #    secureEnv:
  #      ssl.key.password: datahub.linkedin.com.KeyPass
  #      ssl.keystore.password: datahub.linkedin.com.KeyStorePass
  #      ssl.truststore.password: datahub.linkedin.com.TrustStorePass
  #      kafkastore.ssl.truststore.password: datahub.linkedin.com.TrustStorePass
  #
  #  springKafkaConfigurationOverrides:
  #    ssl.keystore.location: /mnt/datahub/certs/datahub.linkedin.com.keystore.jks
  #    ssl.truststore.location: /mnt/datahub/certs/datahub.linkedin.com.truststore.jks
  #    kafkastore.ssl.truststore.location: /mnt/datahub/certs/datahub.linkedin.com.truststore.jks
  #    security.protocol: SSL
  #    kafkastore.security.protocol: SSL
  #    ssl.keystore.type: JKS
  #    ssl.truststore.type: JKS
  #    ssl.protocol: TLS
  #    ssl.endpoint.identification.algorithm:
szalai1 commented 1 year ago

Hey, I just installed the charts with your values.yaml (with the OIDC env vars removed), and when I logged in with datahub/datahub I was an admin:

(screenshot: Screen Shot 2023-03-03 at 18 03 32)
szalai1 commented 1 year ago

> Installed with OIDC, however even without OIDC configured the same issue still occurs.

This must be an OIDC issue, since without it, it worked fine for me.

acherla commented 1 year ago

@szalai1 I disabled OIDC and even ran it against a containerized postgres instance (fresh PVC install) and I am still running into the same issue.

Any chance there is a parameter I need to look at in Postgres that might be limiting the privileges of the datahub user?
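A minimal sketch of checking the Postgres side (connection details from the values above). Note that the Settings/Profile admin options are governed by DataHub's own policies served by GMS, so database grants are unlikely to be the cause, but this would rule them out:

```shell
# List role attributes and table-level grants for the datahub login
psql -h <postgres-host> -U datahub -d datahub -c '\du datahub'
psql -h <postgres-host> -U datahub -d datahub -c \
  "SELECT grantee, table_name, privilege_type
     FROM information_schema.role_table_grants
    WHERE grantee = 'datahub';"
```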

arturo-opsetmoen-amador commented 1 year ago

Hi, was this ever solved?

I am running into the same issue (without OIDC) with the helm chart 0.2.182...