airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Using GCS Results in could not find attempt stats for job_id #48503

Open helcim-adam-c opened 5 hours ago

helcim-adam-c commented 5 hours ago

Helm Chart Version

1.1.0

What step the error happened?

None

Relevant information

Hi, we're currently having issues implementing a Datadog integration with our GKE-hosted Airbyte deployment. We use Argo CD, which runs helm template on the 1.1.0 Helm chart to deploy all the resources.

Here's our current override.yaml file:

# Global params that are overwritten with umbrella chart
global:
  auth:
    enabled: true
  # -- The URL where Airbyte will be reached; This should match your Ingress host
  airbyteUrl: "https://airbyte.org.com"

  storage:
    type: "GCS"
    storageSecretName: airbyte-gcs-log-creds
    bucket:
      log: data-airbyte-logs
      state: data-airbyte-logs
      workloadOutput: data-airbyte-logs
    gcs:
      projectId: <project-id>
      credentialsPath: /secrets/gcs-log-creds/gcp.json

  database:
    type: "external" # "external"

    # -- Secret name where database credentials are stored
    secretName: "prod-airbyte-psql-password" # e.g. "airbyte-config-secrets"

    # -- The database host
    host: "10.233.128.5"

    # -- The database port
    port: "5432"

    # -- The database name
    database: "db-airbyte"

    # -- The database user
    user: "DATABASE_USER"
    # -- The key within `secretName` where the user is stored
    userSecretKey: DATABASE_USER

    # -- The database password
    password: "DATABASE_PASSWORD"
    # -- The key within `secretName` where the password is stored
    #passwordSecretKey: "" # e.g."database-password"

  env_vars:
    HTTP_IDLE_TIMEOUT: 25m
    READ_TIMEOUT: 30m

  metrics:
    metricClient: datadog

server:
  enabled: true

  env_vars:
    METRIC_CLIENT: datadog
    DD_AGENT_HOST: datadog-agent.datadog.svc.cluster.local
    DD_DOGSTATSD_PORT: 8125

worker:
  enabled: true

  env_vars:
    DD_AGENT_HOST: datadog-agent.datadog.svc.cluster.local
    DD_DOGSTATSD_PORT: 8125

workload-launcher:
  enabled: true

  env_vars:
    DD_AGENT_HOST: datadog-agent.datadog.svc.cluster.local
    DD_DOGSTATSD_PORT: 8125

workload-api-server:
  enabled: true

  env_vars:
    DD_AGENT_HOST: datadog-agent.datadog.svc.cluster.local
    DD_DOGSTATSD_PORT: 8125

metrics:
  enabled: true

  env_vars:
    PUBLISH_METRICS: true
    METRIC_CLIENT: datadog
    DD_AGENT_HOST: datadog-agent.datadog.svc.cluster.local
    DD_DOGSTATSD_PORT: 8125

postgresql:
  # -- Switch to enable or disable the PostgreSQL helm chart
  enabled: false
  image:
    repository: airbyte/db
  # -- Airbyte Postgresql username
  postgresqlUsername: airbyte
  # -- Airbyte Postgresql password
  postgresqlPassword: airbyte
  # -- Airbyte Postgresql database
  postgresqlDatabase: db-airbyte

externalDatabase:
  # -- Database host
  host: "10.233.128.5"
  # -- non-root Username for Airbyte Database
  user: "airbyte"
  # -- Database password
  password: ""
  # -- Name of an existing secret resource containing the DB password
  existingSecret: "prod-airbyte-psql-password"
  # -- Name of an existing secret key containing the DB password
  existingSecretPasswordKey: "DATABASE_PASSWORD"
  # -- Database name
  database: "db-airbyte"
  # -- Database port number
  port: "5432"
  # -- Database full JDBC URL (ex: jdbc:postgresql://host:port/db?parameters)
  jdbcUrl: ""

airbyte-bootloader:
  extraEnv:
    - name: DATABASE_USER
      value: airbyte

I've provided the log output we're seeing from the server. It looks like the server is not able to get the job stats from GCS, but from what I can tell it's not running into an authentication or connection issue.

Relevant log output

io.airbyte.commons.server.errors.IdNotFoundKnownException: Could not find attempt stats for job_id: 2199 and attempt no: 0
    at io.airbyte.commons.server.handlers.AttemptHandler.getAttemptCombinedStats(AttemptHandler.java:248) ~[io.airbyte-airbyte-commons-server-1.1.0.jar:?]
    at io.airbyte.server.apis.AttemptApiController.lambda$getAttemptCombinedStats$2(AttemptApiController.java:69) ~[io.airbyte-airbyte-server-1.1.0.jar:?]
    at io.airbyte.server.apis.ApiHelper.execute(ApiHelper.java:28) ~[io.airbyte-airbyte-server-1.1.0.jar:?]
    at io.airbyte.server.apis.AttemptApiController.getAttemptCombinedStats(AttemptApiController.java:69) ~[io.airbyte-airbyte-server-1.1.0.jar:?]
    at io.airbyte.server.apis.$AttemptApiController$Definition$Exec.dispatch(Unknown Source) ~[io.airbyte-airbyte-server-1.1.0.jar:?]
    at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invokeUnsafe(AbstractExecutableMethodsDefinition.java:461) ~[micronaut-inject-4.6.5.jar:4.6.5]
    at io.micronaut.context.DefaultBeanContext$BeanContextUnsafeExecutionHandle.invokeUnsafe(DefaultBeanContext.java:4350) ~[micronaut-inject-4.6.5.jar:4.6.5]
    at io.micronaut.web.router.AbstractRouteMatch.execute(AbstractRouteMatch.java:272) ~[micronaut-router-4.6.5.jar:4.6.5]
    at io.micronaut.web.router.DefaultUriRouteMatch.execute(DefaultUriRouteMatch.java:38) ~[micronaut-router-4.6.5.jar:4.6.5]
    at io.micronaut.http.server.RouteExecutor.executeRouteAndConvertBody(RouteExecutor.java:498) ~[micronaut-http-server-4.6.5.jar:4.6.5]
    at io.micronaut.http.server.RouteExecutor.lambda$callRoute$5(RouteExecutor.java:475) ~[micronaut-http-server-4.6.5.jar:4.6.5]
    at io.micronaut.core.execution.ExecutionFlow.lambda$async$1(ExecutionFlow.java:87) ~[micronaut-core-4.6.5.jar:4.6.5]
    at io.micronaut.core.propagation.PropagatedContext.lambda$wrap$3(PropagatedContext.java:211) ~[micronaut-core-4.6.5.jar:4.6.5]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
    at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
2024-11-14 22:56:36 ERROR i.a.c.s.e.h.IdNotFoundExceptionHandler(handle):33 - Not found exception class NotFoundKnownExceptionInfo {
    id: null
    message: Id not found: Could not find attempt stats for job_id: 2199 and attempt no: 0
    exceptionClassName: io.airbyte.commons.server.errors.IdNotFoundKnownException
    exceptionStack: [io.airbyte.commons.server.errors.IdNotFoundKnownException: Id not found: Could not find attempt stats for job_id: 2199 and attempt no: 0,  at io.airbyte.commons.server.errors.handlers.IdNotFoundExceptionHandler.handle(IdNotFoundExceptionHandler.java:32),     at io.airbyte.commons.server.errors.handlers.IdNotFoundExceptionHandler.handle(IdNotFoundExceptionHandler.java:23),     at io.micronaut.http.server.RequestLifecycle.lambda$handlerExceptionHandler$10(RequestLifecycle.java:308),  at io.micronaut.http.server.RequestLifecycle.handlerExceptionHandler(RequestLifecycle.java:319),    at io.micronaut.http.server.RequestLifecycle.onErrorNoFilter(RequestLifecycle.java:248),    at io.micronaut.http.server.RequestLifecycle.lambda$onErrorNoFilter$2(RequestLifecycle.java:210),   at io.micronaut.core.execution.ImperativeExecutionFlowImpl.onErrorResume(ImperativeExecutionFlowImpl.java:112),     at io.micronaut.core.execution.DelayedExecutionFlowImpl$OnErrorResume.apply(DelayedExecutionFlowImpl.java:313),     at io.micronaut.core.execution.DelayedExecutionFlowImpl.work(DelayedExecutionFlowImpl.java:51),     at io.micronaut.core.execution.DelayedExecutionFlowImpl.complete0(DelayedExecutionFlowImpl.java:64),    at io.micronaut.core.execution.DelayedExecutionFlowImpl.completeExceptionally(DelayedExecutionFlowImpl.java:75),    at io.micronaut.core.execution.ExecutionFlow.lambda$async$0(ExecutionFlow.java:92),     at io.micronaut.core.execution.ImperativeExecutionFlowImpl.onComplete(ImperativeExecutionFlowImpl.java:132),    at io.micronaut.core.execution.ExecutionFlow.lambda$async$1(ExecutionFlow.java:87),     at io.micronaut.core.propagation.PropagatedContext.lambda$wrap$3(PropagatedContext.java:211),   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144),   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642),   at java.base/java.lang.Thread.run(Thread.java:1583), Caused by: 
io.airbyte.commons.server.errors.IdNotFoundKnownException: Could not find attempt stats for job_id: 2199 and attempt no: 0,     at io.airbyte.commons.server.handlers.AttemptHandler.getAttemptCombinedStats(AttemptHandler.java:248),  at io.airbyte.server.apis.AttemptApiController.lambda$getAttemptCombinedStats$2(AttemptApiController.java:69),  at io.airbyte.server.apis.ApiHelper.execute(ApiHelper.java:28),     at io.airbyte.server.apis.AttemptApiController.getAttemptCombinedStats(AttemptApiController.java:69),   at io.airbyte.server.apis.$AttemptApiController$Definition$Exec.dispatch(Unknown Source),   at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invokeUnsafe(AbstractExecutableMethodsDefinition.java:461),  at io.micronaut.context.DefaultBeanContext$BeanContextUnsafeExecutionHandle.invokeUnsafe(DefaultBeanContext.java:4350),     at io.micronaut.web.router.AbstractRouteMatch.execute(AbstractRouteMatch.java:272),     at io.micronaut.web.router.DefaultUriRouteMatch.execute(DefaultUriRouteMatch.java:38),  at io.micronaut.http.server.RouteExecutor.executeRouteAndConvertBody(RouteExecutor.java:498),   at io.micronaut.http.server.RouteExecutor.lambda$callRoute$5(RouteExecutor.java:475),   ... 5 more]
    rootCauseExceptionClassName: java.lang.Class
    rootCauseExceptionStack: [io.airbyte.commons.server.errors.IdNotFoundKnownException: Could not find attempt stats for job_id: 2199 and attempt no: 0,   at io.airbyte.commons.server.handlers.AttemptHandler.getAttemptCombinedStats(AttemptHandler.java:248),  at io.airbyte.server.apis.AttemptApiController.lambda$getAttemptCombinedStats$2(AttemptApiController.java:69),  at io.airbyte.server.apis.ApiHelper.execute(ApiHelper.java:28),     at io.airbyte.server.apis.AttemptApiController.getAttemptCombinedStats(AttemptApiController.java:69),   at io.airbyte.server.apis.$AttemptApiController$Definition$Exec.dispatch(Unknown Source),   at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invokeUnsafe(AbstractExecutableMethodsDefinition.java:461),  at io.micronaut.context.DefaultBeanContext$BeanContextUnsafeExecutionHandle.invokeUnsafe(DefaultBeanContext.java:4350),     at io.micronaut.web.router.AbstractRouteMatch.execute(AbstractRouteMatch.java:272),     at io.micronaut.web.router.DefaultUriRouteMatch.execute(DefaultUriRouteMatch.java:38),  at io.micronaut.http.server.RouteExecutor.executeRouteAndConvertBody(RouteExecutor.java:498),   at io.micronaut.http.server.RouteExecutor.lambda$callRoute$5(RouteExecutor.java:475),   at io.micronaut.core.execution.ExecutionFlow.lambda$async$1(ExecutionFlow.java:87),     at io.micronaut.core.propagation.PropagatedContext.lambda$wrap$3(PropagatedContext.java:211),   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144),   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642),   at java.base/java.lang.Thread.run(Thread.java:1583)]
}

reidab commented 4 hours ago

@helcim-adam-c There's a chance this is a similar issue to the one I described in #48502. Is CONTAINER_ORCHESTRATOR_SECRET_NAME set to the correct value on your worker and workload launcher pods?
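
For reference, a minimal sketch of how that variable might be pinned in the override file if it turns out to be wrong. It rests on two assumptions that are not confirmed in this thread: that the per-component env_vars map (already used above for DD_AGENT_HOST) is passed through to the pod environment, and that the expected value is the secret named in global.storage.storageSecretName. Check the rendered helm template output before relying on it.

```yaml
# Hypothetical override fragment, not a confirmed fix.
# Merge these keys into the existing worker / workload-launcher
# sections rather than declaring the sections a second time.
worker:
  env_vars:
    # Assumed to match global.storage.storageSecretName
    CONTAINER_ORCHESTRATOR_SECRET_NAME: airbyte-gcs-log-creds

workload-launcher:
  env_vars:
    # Same assumption as above
    CONTAINER_ORCHESTRATOR_SECRET_NAME: airbyte-gcs-log-creds
```

If the chart already sets this variable on those pods, the rendered manifests could end up with a duplicate or conflicting entry, so comparing the rendered Deployment specs before and after the change is the safer first step.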