airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[helm] S3 authentication with instanceProfile no longer works #37677

Closed mhemken-vts closed 6 months ago

mhemken-vts commented 6 months ago

Helm Chart Version

airbyte-0.64.151

At what step did the error happen?

Other

Relevant information

When I try to view the logs (Connection > Job History > > View Logs), I get the 500 error below. I have deployed the Helm chart on EKS, and it authenticates with the S3 bucket using IRSA. This configuration was working last week: it was tailing the logs correctly on Friday.

❯ helm ls -an airbyte
NAME    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
airbyte airbyte         107             2024-04-29 22:21:22.690098615 +0000 UTC deployed        airbyte-0.64.151        0.57.2     

Relevant log output

{
  "url": "https://<url>/workspaces/.../connections/.../job-history",
  "airbyteVersion": "0.57.2",
  "errorType": "HttpError",
  "errorConstructor": "g1",
  "error": {
    "i18nKey": "errors.http.internalServerError",
    "i18nParams": {
      "status": 500
    },
    "name": "HttpError",
    "requestId": "...",
    "request": {
      "url": "/api/v1/attempt/get_for_job",
      "method": "post",
      "headers": {
        "Content-Type": "application/json"
      },
      "data": {
        "jobId": 3,
        "attemptNumber": 0
      }
    },
    "status": 500,
    "response": {
      "message": "Internal Server Error: The AWS Access Key Id you provided does not exist in our records. (Service: S3, Status Code: 403, Request ID: ..., Extended Request ID:...)",
      "exceptionClassName": "software.amazon.awssdk.services.s3.model.S3Exception",
      "exceptionStack": [
        "software.amazon.awssdk.services.s3.model.S3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: S3, Status Code: 403, Request ID: ..., Extended Request ID: ...)",
        "\tat software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)",
        "\tat software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)",
        "\tat software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85)",
        "\tat software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:43)",
        "\tat software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:93)",
        "\tat software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$7(BaseClientHandler.java:279)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:50)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:38)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:72)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:55)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:39)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)",
        "\tat software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)",
        "\tat software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)",
        "\tat software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)",
        "\tat software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:224)",
        "\tat software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)",
        "\tat software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)",
        "\tat software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)",
        "\tat software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)",
        "\tat software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)",
        "\tat software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)",
        "\tat software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)",
        "\tat software.amazon.awssdk.services.s3.DefaultS3Client.listObjectsV2(DefaultS3Client.java:7327)",
        "\tat software.amazon.awssdk.services.s3.paginators.ListObjectsV2Iterable$ListObjectsV2ResponseFetcher.nextPage(ListObjectsV2Iterable.java:154)",
        "\tat software.amazon.awssdk.services.s3.paginators.ListObjectsV2Iterable$ListObjectsV2ResponseFetcher.nextPage(ListObjectsV2Iterable.java:145)",
        "\tat software.amazon.awssdk.core.pagination.sync.PaginatedResponsesIterator.next(PaginatedResponsesIterator.java:58)",
        "\tat io.airbyte.config.helpers.S3Logs.getAscendingObjectKeys(S3Logs.java:148)",
        "\tat io.airbyte.config.helpers.S3Logs.tailCloudLog(S3Logs.java:92)",
        "\tat io.airbyte.config.helpers.LogClientSingleton.getJobLogFile(LogClientSingleton.java:86)",
        "\tat io.airbyte.commons.server.converters.JobConverter.getLogRead(JobConverter.java:235)",
        "\tat io.airbyte.commons.server.converters.JobConverter.getAttemptInfoRead(JobConverter.java:151)",
        "\tat java.base/java.util.Optional.map(Optional.java:260)",
        "\tat io.airbyte.commons.server.handlers.AttemptHandler.getAttemptForJob(AttemptHandler.java:83)",
        "\tat io.airbyte.server.apis.AttemptApiController.lambda$getAttemptForJob$0(AttemptApiController.java:49)",
        "\tat io.airbyte.server.apis.ApiHelper.execute(ApiHelper.java:28)",
        "\tat io.airbyte.server.apis.AttemptApiController.getAttemptForJob(AttemptApiController.java:49)",
        "\tat io.airbyte.server.apis.$AttemptApiController$Definition$Exec.dispatch(Unknown Source)",
        "\tat io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invokeUnsafe(AbstractExecutableMethodsDefinition.java:461)",
        "\tat io.micronaut.context.DefaultBeanContext$BeanContextUnsafeExecutionHandle.invokeUnsafe(DefaultBeanContext.java:4276)",
        "\tat io.micronaut.web.router.AbstractRouteMatch.execute(AbstractRouteMatch.java:271)",
        "\tat io.micronaut.http.server.RouteExecutor.executeRouteAndConvertBody(RouteExecutor.java:488)",
        "\tat io.micronaut.http.server.RouteExecutor.lambda$callRoute$6(RouteExecutor.java:465)",
        "\tat io.micronaut.core.execution.ExecutionFlow.lambda$async$1(ExecutionFlow.java:87)",
        "\tat io.micronaut.core.propagation.PropagatedContext.lambda$wrap$3(PropagatedContext.java:211)",
        "\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)",
        "\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)",
        "\tat java.base/java.lang.Thread.run(Thread.java:1583)"
      ]
    }
  },
  "stacktrace": "c$@https://<url>/assets/WorkspaceAccessManagementSection-wlzRRC7f.js:69:5075\ng1@https://<url>/assets/WorkspaceAccessManagementSection-wlzRRC7f.js:69:5664\nWie@https://<url>/assets/WorkspaceAccessManagementSection-wlzRRC7f.js:69:7330\n",
  "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:124.0) Gecko/20100101 Firefox/124.0",
  "featureFlags": {}
}
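
The report above buries the Airbyte call site under dozens of AWS SDK frames. As a minimal sketch, assuming the report has been saved and parsed as JSON with the same `error.response.exceptionStack` layout shown above, the following Python filters the stack down to the `io.airbyte` frames. The helper name `airbyte_frames` is hypothetical, and the sample report is abbreviated from the one in this issue.

```python
def airbyte_frames(report):
    """Return the io.airbyte stack frames from an Airbyte UI error report."""
    stack = report["error"]["response"]["exceptionStack"]
    frames = []
    for line in stack:
        text = line.strip()  # frames are stored as "\tat <frame>"
        if text.startswith("at io.airbyte."):
            frames.append(text[len("at "):])
    return frames


# Abbreviated sample, copied from the report in this issue.
report = {
    "error": {
        "response": {
            "exceptionStack": [
                "software.amazon.awssdk.services.s3.model.S3Exception: ...",
                "\tat software.amazon.awssdk.services.s3.DefaultS3Client.listObjectsV2(DefaultS3Client.java:7327)",
                "\tat io.airbyte.config.helpers.S3Logs.getAscendingObjectKeys(S3Logs.java:148)",
                "\tat io.airbyte.config.helpers.S3Logs.tailCloudLog(S3Logs.java:92)",
            ]
        }
    }
}

for frame in airbyte_frames(report):
    print(frame)
```

On the full report, the first frame this yields is `S3Logs.getAscendingObjectKeys`, which is where the server lists log objects in S3 before tailing them.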

marcosmarxm commented 6 months ago

@bgroff can you take a look at this issue?

mhemken-vts commented 6 months ago

Update: It looks like airbyte-server suddenly has the variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY set. No other component has them set, and I have not added them to the configuration. Furthermore, I've edited values.yaml to set them to empty strings, but they are still set to the default admin/minio123.

Does that help narrow down the scope of this issue?
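
One way static keys on airbyte-server could break IRSA: in the AWS SDK for Java v2 default credentials chain, environment-variable keys are consulted before the web identity token that IRSA injects. The Python sketch below models only that ordering. Whether Airbyte's S3 log client uses the default chain in this chart version is an assumption, and `pick_credential_source` is a hypothetical helper, not part of Airbyte or the SDK. The role ARN is a made-up placeholder.

```python
def pick_credential_source(env):
    """Model which credential source wins, given a pod's environment.

    Simplified from the AWS SDK for Java v2 default chain: static keys in
    environment variables are checked before the web identity token (IRSA).
    Blank values are treated as unset here, which is an assumption.
    """
    def is_set(name):
        return bool(env.get(name, "").strip())

    if is_set("AWS_ACCESS_KEY_ID") and is_set("AWS_SECRET_ACCESS_KEY"):
        return "static-env-keys"
    if is_set("AWS_WEB_IDENTITY_TOKEN_FILE") and is_set("AWS_ROLE_ARN"):
        return "irsa-web-identity"
    return "later-providers"  # profile file, container, instance profile


# Variables that IRSA injects into the pod (illustrative values).
irsa_env = {
    "AWS_ROLE_ARN": "arn:aws:iam::123456789012:role/example",
    "AWS_WEB_IDENTITY_TOKEN_FILE": "/var/run/secrets/eks.amazonaws.com/serviceaccount/token",
}

# Same pod, but with the leftover minio defaults described in this comment.
broken_env = dict(irsa_env, AWS_ACCESS_KEY_ID="admin", AWS_SECRET_ACCESS_KEY="minio123")

print(pick_credential_source(irsa_env))
print(pick_credential_source(broken_env))
```

Under this model, the minio defaults would be sent to S3 instead of the IRSA role credentials, which would be consistent with the "AWS Access Key Id you provided does not exist in our records" error in the report.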

marcosmarxm commented 6 months ago

@mhemken-vts are you upgrading from a previous version? What steps did you run to get into the problem?

mhemken-vts commented 6 months ago

I recently upgraded from 0.50.50 to 0.64.151. The purpose of that upgrade was to be able to use IRSA to authenticate with AWS S3. Using minio for logs had been problematic. The upgrade and switch to S3 fixed it. Our users were finally able to see logs in the Airbyte UI (they were not available previously).

After that had been deployed to all environments, I moved on to doing the same thing with the database: replacing the db included in the helm chart with a stable one in RDS. The reason for this is that the helm chart uses helm-hooks to install the database. This means that the database's statefulset exists outside of the helm release's lifecycle. Maintenance was a pain.

The problem in this issue started after I configured the external database.

mhemken-vts commented 6 months ago

...continued

Points worth noting:

mhemken-vts commented 6 months ago

In a surprise move, it works now. To my knowledge, the configuration didn't change. One colleague repeatedly refreshed the helm release for an unrelated issue, and that may have done it, though I don't understand why it was still broken when I did the same.