airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Handle large responses gracefully #34814

Closed: lmossman closed this issue 7 months ago

lmossman commented 7 months ago

What

Currently, the connector builder server fails with an unclear error when it receives too large a response from the API.

This is what it looks like when it fails this way:

[Screenshot 2024-02-02 at 3:34:39 PM showing the error]

From my testing, this happens once the API response reaches roughly 25 MB. Below that size the response is returned properly, but above it, the following errors appear in the deployment logs and produce the error shown above:

stacktrace

```
airbyte-connector-builder-server | 2024-02-02 23:34:26 ERROR i.a.c.c.ProcessOutputParser(generateError):93 - The CDK command `test_read` completed properly but no records nor trace were found. Logs were: 0.
airbyte-connector-builder-server | 2024-02-02 23:34:26 INFO i.m.h.s.n.h.a.e.AccessLog(log):125 - airbyte-server.oss_airbyte_internal - - [02/Feb/2024:23:34:25 +0000] "POST /v1/stream/read HTTP/1.1" 500 970
airbyte-server | 2024-02-02 23:34:26 ERROR i.a.s.a.ApiHelper(execute):49 - Unexpected Exception
airbyte-server | org.openapitools.client.infrastructure.ServerException: Server error : 500 Internal Server Error
airbyte-server | at io.airbyte.connectorbuilderserver.api.client.generated.ConnectorBuilderServerApi.readStream(ConnectorBuilderServerApi.kt:80) ~[io.airbyte-airbyte-api-dev.jar:?]
airbyte-server | at io.airbyte.commons.server.handlers.ConnectorBuilderProjectsHandler.readConnectorBuilderProjectStream(ConnectorBuilderProjectsHandler.java:340) ~[io.airbyte-airbyte-commons-server-dev.jar:?]
airbyte-server | at io.airbyte.server.apis.ConnectorBuilderProjectApiController.lambda$readConnectorBuilderProjectStream$5(ConnectorBuilderProjectApiController.java:110) ~[io.airbyte-airbyte-server-dev.jar:?]
airbyte-server | at io.airbyte.server.apis.ApiHelper.execute(ApiHelper.java:28) ~[io.airbyte-airbyte-server-dev.jar:?]
airbyte-server | at io.airbyte.server.apis.ConnectorBuilderProjectApiController.readConnectorBuilderProjectStream(ConnectorBuilderProjectApiController.java:110) ~[io.airbyte-airbyte-server-dev.jar:?]
airbyte-server | at io.airbyte.server.apis.$ConnectorBuilderProjectApiController$Definition$Exec.dispatch(Unknown Source) ~[io.airbyte-airbyte-server-dev.jar:?]
airbyte-server | at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:371) ~[micronaut-inject-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.context.DefaultBeanContext$4.invoke(DefaultBeanContext.java:594) ~[micronaut-inject-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.web.router.AbstractRouteMatch.execute(AbstractRouteMatch.java:303) ~[micronaut-router-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.web.router.RouteMatch.execute(RouteMatch.java:111) ~[micronaut-router-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.http.context.ServerRequestContext.with(ServerRequestContext.java:103) ~[micronaut-http-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.http.server.RouteExecutor.lambda$executeRoute$14(RouteExecutor.java:659) ~[micronaut-http-server-3.10.1.jar:3.10.1]
airbyte-server | at reactor.core.publisher.FluxDeferContextual.subscribe(FluxDeferContextual.java:49) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at reactor.core.publisher.InternalFluxOperator.subscribe(InternalFluxOperator.java:62) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at reactor.core.publisher.FluxSubscribeOn$SubscribeOnSubscriber.run(FluxSubscribeOn.java:194) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at io.micronaut.reactive.reactor.instrument.ReactorInstrumentation.lambda$init$0(ReactorInstrumentation.java:62) ~[micronaut-runtime-3.10.1.jar:3.10.1]
airbyte-server | at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at io.micronaut.scheduling.instrument.InvocationInstrumenterWrappedCallable.call(InvocationInstrumenterWrappedCallable.java:53) ~[micronaut-context-3.10.1.jar:3.10.1]
airbyte-server | at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
airbyte-server | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
airbyte-server | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
airbyte-server | at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
airbyte-server | 2024-02-02 23:34:26 ERROR i.a.s.e.UncaughtExceptionHandler(handle):31 - Uncaught exception
airbyte-server | org.openapitools.client.infrastructure.ServerException: Server error : 500 Internal Server Error
airbyte-proxy | 172.24.0.1 - - [02/Feb/2024:23:34:26 +0000] "POST /api/v1/connector_builder_projects/read_stream HTTP/1.1" 500 768 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
airbyte-server | at io.airbyte.connectorbuilderserver.api.client.generated.ConnectorBuilderServerApi.readStream(ConnectorBuilderServerApi.kt:80) ~[io.airbyte-airbyte-api-dev.jar:?]
airbyte-server | at io.airbyte.commons.server.handlers.ConnectorBuilderProjectsHandler.readConnectorBuilderProjectStream(ConnectorBuilderProjectsHandler.java:340) ~[io.airbyte-airbyte-commons-server-dev.jar:?]
airbyte-server | at io.airbyte.server.apis.ConnectorBuilderProjectApiController.lambda$readConnectorBuilderProjectStream$5(ConnectorBuilderProjectApiController.java:110) ~[io.airbyte-airbyte-server-dev.jar:?]
airbyte-server | at io.airbyte.server.apis.ApiHelper.execute(ApiHelper.java:28) ~[io.airbyte-airbyte-server-dev.jar:?]
airbyte-server | at io.airbyte.server.apis.ConnectorBuilderProjectApiController.readConnectorBuilderProjectStream(ConnectorBuilderProjectApiController.java:110) ~[io.airbyte-airbyte-server-dev.jar:?]
airbyte-server | at io.airbyte.server.apis.$ConnectorBuilderProjectApiController$Definition$Exec.dispatch(Unknown Source) ~[io.airbyte-airbyte-server-dev.jar:?]
airbyte-server | at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:371) ~[micronaut-inject-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.context.DefaultBeanContext$4.invoke(DefaultBeanContext.java:594) ~[micronaut-inject-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.web.router.AbstractRouteMatch.execute(AbstractRouteMatch.java:303) ~[micronaut-router-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.web.router.RouteMatch.execute(RouteMatch.java:111) ~[micronaut-router-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.http.context.ServerRequestContext.with(ServerRequestContext.java:103) ~[micronaut-http-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.http.server.RouteExecutor.lambda$executeRoute$14(RouteExecutor.java:659) ~[micronaut-http-server-3.10.1.jar:3.10.1]
airbyte-server | at reactor.core.publisher.FluxDeferContextual.subscribe(FluxDeferContextual.java:49) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at reactor.core.publisher.InternalFluxOperator.subscribe(InternalFluxOperator.java:62) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at reactor.core.publisher.FluxSubscribeOn$SubscribeOnSubscriber.run(FluxSubscribeOn.java:194) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at io.micronaut.reactive.reactor.instrument.ReactorInstrumentation.lambda$init$0(ReactorInstrumentation.java:62) ~[micronaut-runtime-3.10.1.jar:3.10.1]
airbyte-server | at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at io.micronaut.scheduling.instrument.InvocationInstrumenterWrappedCallable.call(InvocationInstrumenterWrappedCallable.java:53) ~[micronaut-context-3.10.1.jar:3.10.1]
airbyte-server | at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
airbyte-server | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
airbyte-server | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
airbyte-server | at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
airbyte-server | 2024-02-02 23:34:26 INFO i.m.h.s.n.h.a.e.AccessLog(log):125 - airbyte-proxy.oss_airbyte_internal - - [02/Feb/2024:23:34:24 +0000] "POST /api/v1/connector_builder_projects/read_stream HTTP/1.0" 500 768
```
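The first log line shows that `ProcessOutputParser` only knows that `test_read` produced no records and no trace, so the UI can show nothing more specific. Below is a hypothetical Python sketch of a more informative check. It assumes the CDK writes one JSON Airbyte protocol message per line to stdout, each with a top-level `type` such as `RECORD` or `TRACE`, and that the caller also has the process exit code. `explain_empty_output` and `count_message_types` are illustrative names, not the connector builder server's actual code. The sketch singles out SIGKILL because that is the signal the Linux OOM killer sends, although a SIGKILL can have other causes.

```python
import json
from typing import Dict, Iterable, Optional

SIGKILL_NUMBER = 9  # SIGKILL on Linux
# Python's subprocess reports a signal kill as a negative return code (-9);
# shells and Docker report it as 128 + 9 = 137.
SIGKILL_STATUSES = {-SIGKILL_NUMBER, 128 + SIGKILL_NUMBER}


def count_message_types(stdout_lines: Iterable[str]) -> Dict[Optional[str], int]:
    """Count Airbyte messages by their top-level `type` field."""
    counts: Dict[Optional[str], int] = {}
    for line in stdout_lines:
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON output such as stray prints
        if isinstance(message, dict):
            msg_type = message.get("type")
            counts[msg_type] = counts.get(msg_type, 0) + 1
    return counts


def explain_empty_output(stdout_lines: Iterable[str], exit_code: int) -> Optional[str]:
    """Return an error message if no RECORD or TRACE was emitted, otherwise None."""
    counts = count_message_types(stdout_lines)
    if counts.get("RECORD") or counts.get("TRACE"):
        return None
    if exit_code in SIGKILL_STATUSES:
        return (
            f"The read process was killed (exit code {exit_code}) before emitting any "
            "records or trace messages; it may have run out of memory, for example "
            "while handling a very large API response."
        )
    return (
        f"The read process exited with code {exit_code} without emitting "
        "any records or trace messages."
    )
```

For example, `explain_empty_output([], 137)` returns the "killed" message, while output containing a single `RECORD` message returns `None`.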

To reproduce this behavior, you can use my mock server implementation, which I have pushed here: https://github.com/lmossman/mockserver (see the README for instructions on running it).

I found that when running Airbyte on docker-compose, the above error happened whenever the response surpassed 25 MB, for example with `data_8300_records.json` and `data_300_friends.json`. The former contains 8300 small records, while the latter contains 100 records that each have 300 "friends" (a sub-array of records), so each record is wider.
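To illustrate why record count alone says little about payload size, this Python sketch builds two hypothetical payload shapes, many flat records versus fewer records that each carry a nested "friends" sub-array, and measures their JSON-encoded size. The field names and values are made up; they are not the contents of the mock server's data files, and `narrow_records`, `wide_records`, and `serialized_mb` are illustrative helpers.

```python
import json
from typing import Dict, List


def narrow_records(count: int) -> List[Dict]:
    """Many small, flat records (hypothetical fields)."""
    return [{"id": i, "name": f"user_{i}", "active": i % 2 == 0} for i in range(count)]


def wide_records(count: int, friends_per_record: int) -> List[Dict]:
    """Fewer records, each carrying a sub-array of nested 'friends' records."""
    return [
        {
            "id": i,
            "name": f"user_{i}",
            "friends": [{"id": j, "name": f"friend_{j}"} for j in range(friends_per_record)],
        }
        for i in range(count)
    ]


def serialized_mb(records: List[Dict]) -> float:
    """Size of the JSON-encoded payload in megabytes (10**6 bytes)."""
    return len(json.dumps(records).encode("utf-8")) / 1_000_000


print(f"narrow: {serialized_mb(narrow_records(8300)):.2f} MB")
print(f"wide:   {serialized_mb(wide_records(100, 300)):.2f} MB")
```

The absolute numbers depend entirely on the made-up fields. The point is that payload size is driven by bytes per record as much as by record count, so any limit is better expressed in bytes than in records.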

In both cases, the error happened once the total response size reached about 25 MB. My guess is that the CDK process runs out of memory and terminates early, so it never sends the trace message back up to the server.
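If the cause is memory exhaustion, one way to fail with a clear message, sketched here in Python under stated assumptions, is to count bytes as the response streams in and stop as soon as a limit is crossed, so the oversized body is never fully buffered. The limit is an assumption based on the ~25 MB threshold observed above (taken here as 25 MiB), and `MAX_RESPONSE_BYTES`, `ResponseTooLargeError`, and `read_with_limit` are hypothetical names, not part of the Airbyte CDK.

```python
from typing import Iterable

# Assumed limit, based on the ~25 MB threshold observed in testing.
MAX_RESPONSE_BYTES = 25 * 1024 * 1024


class ResponseTooLargeError(Exception):
    """Raised when a response body grows past the configured limit."""


def read_with_limit(chunks: Iterable[bytes], limit: int = MAX_RESPONSE_BYTES) -> bytes:
    """Accumulate response chunks, failing fast once the running total exceeds `limit`."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) > limit:
            raise ResponseTooLargeError(
                f"Response exceeded the {limit / (1024 * 1024):.1f} MB limit "
                f"after {len(buffer)} bytes; stopping the read early."
            )
    return bytes(buffer)
```

With the `requests` library, a request made with `stream=True` yields such chunks through `response.iter_content(chunk_size=65536)`. Checking a running total rather than calling `len()` on a fully read body is the design choice that matters here: it raises a readable error before the process holds the whole oversized response in memory.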

Acceptance Criteria

We should handle this type of issue better, doing one of the following:

lmossman commented 7 months ago

This is likely caused by this same underlying issue: https://github.com/airbytehq/airbyte/issues/29228

Going to close this out in favor of that issue.