Currently, the connector builder server fails in a very unclear way when it receives an overly large response from the API.
This is what it looks like when it fails this way:
In my testing, this happens once the API response reaches roughly 25 MB. Below that size, the response is returned correctly; above it, the following errors appear in the deployment logs and cause the error shown above:
```
airbyte-connector-builder-server | 2024-02-02 23:34:26 ERROR i.a.c.c.ProcessOutputParser(generateError):93 - The CDK command `test_read` completed properly but no records nor trace were found. Logs were: 0.
airbyte-connector-builder-server | 2024-02-02 23:34:26 INFO i.m.h.s.n.h.a.e.AccessLog(log):125 - airbyte-server.oss_airbyte_internal - - [02/Feb/2024:23:34:25 +0000] "POST /v1/stream/read HTTP/1.1" 500 970
airbyte-server | 2024-02-02 23:34:26 ERROR i.a.s.a.ApiHelper(execute):49 - Unexpected Exception
airbyte-server | org.openapitools.client.infrastructure.ServerException: Server error : 500 Internal Server Error
airbyte-server | at io.airbyte.connectorbuilderserver.api.client.generated.ConnectorBuilderServerApi.readStream(ConnectorBuilderServerApi.kt:80) ~[io.airbyte-airbyte-api-dev.jar:?]
airbyte-server | at io.airbyte.commons.server.handlers.ConnectorBuilderProjectsHandler.readConnectorBuilderProjectStream(ConnectorBuilderProjectsHandler.java:340) ~[io.airbyte-airbyte-commons-server-dev.jar:?]
airbyte-server | at io.airbyte.server.apis.ConnectorBuilderProjectApiController.lambda$readConnectorBuilderProjectStream$5(ConnectorBuilderProjectApiController.java:110) ~[io.airbyte-airbyte-server-dev.jar:?]
airbyte-server | at io.airbyte.server.apis.ApiHelper.execute(ApiHelper.java:28) ~[io.airbyte-airbyte-server-dev.jar:?]
airbyte-server | at io.airbyte.server.apis.ConnectorBuilderProjectApiController.readConnectorBuilderProjectStream(ConnectorBuilderProjectApiController.java:110) ~[io.airbyte-airbyte-server-dev.jar:?]
airbyte-server | at io.airbyte.server.apis.$ConnectorBuilderProjectApiController$Definition$Exec.dispatch(Unknown Source) ~[io.airbyte-airbyte-server-dev.jar:?]
airbyte-server | at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:371) ~[micronaut-inject-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.context.DefaultBeanContext$4.invoke(DefaultBeanContext.java:594) ~[micronaut-inject-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.web.router.AbstractRouteMatch.execute(AbstractRouteMatch.java:303) ~[micronaut-router-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.web.router.RouteMatch.execute(RouteMatch.java:111) ~[micronaut-router-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.http.context.ServerRequestContext.with(ServerRequestContext.java:103) ~[micronaut-http-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.http.server.RouteExecutor.lambda$executeRoute$14(RouteExecutor.java:659) ~[micronaut-http-server-3.10.1.jar:3.10.1]
airbyte-server | at reactor.core.publisher.FluxDeferContextual.subscribe(FluxDeferContextual.java:49) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at reactor.core.publisher.InternalFluxOperator.subscribe(InternalFluxOperator.java:62) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at reactor.core.publisher.FluxSubscribeOn$SubscribeOnSubscriber.run(FluxSubscribeOn.java:194) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at io.micronaut.reactive.reactor.instrument.ReactorInstrumentation.lambda$init$0(ReactorInstrumentation.java:62) ~[micronaut-runtime-3.10.1.jar:3.10.1]
airbyte-server | at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at io.micronaut.scheduling.instrument.InvocationInstrumenterWrappedCallable.call(InvocationInstrumenterWrappedCallable.java:53) ~[micronaut-context-3.10.1.jar:3.10.1]
airbyte-server | at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
airbyte-server | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
airbyte-server | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
airbyte-server | at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
airbyte-server | 2024-02-02 23:34:26 ERROR i.a.s.e.UncaughtExceptionHandler(handle):31 - Uncaught exception
airbyte-server | org.openapitools.client.infrastructure.ServerException: Server error : 500 Internal Server Error
airbyte-proxy | 172.24.0.1 - - [02/Feb/2024:23:34:26 +0000] "POST /api/v1/connector_builder_projects/read_stream HTTP/1.1" 500 768 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
airbyte-server | at io.airbyte.connectorbuilderserver.api.client.generated.ConnectorBuilderServerApi.readStream(ConnectorBuilderServerApi.kt:80) ~[io.airbyte-airbyte-api-dev.jar:?]
airbyte-server | at io.airbyte.commons.server.handlers.ConnectorBuilderProjectsHandler.readConnectorBuilderProjectStream(ConnectorBuilderProjectsHandler.java:340) ~[io.airbyte-airbyte-commons-server-dev.jar:?]
airbyte-server | at io.airbyte.server.apis.ConnectorBuilderProjectApiController.lambda$readConnectorBuilderProjectStream$5(ConnectorBuilderProjectApiController.java:110) ~[io.airbyte-airbyte-server-dev.jar:?]
airbyte-server | at io.airbyte.server.apis.ApiHelper.execute(ApiHelper.java:28) ~[io.airbyte-airbyte-server-dev.jar:?]
airbyte-server | at io.airbyte.server.apis.ConnectorBuilderProjectApiController.readConnectorBuilderProjectStream(ConnectorBuilderProjectApiController.java:110) ~[io.airbyte-airbyte-server-dev.jar:?]
airbyte-server | at io.airbyte.server.apis.$ConnectorBuilderProjectApiController$Definition$Exec.dispatch(Unknown Source) ~[io.airbyte-airbyte-server-dev.jar:?]
airbyte-server | at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:371) ~[micronaut-inject-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.context.DefaultBeanContext$4.invoke(DefaultBeanContext.java:594) ~[micronaut-inject-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.web.router.AbstractRouteMatch.execute(AbstractRouteMatch.java:303) ~[micronaut-router-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.web.router.RouteMatch.execute(RouteMatch.java:111) ~[micronaut-router-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.http.context.ServerRequestContext.with(ServerRequestContext.java:103) ~[micronaut-http-3.10.1.jar:3.10.1]
airbyte-server | at io.micronaut.http.server.RouteExecutor.lambda$executeRoute$14(RouteExecutor.java:659) ~[micronaut-http-server-3.10.1.jar:3.10.1]
airbyte-server | at reactor.core.publisher.FluxDeferContextual.subscribe(FluxDeferContextual.java:49) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at reactor.core.publisher.InternalFluxOperator.subscribe(InternalFluxOperator.java:62) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at reactor.core.publisher.FluxSubscribeOn$SubscribeOnSubscriber.run(FluxSubscribeOn.java:194) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at io.micronaut.reactive.reactor.instrument.ReactorInstrumentation.lambda$init$0(ReactorInstrumentation.java:62) ~[micronaut-runtime-3.10.1.jar:3.10.1]
airbyte-server | at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37) ~[reactor-core-3.5.5.jar:3.5.5]
airbyte-server | at io.micronaut.scheduling.instrument.InvocationInstrumenterWrappedCallable.call(InvocationInstrumenterWrappedCallable.java:53) ~[micronaut-context-3.10.1.jar:3.10.1]
airbyte-server | at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
airbyte-server | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
airbyte-server | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
airbyte-server | at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
airbyte-server | 2024-02-02 23:34:26 INFO i.m.h.s.n.h.a.e.AccessLog(log):125 - airbyte-proxy.oss_airbyte_internal - - [02/Feb/2024:23:34:24 +0000] "POST /api/v1/connector_builder_projects/read_stream HTTP/1.0" 500 768
```
To reproduce this behavior, you can use my mock server implementation, which I have pushed here: https://github.com/lmossman/mockserver (see the README for instructions on running it).
When running Airbyte on docker-compose, I found that the error above occurred whenever the response exceeded about 25 MB, for example with `data_8300_records.json` and `data_300_friends.json`. The former contains 8,300 small records, while the latter contains 100 records that each have 300 "friends" (a sub-array of records), so each record is wider.
In both cases, the error started once the total data size reached about 25 MB, which suggests the limit depends on the total response size rather than on the number or shape of the records. My guess is that the CDK process is running out of memory and terminating early, so it never sends the trace message back to the server.
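The log above reports that `test_read` "completed properly" even though no records or trace were found. If the out-of-memory guess is right, one way to make this clearer is for the parent process to check how the child exited before concluding it simply produced nothing. The sketch below is hypothetical and only illustrates that idea in Python; it is not how the connector builder server is implemented, and `run_and_check` and `ChildProcessFailure` are made-up names. It relies on Python's `subprocess` reporting a child killed by signal N as return code -N, and on the Linux OOM killer terminating processes with `SIGKILL` (a shell would report the same event as exit code 137, i.e. 128 + 9).

```python
import signal
import subprocess


class ChildProcessFailure(RuntimeError):
    """Hypothetical error type for a child process that exits without usable output."""


def run_and_check(cmd: list[str]) -> str:
    """Run `cmd` and return its stdout, or raise an error describing how it failed."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    # On POSIX, subprocess reports "killed by signal N" as returncode == -N.
    if result.returncode < 0:
        sig_name = signal.Signals(-result.returncode).name
        # The Linux OOM killer (including Docker memory limits) terminates with SIGKILL.
        hint = " (possibly out of memory)" if sig_name == "SIGKILL" else ""
        raise ChildProcessFailure(f"process was killed by {sig_name}{hint}")
    if result.returncode != 0:
        raise ChildProcessFailure(f"process exited with code {result.returncode}")
    # Only after ruling out a crash does empty output mean a clean but empty run.
    if not result.stdout.strip():
        raise ChildProcessFailure("process exited normally but produced no output")
    return result.stdout
```

Distinguishing "killed" from "exited cleanly with no output" is what would let the server replace the current generic 500 with a message that points at memory as the likely cause.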
Acceptance Criteria
We should handle this failure more gracefully by doing one of the following:
- Detect when a response exceeds the size we can handle and, if so, raise a clear error message to the user that recommends splitting the data into smaller chunks through pagination or partitioning.
- Adjust the server / CDK process so it can handle larger responses (e.g. by giving it more memory or refactoring it to be more performant).
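The first option could look roughly like the following minimal sketch, assuming the check runs wherever the raw response body is available. `check_response_size`, `ResponseTooLargeError`, and the 20 MiB value of `MAX_RESPONSE_BYTES` are all hypothetical; the limit is simply placed below the ~25 MB at which failures were observed, and the real threshold would need to be measured.

```python
import json

# Hypothetical limit, kept below the ~25 MB at which test reads were observed to fail.
MAX_RESPONSE_BYTES = 20 * 1024 * 1024


class ResponseTooLargeError(ValueError):
    """Hypothetical user-facing error raised instead of an opaque 500."""


def check_response_size(body: bytes, limit: int = MAX_RESPONSE_BYTES) -> None:
    """Raise ResponseTooLargeError if the raw response body is larger than `limit` bytes."""
    size = len(body)
    if size > limit:
        mib = 1024 * 1024
        raise ResponseTooLargeError(
            f"The API response is {size / mib:.1f} MiB, which exceeds the "
            f"{limit / mib:.1f} MiB limit for test reads. Configure pagination "
            "or partitioning so that each request returns less data."
        )


if __name__ == "__main__":
    # A small JSON payload passes the check silently.
    small_body = json.dumps([{"id": i} for i in range(10)]).encode("utf-8")
    check_response_size(small_body)
```

Checking the `Content-Length` header first would avoid reading an oversized body at all, but that header is not always present (for example with chunked transfer encoding), so a check on the received bytes is the fallback.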