When creating a connector to ingest hundreds of tables using dozens of tasks, the current table validation behavior can effectively DoS the Kusto cluster with tens of thousands of validation queries. An early example we encountered had ~400 tables and 50 tasks, which resulted in ~20,000 validation queries (slightly more due to retries). Many of these validations would ultimately fail with errors, which prevented tasks from starting at all.
The most common exceptions we encountered due to this issue were:
org.apache.kafka.connect.errors.ConnectException: Unable to connect to ADX(Kusto) instance
at com.microsoft.azure.kusto.kafka.connect.sink.KustoSinkTask.validateTableAccess(KustoSinkTask.java:379)
at com.microsoft.azure.kusto.kafka.connect.sink.KustoSinkTask.validateTableMappings(KustoSinkTask.java:222)
at com.microsoft.azure.kusto.kafka.connect.sink.KustoSinkTask.start(KustoSinkTask.java:428)
at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:305)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:196)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:189)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:239)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.microsoft.azure.kusto.data.exceptions.DataClientException: Error in post request:Connect to ghdwkustostaging.eastus2.kusto.windows.net:443 [ghdwkustostaging.eastus2.kusto.windows.net/52.177.235.178] failed: Connection timed out (Connection timed out)
at com.microsoft.azure.kusto.data.Utils.post(Utils.java:69)
at com.microsoft.azure.kusto.data.ClientImpl.executeToJsonResult(ClientImpl.java:118)
at com.microsoft.azure.kusto.data.ClientImpl.execute(ClientImpl.java:77)
at com.microsoft.azure.kusto.data.ClientImpl.execute(ClientImpl.java:72)
at com.microsoft.azure.kusto.kafka.connect.sink.KustoSinkTask.validateTableAccess(KustoSinkTask.java:323)
... 11 more
Caused by: org.apache.http.conn.HttpHostConnectException: Connect to ghdwkustostaging.eastus2.kusto.windows.net:443 [ghdwkustostaging.eastus2.kusto.windows.net/52.177.235.178] failed: Connection timed out (Connection timed out)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:156)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at com.microsoft.azure.kusto.data.Utils.post(Utils.java:56)
... 15 more
Caused by: java.net.ConnectException: Connection timed out (Connection timed out)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:368)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
... 26 more
This PR makes the table validation logic optional. It changes the current behavior, which validates by default, to opt-in. This may be viewed as a breaking change or regression, and a "disable table validation" option may be more desirable to the maintainers. Personally, I feel this validation should be opt-in rather than opt-out, but I'm open to refactoring this solution if there is disagreement.
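To illustrate the intended opt-in semantics, here is a minimal, self-contained sketch. It is not the connector's code: the class name, the config key `example.validate.tables`, and both helper methods are hypothetical, and the query count assumes every task validates every table, which is consistent with the ~400 tables and 50 tasks figure above.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: the names below are hypothetical, not the connector's actual API.
class TableValidationToggle {
    // Hypothetical config key; the real option name is whatever this PR defines.
    static final String VALIDATE_TABLES_KEY = "example.validate.tables";

    // Opt-in: validation runs only when the key is explicitly set to "true".
    static boolean shouldValidateTables(Map<String, String> props) {
        return Boolean.parseBoolean(props.getOrDefault(VALIDATE_TABLES_KEY, "false"));
    }

    // Rough count of startup validation queries (ignoring retries),
    // assuming every task validates every table.
    static long validationQueries(int tables, int tasks, Map<String, String> props) {
        return shouldValidateTables(props) ? (long) tables * tasks : 0L;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        // Option absent: validation is skipped, so no queries are issued.
        System.out.println(validationQueries(400, 50, props));
        // Opted in: 400 tables * 50 tasks.
        props.put(VALIDATE_TABLES_KEY, "true");
        System.out.println(validationQueries(400, 50, props));
    }
}
```

With the option absent, the sketch issues no validation queries. Once a user opts in, the count returns to one query per table per task.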
Future Release Comment
Added an option to control whether the connector validates access to target tables during initialization. This validation can have a negative impact on the target cluster when a large number of tables and/or tasks are configured.
Breaking Changes:
Features:
Fixes: