apache / arrow-adbc

Database connectivity API standard and libraries for Apache Arrow
https://arrow.apache.org/adbc/
Apache License 2.0

csharp: integration tests (ClientTests/DriverTests) can cause concurrency issues creating/updating table. #2280

Open birschick-bq opened 3 weeks ago

birschick-bq commented 3 weeks ago

What happened?

When running the integration tests, ClientTests.CanClientExecuteUpdate and DriverTests.CanExecuteUpdate can run concurrently and try to create and update the same test table. This can lead to flaky test failures and, in the worst case, leave the Databricks server in an inconsistent state (resource leaks).
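
One possible mitigation, sketched here under assumptions rather than taken from the test suite, is to give each test its own table so that concurrent runs never write to the same one. The helper name `TestTableNames.CreateUnique` and the base names used below are hypothetical; nothing here is an existing ADBC API.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the ADBC test suite.
public static class TestTableNames
{
    // Returns baseName followed by an underscore and a 12-character random
    // hexadecimal suffix, so that tests running concurrently each target a
    // different table.
    public static string CreateUnique(string baseName)
    {
        if (string.IsNullOrWhiteSpace(baseName))
        {
            throw new ArgumentException("Base table name is required.", nameof(baseName));
        }

        // "N" formats the GUID as 32 hexadecimal digits without hyphens.
        string suffix = Guid.NewGuid().ToString("N").Substring(0, 12);
        return $"{baseName}_{suffix}";
    }
}
```

A test could call `TestTableNames.CreateUnique("adbc_client_test")` before building its CREATE/INSERT/UPDATE queries and drop the table afterwards. This trades a little cleanup work for isolation; serializing the two test classes would be an alternative.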

Stack Trace

Message: 
  System.AggregateException : One or more errors occurred. (Error running query: io.delta.exceptions.ConcurrentAppendException: Files were added to the root of the table by a concurrent update. Please try the operation again.
  Conflicting commit: {"timestamp":1730135866536,"operation":"WRITE","operationParameters":{"mode":Append,"statsOnLoad":false,"partitionBy":[]},"readVersion":34,"isolationLevel":"WriteSerializable","isBlindAppend":true,"operationMetrics":{"numFiles":"1","numOutputRows":"1","numOutputBytes":"6118"},"tags":{"restoresDeletedRows":"false"},"engineInfo":"Databricks-Runtime/13.3.x-scala2.12","txnId":"70495e06-e88b-446b-8dd3-9e216cdbf0c6"}
  Refer to https://docs.microsoft.com/azure/databricks/delta/concurrency-control for more details.)
  ---- Apache.Arrow.Adbc.Drivers.Apache.Hive2.HiveServer2Exception : Error running query: io.delta.exceptions.ConcurrentAppendException: Files were added to the root of the table by a concurrent update. Please try the operation again.
  Conflicting commit: {"timestamp":1730135866536,"operation":"WRITE","operationParameters":{"mode":Append,"statsOnLoad":false,"partitionBy":[]},"readVersion":34,"isolationLevel":"WriteSerializable","isBlindAppend":true,"operationMetrics":{"numFiles":"1","numOutputRows":"1","numOutputBytes":"6118"},"tags":{"restoresDeletedRows":"false"},"engineInfo":"Databricks-Runtime/13.3.x-scala2.12","txnId":"70495e06-e88b-446b-8dd3-9e216cdbf0c6"}
  Refer to https://docs.microsoft.com/azure/databricks/delta/concurrency-control for more details.

Stack Trace: 
  Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
  Task`1.get_Result()
  HiveServer2Statement.ExecuteUpdate() line 39
  AdbcCommand.ExecuteNonQuery() line 149
  ClientTests.CanClientExecuteUpdate(AdbcConnection adbcConnection, TestConfiguration testConfiguration, String[] queries, List`1 expectedResults) line 64
  ClientTests.CanClientExecuteUpdate() line 73
  RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
  MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)
  ----- Inner Stack Trace -----
  HiveServer2Statement.ExecuteStatementAsync() line 119
  HiveServer2Statement.ExecuteQueryAsync() line 43
  HiveServer2Statement.ExecuteUpdateAsync() line 61

How can we reproduce the bug?

Using Visual Studio, run the tests at the Apache.Arrow.Adbc.Tests.Drivers.Apache.Spark level.

Environment/Setup

No response