Azure-Samples / streaming-at-scale

How to implement a streaming at scale solution in Azure
MIT License

Duplicate data is being generated #57

Closed: yorek closed this issue 5 years ago

yorek commented 5 years ago

Duplicate data is being generated and, as a result, Azure SQL is (correctly :)) preventing it from being inserted:

System.Data.SqlClient.SqlException (0x80131904): Violation of PRIMARY KEY constraint 'PK__#BAEC272__7944C811B84F8B93'. Cannot insert duplicate key in object 'dbo.@payload'. The duplicate key value is (5afa9381-61c1-4d15-ab19-8b616acf65da).
The data for table-valued parameter "@payload" doesn't conform to the table type of the parameter. SQL Server error is: 3602, state: 30
The statement has been terminated.
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
   at System.Data.SqlClient.SqlCommand.CompleteAsyncExecuteReader()
   at System.Data.SqlClient.SqlCommand.EndExecuteNonQueryInternal(IAsyncResult asyncResult)
   at System.Data.SqlClient.SqlCommand.EndExecuteNonQuery(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
ClientConnectionId:43576d0a-f89c-43b9-b61f-4e82783d4c38
Error Number:2627,State:2,Class:14
ClientConnectionId before routing:8ed108b2-dcc0-4a47-bba7-ea13633358d6
Routing Destination:d7f1e4838529.tr2237.eastus1-a.worker.database.windows.net,11044

algattik commented 5 years ago

By default, the simulator duplicates every 1000th event on average, so that deduplication strategies in the pipeline can be tested. Created #58 to improve documentation on this.

You can use the SIMULATOR_DUPLICATE_EVERY_N_EVENTS setting to disable this in the pipeline; see eventhubs-streamanalytics-azuresql.
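
For readers who want to see what "duplicates every Nth event on average" can look like, here is a minimal Python sketch. It is not the simulator's actual code: the function names `simulate_events` and `deduplicate`, the `eventId` field, and the convention that a value of 0 or less turns duplication off are all assumptions made for illustration. It re-emits each event with probability 1/N and then drops repeats by key, which is one possible deduplication strategy.

```python
import random
import uuid


def simulate_events(count, duplicate_every_n, rng):
    """Yield `count` events; each one is emitted a second time with
    probability 1/duplicate_every_n (hypothetical stand-in for the simulator).

    Assumption of this sketch only: duplicate_every_n <= 0 disables duplication.
    """
    for _ in range(count):
        event = {
            # Build the id from the seeded RNG so runs are reproducible.
            "eventId": str(uuid.UUID(int=rng.getrandbits(128))),
            "value": rng.random(),
        }
        yield event
        if duplicate_every_n > 0 and rng.random() < 1.0 / duplicate_every_n:
            # Emit an identical copy, i.e. the same primary-key value twice.
            yield dict(event)


def deduplicate(events, key="eventId"):
    """Pass through only the first event seen for each key value."""
    seen = set()
    for event in events:
        if event[key] in seen:
            continue
        seen.add(event[key])
        yield event


if __name__ == "__main__":
    rng = random.Random(42)
    raw = list(simulate_events(10_000, 1000, rng))
    unique = list(deduplicate(raw))
    print(f"raw events: {len(raw)}, after deduplication: {len(unique)}")
```

The in-memory `seen` set grows without bound, so this only suits a short test run; a real pipeline would more likely deduplicate within a time window or in the sink itself.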

yorek commented 5 years ago

Got it, thanks!