Azure / Azure-DataFactory

Other
482 stars 586 forks source link

Copy data activity loses factorial seconds #72

Open alexwilsh opened 5 years ago

alexwilsh commented 5 years ago

We have simple pipeline for incremental copying of data from one db to another one. We use the approach you have described in documentation: https://docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-multiple-tables-portal

Basically we have watermark table, we pull old watermarks and new ones to find the delta. Then we call Copy Data which makes the call of stored procedure in Azure SQL Server with watermarks as parameters. Then we update watermark table by using StoredProcedure activity.

Our watermarks have DateTimeOffset type. Copy Data activity loses factorial seconds meaning: Value like 2018-12-17 14:02:47.1696724 +00:00 becomes like this 2018-12-17 14:02:47.0000000 +00:00

This causes the issue when we miss some data if it was added same second but a bit later. At the same time StoredProcedure activity works with DateTimeOffset properly without any loss.

It doesn't sound like a huge issue, but it was tricky to find this misbehavior. Actually, when our users reported this issue it sounded like a fiction :)

We have two workarounds to avoid this issue:

  1. increment the value of watermark - basically round up the value (ceiling)
  2. instead of taking last watermark value from target database table, we use current datetimeoffset value as a watermark

Bu I believe this either should be fixed in CopyData activity or at least in mentioned in documentation.

fhljys commented 3 years ago

Are you still experiencing this?

fhljys commented 3 years ago

Close due to inactivity

lpmac commented 3 years ago

I'm experiencing this very issue with my high water mark field but it's also impacting any other datetimeoffset fields in my copy activity and since I'm dynamically mapping my fields because I'm copying a large of number of tables from Odata to Azure SQL db, I can't explicitly format my output of every single field this is affecting. This is really poor in terms of accuracy of copy activity.

fhljys commented 3 years ago

@lpmac , could you provide your copy activity run ID?

lpmac commented 3 years ago

@fhljys f33c5464-84b2-45b6-a51a-ceb93a11b1b3 This is the pipeline run ID which does the copy activity

fhljys commented 3 years ago

@lpmac , could you help identify which stage caused this issue? For example, you can check lookup output and set variable input/output? Or if possible, you could create a support ticket to Microsoft.

lpmac commented 3 years ago

@fhljys you can't use a lookup output of the copy activity as it does not give you the detail required to preview the level of precision the source data is pulling through for the datetimeoffset fields.

To clarify, I have a copy data activity with an Odata source and Azure SQL db sink and regardless of whether the data is explicitly mapped or dynamically mapped - the fields with datatype datetimeoffset are not being copied into the sink SQL db with the same level of precision as they are found when directly querying Odata from Postman. Odata via Postman field: 2021-07-07 13:55:58.9100000 +01:00

Output of Copy activity field: 2021-07-07 13:55:58.0000000 +01:00

I have raised it here; https://feedback.azure.com/forums/270578-data-factory/suggestions/43971279-copy-activity-to-preserve-precision-of-datetimeoff