PicnicSupermarket / diepvries

The Picnic Data Vault framework.
https://diepvries.picnic.tech
MIT License
126 stars 15 forks source link

Add DISTINCT to filtered_staging in satellite MERGE statement #18

Closed dlouseiro closed 3 years ago

dlouseiro commented 3 years ago

Context

While testing the execution of our Data Vault jobs, I noticed an improvement that our satellite MERGE queries are considering all records coming from the staging table without applying any DISTINCT. This works in most cases, but can generate unwanted errors.

Example: The extraction table is populated by an endpoint that returns us orderlines (order/article level). Although, one of the fields (let's say order_status) is actual a property of the order, hence will be loaded in hs_order. If we receive 3 orderlines for the same order, the status of all of them is OPEN. If we do not have a DISTINCT, the process will fail with duplicates, but if we have the DISTINCT the process will work.

Implementation

Added DISTINCT to both effectivity_satellite and satellite DMLs.

dlouseiro commented 3 years ago

I will actually improve the query even further

dlouseiro commented 3 years ago

Can you also add a line in the changelog?

Sure, I forgot