While testing the execution of our Data Vault jobs, I noticed an improvement that our satellite MERGE queries are considering all records coming from the staging table without applying any DISTINCT. This works in most cases, but can generate unwanted errors.
Example:
The extraction table is populated by an endpoint that returns us orderlines (order/article level). Although, one of the fields (let's say order_status) is actual a property of the order, hence will be loaded in hs_order. If we receive 3 orderlines for the same order, the status of all of them is OPEN. If we do not have a DISTINCT, the process will fail with duplicates, but if we have the DISTINCT the process will work.
Implementation
Added DISTINCT to both effectivity_satellite and satellite DMLs.
Context
While testing the execution of our Data Vault jobs, I noticed an improvement that our satellite
MERGE
queries are considering all records coming from the staging table without applying anyDISTINCT
. This works in most cases, but can generate unwanted errors.Example: The extraction table is populated by an endpoint that returns us orderlines (order/article level). Although, one of the fields (let's say
order_status
) is actual a property of theorder
, hence will be loaded inhs_order
. If we receive 3 orderlines for the same order, the status of all of them isOPEN
. If we do not have aDISTINCT
, the process will fail with duplicates, but if we have theDISTINCT
the process will work.Implementation
Added
DISTINCT
to botheffectivity_satellite
andsatellite
DMLs.