GT-Analytics / fuam-basic

FUAM is a solution that enables holistic monitoring on top of Power BI and Fabric.
MIT License

Pipeline "Load_Capacity_Refreshables_E2E" is failing #3

Open FrankPreusker opened 3 days ago

FrankPreusker commented 3 days ago

The pipeline "Load_Capacity_Refreshables_E2E" fails in its last step, which runs the notebook "01_Transfer_Capacity_Refreshables_Unit".

The error occurs in cell 10 (#Main merge). Message:

Py4JJavaError: An error occurred while calling o4808.execute.
: org.apache.spark.sql.delta.DeltaUnsupportedOperationException: Cannot perform Merge as multiple source rows matched and attempted to modify the same target row in the Delta table in possibly conflicting ways. By SQL semantics of Merge, when multiple source rows match on the same target row, the result may be ambiguous as it is unclear which source row should be used to update or delete the matching target row. You can preprocess the source table to eliminate the possibility of multiple matches.

I also manually deleted four capacity_refreshable* tables in the Lakehouse and re-ran the notebook by hand. This time it stopped in cell 19 (#Merge Summary), at the last step of the cell, during .saveAsTable(gold_summary_table_name).

kethom-analytics commented 2 days ago

Hello,

Could you display silver_main_df and check whether it contains duplicate rows for the merge keys?

Thanks in advance.

Best regards,
Kevin
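
For anyone following along, the duplicate check suggested above could be sketched roughly as follows. This is only an illustration under assumptions: the helper names and the key column names are hypothetical and not taken from the FUAM notebook, so the real merge keys from cell 10 have to be substituted. Only silver_main_df comes from the thread. The helpers rely on standard Spark DataFrame methods (groupBy, count, where, dropDuplicates).

```python
def find_duplicate_merge_keys(df, merge_keys):
    """Return one row per merge-key combination that occurs more than once in df.

    `df` is expected to be a Spark DataFrame; `merge_keys` is a list of
    column names (hypothetical here, use the keys from the actual merge).
    """
    return (
        df.groupBy(*merge_keys)    # one group per key combination
          .count()                 # adds a `count` column: rows per group
          .where("`count` > 1")    # keep only keys that appear several times
    )


def drop_duplicate_merge_keys(df, merge_keys):
    """Keep a single row per merge-key combination.

    Which of the duplicate rows survives is arbitrary, so this is only a
    stopgap; it may hide a real data problem in the source.
    """
    return df.dropDuplicates(list(merge_keys))


# Usage inside the notebook (column names below are placeholders):
# merge_keys = ["KeyColumnA", "KeyColumnB"]
# find_duplicate_merge_keys(silver_main_df, merge_keys).show(truncate=False)
```

If the first helper returns any rows, those key combinations are the ones that lead to "multiple source rows matched" in the merge. Deduplicating the source before the merge, as the error message itself suggests, should then avoid the failure, although it is worth checking why the duplicates appear in the first place.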