Bertverbeek4PS / bc2adls

Exporting data from Dynamics 365 Business Central to Azure Data Lake Storage or a Microsoft Fabric lakehouse
MIT License

PySpark script for amending table after full load #186

Open ZYasinn opened 3 weeks ago

ZYasinn commented 3 weeks ago

Hi Bert,

Hope you're well, and thanks for creating this solution; it has helped me and my team immensely.

We used the Fabric Lakehouse version of your solution to pull data/tables from BC. I used the 'CopyBusinessCentral.ipynb' script to do a full load of our tables, which was a success.

Issue: We are now trying to modify the script mentioned above so that it updates our Lakehouse tables with the delta .csv files exported from BC. I tried replacing the write mode 'Overwrite' with 'Append' (screenshot attached: https://github.com/user-attachments/assets/a580b1d4-c92d-46ad-9de2-121517a49eb3)

BUT it didn't work: it adds deleted rows (NULL values) and extra updated rows (duplicates) to our Fabric table. After a good few days spent with no success, would you be able to help us out with the following or point us in the right direction?

  1. If any existing row in a BC table is updated, our Lakehouse table needs to reflect that change (the data comes in through the delta .csv files).
  2. Deleted rows have to be removed from the Lakehouse table.

N.B. we are pulling data from multiple companies into one table in the Fabric Lakehouse, which has worked perfectly for full loads so far.

Thank you in advance.

Kind regards, Zeb

Bertverbeek4PS commented 3 weeks ago

@ZYasinn thanks for your kind words!

In this case you cannot use the append function; you can only use the overwrite function. Here is why.

When there is data in the table, we put that data into a dataframe. Then we add everything from the delta .csv files. Once all of this is in the dataframe, we first remove the deleted records. Then we sort on SystemModifiedAt and SystemId and remove all the duplicates. At that point the dataframe is a representation of the BC table, and we overwrite the existing table with that dataframe.
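
For anyone reading along, here is a minimal PySpark sketch of that flow. It is an illustration, not the notebook's actual code: the table name, the delta file path, and the `is_deleted` marker column are all assumptions (the real notebook identifies deleted records in its own way), and Fabric notebooks already provide a `spark` session.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()  # already provided in Fabric notebooks

table_name = "customer"                     # hypothetical Lakehouse table name
delta_path = "Files/deltas/customer/*.csv"  # hypothetical delta .csv location

# 1. Put the data that is already in the table into a dataframe.
existing_df = spark.read.table(table_name)

# 2. Add everything from the delta .csv files, cast to the table's schema
#    (CSV columns come in as strings; assumes the deltas carry the same columns).
delta_df = spark.read.option("header", True).csv(delta_path)
delta_df = delta_df.select(
    [F.col(f.name).cast(f.dataType) for f in existing_df.schema.fields]
)
combined = existing_df.unionByName(delta_df)

# 3. First remove the deleted records. "is_deleted" is a stand-in marker;
#    the real notebook detects deletions differently.
deleted_ids = combined.filter(F.col("is_deleted")).select("SystemId")
combined = combined.join(deleted_ids, "SystemId", "left_anti")

# 4. Sort on SystemModifiedAt per SystemId and keep only the newest version,
#    i.e. remove all the duplicates.
w = Window.partitionBy("SystemId").orderBy(F.col("SystemModifiedAt").desc())
latest = (combined.withColumn("_rn", F.row_number().over(w))
                  .filter(F.col("_rn") == 1)
                  .drop("_rn"))

# 5. The dataframe now represents the BC table; overwrite the existing table.
#    (Delta tables can be overwritten from their own snapshot read.)
latest.write.mode("overwrite").saveAsTable(table_name)
```

Since SystemId is a GUID it should identify a record uniquely even with multiple companies in one table; if that does not hold in your setup, include the company column in the window key as well.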

That is why we do an overwrite and not an append: we also need the old records. So I'm not sure why you would want to use an append in this case.

You can use it, but then after your append you need the same logic as described above, as sketched below.
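
In other words, an append-first variant still has to run the same cleanup over the whole table and finish with an overwrite, so it gains nothing. Roughly, reusing the names from the sketch above:

```python
# Hypothetical append-first variant: write the raw delta rows...
delta_df.write.mode("append").saveAsTable(table_name)

# ...but the table now holds stale and deleted rows, so the full
# delete-removal and deduplication pass from the sketch above still
# has to run over the table, and it still ends in an overwrite:
full_df = spark.read.table(table_name)
# ...apply steps 3-4 to full_df, then:
# cleaned.write.mode("overwrite").saveAsTable(table_name)
```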