Bertverbeek4PS / bc2adls

Exporting data from Dynamics 365 Business Central to Azure data lake storage or MS Fabric lakehouse
MIT License

New Feature: Option to automatically clear tracked deleted items #167

Closed bartkraai closed 2 months ago

bartkraai commented 2 months ago

At the moment there is an option on the configuration page to clear the "ADLSE Deleted Record" table, but this has to be done manually or via the Job Queue. Because this is not always set up at our customers, the table grows huge (>200,000,000 records in just a few months).

Solution: In my opinion, it would be better to automatically clear the deleted records table after the export has finished successfully, or at least to offer this as a boolean option on the configuration page.

Thanks in advance for all your time on this project!

Kind Regards, Bart 4PS NL BV

Bertverbeek4PS commented 2 months ago

Hi @bartkraai, the deleted records are also something you want to analyse when something goes wrong, and I'm really not a fan of adding yet another parameter. You also cannot cover this with a retention policy, because you need to filter on the entry no. that was deleted.

So I would suggest that you set up the job queue, just like you need to set up a job queue for exporting the data.

LumosPatronum commented 2 months ago

@Bertverbeek4PS do you have any thoughts on how deleted records can be handled in downstream tables in Fabric? With the ADLSE export to Fabric, whenever records are deleted in BC, the SystemCreatedAt field is blank during the export, and the Fabric notebook uses a left anti join to remove those records when the tables in Fabric are refreshed. But say I have tables downstream of the landing lakehouse that the ADLSE export populates: how can those tables, which depend on the landing table, recognize that records have been deleted, since the rows simply fall off in the export lakehouse?
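For illustration, here is a minimal PySpark sketch of that kind of left anti join; the paths and column names (`staging_SalesHeader`, `SystemCreatedAt`, `systemId`) are hypothetical and not necessarily what the bc2adls notebook actually uses:

```python
# Hedged sketch: drop records that were deleted in BC from a landing table.
# All table paths and column names here are assumptions for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Incoming export batch: rows deleted in BC arrive with a blank SystemCreatedAt.
incoming = spark.read.format("delta").load("Tables/staging_SalesHeader")
deleted = incoming.filter(F.col("SystemCreatedAt").isNull())

# Current landing table in the lakehouse.
existing = spark.read.format("delta").load("Tables/SalesHeader")

# Left anti join keeps only rows whose systemId is NOT in the deleted set.
cleaned = existing.join(deleted.select("systemId"), on="systemId", how="left_anti")

cleaned.write.format("delta").mode("overwrite").save("Tables/SalesHeader")
```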

Bertverbeek4PS commented 2 months ago

Hi @LumosPatronum, indeed, in the raw dataset we delete those records. But in the later step, where you turn the raw data into structured data, you can mostly choose between append and renew. In general you have to renew because of the deleted records, but for some tables, for example G/L entries, you can easily append (so it depends on the table). But what is your process when you move data from a Lakehouse delta table to something else? Do you do it with Python code, Power Query, or something else?
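To make the append-vs-renew distinction concrete, a hedged PySpark sketch (the table names and the watermark column are hypothetical, not the project's actual code):

```python
# Hedged sketch of the two refresh strategies discussed above.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

raw_gl = spark.read.format("delta").load("Tables/GLEntry")
raw_sales = spark.read.format("delta").load("Tables/SalesHeader")

# Append suits insert-only tables such as G/L entries: push only the rows
# newer than what the downstream table already holds.
last_entry_no = 500000  # hypothetical watermark read from the downstream table
raw_gl.filter(F.col("EntryNo") > last_entry_no) \
    .write.format("delta").mode("append").saveAsTable("gold_GLEntry")

# Tables where BC records can be deleted must be renewed (full overwrite);
# otherwise downstream copies keep rows that no longer exist upstream.
raw_sales.write.format("delta").mode("overwrite").saveAsTable("gold_SalesHeader")
```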

LumosPatronum commented 2 months ago

Hi @Bertverbeek4PS, I use PySpark code, and to your point, I see what you are saying. Deletion handling was really a requirement for me on the Sales Header table, because I was using it to show all open orders, and at times we end up deleting a record in BC, which is why I wanted downstream tables to recognize deleted records. But you are right that for tables like G/L entries, append works perfectly. For Sales Header, I believe switching to read directly from the refreshed data each time is better, so it always shows the latest state. Thanks for your response!
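As a sketch of that "read the latest state each time" approach (the column names and option values are assumptions about the exported Sales Header schema):

```python
# Hedged sketch: query open orders straight from the renewed landing table,
# so deletions in BC are reflected automatically on every run.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

open_orders = (
    spark.read.format("delta").load("Tables/SalesHeader")
    .filter(F.col("DocumentType") == 1)  # 1 = Order in BC's option values
    .filter(F.col("Status") == 0)        # 0 = Open
)
open_orders.createOrReplaceTempView("open_orders")  # query instead of materializing
```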