Bertverbeek4PS / bc2adls

Exporting data from Dynamics 365 Business Central to Azure data lake storage or MS Fabric lakehouse
MIT License

Characters Removed from Extracted Data - How does extract handle invalid JSON characters #169

Open tglink72 opened 3 weeks ago

tglink72 commented 3 weeks ago

Hello,

We are having an issue with Text fields in Business Central that contain backslashes '\'. The backslashes are removed when viewing the data via Synapse, and they are also missing from the delta and data files themselves. At what point in the bc2adls process are the invalid JSON characters escaped out of the data, and can this be modified to allow these characters?

Thanks

Tom

image image (1)

Bertverbeek4PS commented 3 weeks ago

Hi @tglink72, so currently your scenario is that when you export a text field with a '\' in it, the '\' is already removed in the .csv file?

tglink72 commented 3 weeks ago

Bert,

Thanks for the reply. The scenario currently is that when we export a text field with a backslash \ in it, the \ is removed. An example is in the screenshot below. The Regex column has this value in a basic Synapse query. It looks like the backslashes are being removed in the initial export to the deltas folder, as I do not see the \ in the Regex field in the raw csv in the deltas folder.

Regex-3

^Add[A-Z]-d{5}$

But if we view the data in BC below, you will see it contains \ characters that are not included in the query result.


Bertverbeek4PS commented 2 weeks ago

Thanks @tglink72, I will then see if I can repro it. Hopefully I will have time for it this week.

Bertverbeek4PS commented 1 week ago

@tglink72 I did a test with the export to MS Fabric Lakehouse.

Customer comments: image

Then the export to the csv delta file: image

When the notebook is run: image

So with Fabric it works fine. I will also look into the export to Azure Data Lake.

Bertverbeek4PS commented 1 week ago

@tglink72 I have also tested it with the Synapse pipelines and Azure File Storage, but I cannot reproduce it.

When exporting the deltas: image

When the Synapse pipeline runs and in Power BI: image

I'm using parquet files as destination.

tglink72 commented 1 week ago

Bert,

Thanks for the replies. I am also using the Synapse pipeline and Azure File Storage and, unfortunately, I can reproduce it each time. The backslash is removed in the extension's extract to the deltas folder. If you would like, we could do a screenshare.

Thanks

Tom Link
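One way to narrow down where the backslash disappears is to compare the raw bytes of a delta csv with what a parser returns for the same line. The sketch below is only an illustration, not bc2adls code: the sample line is an assumption (the thread shows only the stripped value `^Add[A-Z]-d{5}$`, so the `\d` form is a guess at the original), and the helper names are hypothetical. It shows that a CSV reader configured with `\` as its escape character drops the backslash on read even when the raw text still contains it, while a reader with no escape character keeps it.

```python
import csv
import io

# Assumed sample line: the thread only shows the stripped value,
# so the "\d" here is a guess at what the field held in BC.
RAW_LINE = "Regex-3,^Add[A-Z]-\\d{5}$\n"


def count_backslashes(raw_text: str) -> int:
    """Count backslashes in the raw, unparsed text of a csv file."""
    return raw_text.count("\\")


def parse_fields(raw_text: str, escapechar=None) -> list:
    """Parse the first row of raw csv text with an optional escape character."""
    reader = csv.reader(io.StringIO(raw_text), escapechar=escapechar)
    return next(reader)


if __name__ == "__main__":
    # If this prints 0 for a real delta file, the character was lost
    # before or during the export, not by the reader.
    print("backslashes in raw text:", count_backslashes(RAW_LINE))

    # No escape character: the backslash survives parsing.
    print("plain reader:   ", parse_fields(RAW_LINE))

    # Backslash as escape character: the reader consumes it.
    print("escaping reader:", parse_fields(RAW_LINE, escapechar="\\"))
```

If the raw delta file itself contains no backslash, as described above, the loss happens at or before the export step and reader settings are not the cause; if the raw file does contain it, the escape-character setting of whatever reads the file is worth checking.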


Bertverbeek4PS commented 1 week ago

OK, strange, @tglink72. Which version do you have? And yes, it is OK to have a meeting. On Friday I have the whole afternoon available.