MicrosoftLearning / dp-203-azure-data-engineer

Exercise files for Microsoft Data Engineer curriculum
https://microsoftlearning.github.io/dp-203-azure-data-engineer/
MIT License

Lab Exercise 02 is not working as described in the tutorial #110

Closed tafarelloit closed 3 months ago

tafarelloit commented 3 months ago

Module: dp-203-azure-data-engineer

Lab/Demo: 02

Task: Query data in files

Step:

Description of issue

Directions in the tutorial

https://microsoftlearning.github.io/dp-203-azure-data-engineer/Instructions/Labs/02-Analyze-data-with-sql.html

Location of your script in the github https://github.com/MicrosoftLearning/dp-203-azure-data-engineer/tree/master/Allfiles/labs/02

This exercise is not working. Please review the following:

A) In Azure Data Lake Storage Gen2, the script didn't create the orders folder inside the sales folder. That is not a problem in itself, but I thought I had done something wrong, and I might be charged twice because I deleted the resources to repeat the test.

Select the files container, and note that it contains folders named sales and synapse. The synapse folder is used by Azure Synapse, and the sales folder contains the data files you are going to query. Open the sales folder and the orders folder it contains, and observe that the orders folder contains .csv files for three years of sales data.

![image](https://github.com/user-attachments/assets/453748d9-9071-45d0-bc86-0ce4cec3a861)

B) The Spark pool was not created.

In the new Notebook 1 tab that opens, in the Attach to list, select your Spark pool (sparkxxxxxxx). Then use the ▷ Run all button to run all of the cells in the notebook (there’s currently only one!).

![image](https://github.com/user-attachments/assets/9208cf10-6f6c-44b0-a77a-818ba3467dbb)

Repro steps:

1. Clone the repository: git clone https://github.com/MicrosoftLearning/DP-203-Azure-Data-Engineer dp203
2. Open the folder and run the script setup.ps1
3. cd dp203/Allfiles/labs/02
   ./setup.ps1

TheJamesHerring commented 3 months ago

@tafarelloit, I added a note to the instructions covering manual upload of the files. We see this failure in some cases, roughly every 6 months. The setup was ported back from the DP-500 content, and we'll continue to monitor it for performance and issues. Thanks for the assist.
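
For anyone who hits the same missing orders folder, the manual upload could also be scripted. The following is only a sketch under assumptions, not the method the lab instructions describe: the function names `plan_uploads` and `upload_orders` are hypothetical, the local folder holding the .csv files is whatever you pass in, and the upload step assumes the `azure-storage-file-datalake` Python package plus an account URL and credential that you supply. The container name files and the sales/orders path come from the lab text quoted above.

```python
from pathlib import Path


def plan_uploads(local_dir, remote_prefix="sales/orders"):
    """Map each local .csv file to the data lake path it should land in.

    remote_prefix defaults to the folder the lab expects inside the
    'files' container; local_dir is an assumption supplied by the caller.
    """
    local_dir = Path(local_dir)
    return {
        str(path): f"{remote_prefix}/{path.name}"
        for path in sorted(local_dir.glob("*.csv"))
    }


def upload_orders(account_url, credential, local_dir, container="files"):
    """Upload the planned files (hypothetical helper, not part of the lab)."""
    # Imported lazily so plan_uploads can be used without the SDK installed.
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(account_url=account_url, credential=credential)
    file_system = service.get_file_system_client(container)
    for local_path, remote_path in plan_uploads(local_dir).items():
        file_client = file_system.get_file_client(remote_path)
        with open(local_path, "rb") as handle:
            # overwrite=True replaces a file left over from an earlier attempt.
            file_client.upload_data(handle, overwrite=True)
```

Calling `plan_uploads` on its own first shows which files would be uploaded and where, which is a cheap way to confirm the local folder is the right one before touching the storage account.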