fabragaMS / ADPE2E

Azure Data Platform End-to-End

Role of Data Bricks within Solution - Lab 4: Add AI to your Big Data Pipeline with Cognitive Services #14

Closed Josh-BI-UK closed 3 years ago

Josh-BI-UK commented 3 years ago

Hi @fabragaMS,

What an amazing set of labs you have created. Absolutely super valuable. In my efforts to learn how to become a better Azure Analytics Solutions Architect, I had a few questions about your chosen architecture.

In Lab 4: Add AI to your Big Data Pipeline with Cognitive Services: We use Data Bricks as the platform to call an Azure Computer Vision API.

Link to lab: https://github.com/fabragaMS/ADPE2E/blob/master/Lab/Lab4/Lab4.md

  1. Why have you chosen to use Data Bricks to provide that aspect of the solution?

  2. What role would you say Data Bricks is serving within your solution design?

  3. Are there other options which could be used (within the Azure eco-system) which perform the same role as Data Bricks for this particular use case?

  4. Within the “Create Data Bricks Linked Service in Azure Data Factory” section of the lab we use Data Factory to run our Data Bricks applet/workspace, which calls the Computer Vision service.

Does that mean the Data Bricks cluster which provides the compute for this aspect must be active indefinitely (or at least as long as we need to use the solution we are implementing)? [link to section within lab: https://github.com/fabragaMS/ADPE2E/blob/master/Lab/Lab4/Lab4.md#create-databricks-linked-service-in-azure-data-factory ]

fabragaMS commented 3 years ago

Hey Josh,

I’m glad you enjoyed the labs and I hope you learned a thing or two from them.

As to your question about the use of Databricks to invoke Cognitive Services: yes, it is overkill. A Databricks cluster is not required to call Cognitive Services, and a call to an Azure Function would do the job. For the purposes of the workshop, though, I did not want to introduce yet another service to the architecture and another set of skills (C#, for example).
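To illustrate the point that no cluster is needed, here is a minimal sketch of calling the Computer Vision Analyze Image REST endpoint from plain Python, the kind of code that could sit inside an Azure Function. This is not the lab's code: the function names, the `v3.2` API version, and the `Description` visual feature are assumptions chosen for illustration, and the endpoint and key are placeholders you would supply from your own Cognitive Services resource.

```python
import json
import urllib.request


def build_analyze_request(endpoint: str, key: str, image_url: str,
                          api_version: str = "v3.2") -> urllib.request.Request:
    """Build (but do not send) an Analyze Image request.

    `endpoint` is the resource endpoint, `key` the subscription key;
    the API version default is an assumption, adjust to your resource.
    """
    url = (f"{endpoint.rstrip('/')}/vision/{api_version}/analyze"
           "?visualFeatures=Description")
    # The service accepts a JSON body pointing at a publicly reachable image.
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def describe_image(endpoint: str, key: str, image_url: str) -> dict:
    """Send the request and return the parsed JSON response."""
    req = build_analyze_request(endpoint, key, image_url)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `describe_image(endpoint, key, image_url)` performs a single HTTPS request, so the only compute required is whatever hosts this function.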

The role Databricks plays in the overall architecture is that of a big data analytics engine. As an analogy, what SQL Server is to an .mdf file (database file), Databricks is to the data lake. The data lake is simply storage; you need a compute engine able to process these files and extract only the data you need from them, and that’s Databricks’ job. Hope this makes sense.

Cheers,

Fabio Braga, Senior Cloud Solution Architect – Data & AI


Josh-BI-UK commented 3 years ago

Fab, thank you. Very clear.