dbt-labs / dbt-technical-blog-writing

Conversation around dbt technical tutorials, blogs, guides, etc

dbt + machine learning: what makes a great baton pass? #72

Closed sungchun12 closed 2 years ago

sungchun12 commented 2 years ago

What's your key point?

Problem: dbt has done a great job of building an elegant, common interface between data engineers and data analysts: uniting on SQL. As the data industry evolves, there's plenty of pain and room to grow in building that interface between data scientists and data analysts. There isn't a good answer for when things go wrong in the machine learning arena: should the data analyst own fine-tuning the data pre-processing (think: prepping transformed data even more so machine learning models can work with it better)? Should we increase the SQL surface area to build ML models, or should we leave that to non-SQL interfaces (Python, Scala, etc.)? Does this have to be an either/or future?

Key Point: Whatever the interface evolves into, it must center people, create a low bar and high ceiling, and focus on outcomes and not the mystique of features/tools behind a learning curve.

Prior art: Any other posts that exist on this topic (here or elsewhere).

Link to notes / outline / draft: Google docs preferred, please set sharing to anyone with the link can view.

Estimated first draft date: Leave blank if you already shared a draft above. 01/14/2022, Friday: in a google doc

Any open questions / requests for help from the group?

sungchun12 commented 2 years ago

don't forget about hex!

krevitt commented 2 years ago

Love this one Sung, I can work with you on it when you're ready - as usual I'm curious about stories from your own work that you can build the post around (as so much of the convo on this stuff has been hypothetical so far)

sungchun12 commented 2 years ago

@krevitt Feel free to check progress in real-time: here

krevitt commented 2 years ago

I wonder if there are two separate posts here:

  1. a step-by-step walkthrough of the optimal baton pass as you've seen it play out (or how you could see it working better based on observing poor baton passes)
  2. walkthroughs of how an individual tool works within that baton pass (almost like a tool unboxing)

feel like tackling both of those in one post is a lot, what do you think?

sungchun12 commented 2 years ago

@krevitt

Thanks for the suggestion! I'm planning to focus on observing poor baton passes, press into the core behaviors and outcomes that underlie them, address conceptually how the tools I've seen so far handle those problems, and name what I'd like to see in an ideal next-generation workflow.

I don't plan to do a full tool unboxing. I recommend we leave that as an open question to the readers: which tool is worth a full unboxing in another blog post?

All of the above should fit just right in a single blog post. I expect this post will be 1x-1.5x the length of the dbt and Airflow blog post we released together.

sungchun12 commented 2 years ago

@izzye84 will officially be a co-writer of this blog! He'll bring the machine learning expertise to the table!

krevitt commented 2 years ago

@sungchun12 what do you think as a publication date for this one? and holler whenever you're ready for an editing pass

sungchun12 commented 2 years ago

@krevitt For publication, let's make it happen on Monday, 2/7/2022.

Emilie made a bunch of comments I'll need to ruminate on. I'll send a meeting invite for an editing pass!

johnblust commented 2 years ago

Link to figma visual outline: https://www.figma.com/file/n47XZkyPt3mfHVyc2nrfXV/dbt-%2B-ML-mind-map?node-id=2%3A614

johnblust commented 2 years ago

@sungchun12 Hey Sung! I gave your content a quick editing pass and made some suggestions for small structural changes. In many ways, the content is great! I would love to see the main ideas come forward & stand out a little more so readers can easily track their progress through the content. Here's a quick summary of my suggested edits to that end:

I'll just need you to go in, review the changes, and add/adjust some suggestions to fit better with your voice. I left a bunch of comments with my rationale for some of my suggestions, so if you have questions or better approaches, please let me know!

sungchun12 commented 2 years ago

@johnblust I resolved all the comments and it's ready for another review!

johnblust commented 2 years ago

Awesome, I'll review & let you know next steps early next week!! Great job on the fast turnaround @sungchun12

sungchun12 commented 2 years ago

Izzy and I feel good about this! Keep us updated John!

johnblust commented 2 years ago

@sungchun12 Okay, then we're ready to publish!! Planning to publish this week. I'll let you know when it is live :)

gwenwindflower commented 2 years ago

closing this issue for now as part of the repo migration, but since we're close to publishing I want to make sure we link through to this discussion in the Docs repo https://github.com/dbt-labs/docs.getdbt.com/discussions/1158 for continuity, and so future ML-related follow-ups can branch off of it.