dbt-labs / docs.getdbt.com

The code behind docs.getdbt.com
https://docs.getdbt.com/
Apache License 2.0
117 stars 918 forks source link

Surrogate Keys #1396

Closed dave-connors-3 closed 2 years ago

dave-connors-3 commented 2 years ago

for Developer Blog

Contact Details

@dave in slack

This issue is being created to promote an approved Discussion.

Link to the Discussion this Issue is promoting.

1303

If you're promoting this from an 'I want to write about...' Discussion, you can copy over the answers to the following questions. Otherwise, please answer these questions below

In a conversation with @mikegfuller and @matt-winkler, we were chatting about the pain of migrating from a SQL Server-style deployment to dbt with a specific eye towards maintaining monotonically increasing integer surrogate keys. While they were useful in the past (better join performance, coordination between loading Kimball-style dims and facts etc), they make your project stateful -- in order to add new data to model A, you need to know about the current state of the data in model A. This goes against the core principal of idempotency, and can lead you into very scary design patterns if you try to deploy this type of setup in dbt.

This article would advocate for derived surrogate keys, the power of the surrogate key dbt util, and to help you start to treat your models like cattle instead of pets (@randypitcherii 's words, not mine)

What is the main problem you are solving? What is your solution? teaching folks coming from legacy tools how to rethink their assumptions about surrogate keys. The solution would be to derive the keys from the raw data in your warehouse in an idempotent way (dbt_utils.surrogate_key)

Why should the reader care about this problem? Why is your solution the right one? This should help form your specific target audience. Trying to build surrogate keys the Old Way in dbt is a real pain, and does not actually help anyone. The target audience here are folks trying to do a 1:1 migration of Old Data to New Data -- now is the time to rethink your assumptions!

Link to the initial outline we will develop through this Issue.

  1. I don't think so -- defintions of key concepts will be covered in this article
  2. not that i'm aware of -- we'll have some macros and the like, but nothing beyond that that a reader would need
  3. yeah! most of the article will be about tradeoffs
  4. the original surrogate key SQL magic post
  5. natural keys all the way down

link to the current blustfall

KiraFuruichi commented 2 years ago

cc @johnblust it'll be great to link to this for the Surrogate Key glossary page

dave-connors-3 commented 2 years ago

ah @gwenwindflower i think this was the issue I created, which was incidentally closed with the surrogate key glossary! i will reopen this

johnblust commented 2 years ago

My mistake! I selected the wrong issue

johnblust commented 2 years ago

@dave-connors-3 Also realizing I totally dropped the ball on this. I won't have time today and I'm OOO tomorrow, but I'll make sure I add this into asana for me to review next week 🙏 Sorry for the delay

dave-connors-3 commented 2 years ago

@johnblust i don't think you dropped any balls! i don't even know if any balls were picked up

gwenwindflower commented 2 years ago

the only ball here is the on we're gonna have publishing this article. 🏀 thanks for getting this back in the mix both of you!

johnblust commented 2 years ago

@dave-connors-3 Left some small comments in the Figjam, but overall it's a great foundation! Let's take this outline a level deeper. You can do that inside the figjam or in some kind of text document. Whichever you prefer! Looking forward to seeing how this plays out 💃

johnblust commented 2 years ago

@dave-connors-3 bumping this so it stays on your radar. Let me know if the feedback I gave makes sense!

dave-connors-3 commented 2 years ago

@johnblust Appreciate the nudge! @mikegfuller and I are moving at a glacial pace 😄

We have a draft going already, but given some of the conversations we've been in with some customers where this has come up, it's likely to change quite a bit! Draft is here

dave-connors-3 commented 2 years ago

@johnblust @jasnonaz most recent draft lives here

would love some feedback!

johnblust commented 2 years ago

Left some feedback in the Notion doc yesterday. Overall great job! Think the main thing here is we need a good title and this is ready in my eyes

dave-connors-3 commented 2 years ago

thanks @johnblust! I will look over those notion comments -- once we're good there and decide on a title, do I open a PR as the next step?

johnblust commented 2 years ago

Yep!