Closed dave-connors-3 closed 2 years ago
cc @johnblust it'll be great to link to this for the Surrogate Key glossary page
ah @gwenwindflower i think this was the issue I created, which was incidentally closed with the surrogate key glossary! i will reopen this
My mistake! I selected the wrong issue
@dave-connors-3 Also realizing I totally dropped the ball on this. I won't have time today and I'm OOO tomorrow, but I'll make sure I add this into asana for me to review next week 🙏 Sorry for the delay
@johnblust i don't think you dropped any balls! i don't even know if any balls were picked up
the only ball here is the on we're gonna have publishing this article. 🏀 thanks for getting this back in the mix both of you!
@dave-connors-3 Left some small comments in the Figjam, but overall it's a great foundation! Let's take this outline a level deeper. You can do that inside the figjam or in some kind of text document. Whichever you prefer! Looking forward to seeing how this plays out 💃
@dave-connors-3 bumping this so it stays on your radar. Let me know if the feedback I gave makes sense!
@johnblust Appreciate the nudge! @mikegfuller and I are moving at a glacial pace 😄
We have a draft going already, but given some of the conversations we've been in with some customers where this has come up, it's likely to change quite a bit! Draft is here
@johnblust @jasnonaz most recent draft lives here
would love some feedback!
Left some feedback in the Notion doc yesterday. Overall great job! Think the main thing here is we need a good title and this is ready in my eyes
thanks @johnblust! I will look over those notion comments -- once we're good there and decide on a title, do I open a PR as the next step?
Yep!
for Developer Blog
Contact Details
@dave in slack
This issue is being created to promote an approved Discussion.
Link to the Discussion this Issue is promoting.
1303
If you're promoting this from an 'I want to write about...' Discussion, you can copy over the answers to the following questions. Otherwise, please answer these questions below
In a conversation with @mikegfuller and @matt-winkler, we were chatting about the pain of migrating from a SQL Server-style deployment to dbt with a specific eye towards maintaining monotonically increasing integer surrogate keys. While they were useful in the past (better join performance, coordination between loading Kimball-style dims and facts etc), they make your project stateful -- in order to add new data to model A, you need to know about the current state of the data in model A. This goes against the core principal of idempotency, and can lead you into very scary design patterns if you try to deploy this type of setup in dbt.
This article would advocate for derived surrogate keys, the power of the surrogate key dbt util, and to help you start to treat your models like cattle instead of pets (@randypitcherii 's words, not mine)
What is the main problem you are solving? What is your solution? teaching folks coming from legacy tools how to rethink their assumptions about surrogate keys. The solution would be to derive the keys from the raw data in your warehouse in an idempotent way (dbt_utils.surrogate_key)
Why should the reader care about this problem? Why is your solution the right one? This should help form your specific target audience. Trying to build surrogate keys the Old Way in dbt is a real pain, and does not actually help anyone. The target audience here are folks trying to do a 1:1 migration of Old Data to New Data -- now is the time to rethink your assumptions!
Link to the initial outline we will develop through this Issue.
link to the current blustfall