emilyriederer / website

Blog / website repo
https://emilyriederer.com
3 stars 1 forks source link

Generating SQL with {dbplyr} and sqlfluff | Emily Riederer #15

Open utterances-bot opened 3 years ago

utterances-bot commented 3 years ago

Generating SQL with {dbplyr} and sqlfluff | Emily Riederer

Using the tidyverse's expressive data wrangling vocabulary as a preprocessor for elegant SQL scripts

https://emilyriederer.netlify.app/post/sql-generation/

peterfoley commented 3 years ago

Nice article! There’s a little typo in the second to last code block - the VAR/AVG names are swapped, so they’re matched with the wrong functions.

emilyriederer commented 3 years ago

Thanks so much for letting me know, @peterfoley ! Fixed now. How embarrassing -- "you won't believe this trick to write completely error free code," I said 😳 😆

mdancho84 commented 3 years ago

Fantastic article, Emily. I’d love to do a quick video showcasing these techniques as part of my R-Tips YouTube series. 😀

emilyriederer commented 3 years ago

Thanks, @mdancho84, that's awesome!

Let me know if you need any help on a more "realistic" use case; I always love how practical / modern all of your material is. The main place I personally use this technique is when working with large amounts of panel data to create lags, rolling averages, cumulatives, etc. for a large number of individuals x measures. For the sake of clarity here, I went with a simpler and more widely known dataset

mdancho84 commented 3 years ago

Hey Emily, thanks. Very cool - readable SQL could be a big improvement to dbplyr down the road.

With the R-Tips I think just a quick 5-10min & I'll point to your article for more depth and details on the linting part.

Here's the GitHub repo so you can see what I've been working on. https://github.com/business-science/free_r_tips

amarquard commented 3 years ago

Thanks Emily - this was interesting. I find your posts always cover a topic that I really need at the moment, or something inspirational that I didn’t know I needed. Thanks for taking the time to share.

A minor comment not relevant to the SQL stuff itself. It seems you are getting very low mean values per year, because you are including a lot of zeroes from the if_else statements. I’m on my phone so cannot try to reproduce at the moment (so I apologize if I’m mistaken), but by eye it looks slightly off.

vkatti commented 3 years ago

This is a great resource. I knew how to use dbplyr but didn't know about sqlfluff. Thank you. A more realistic example would be joining 2 tables as dbplyr generates subqueries making me wonder if that is the most efficient way.

colemanrob commented 3 years ago

super helpful, thanks for sharing! :)

arohland commented 1 year ago

Super helpful! I'm looking at adopting a linter to use for our Data Science team right now and stumbled across sqlfluff. We're writing a lot of templated queries ourselves but just want to enforce a common style, like we can in our R code using styler. I believe with installing sqlfluff and writing functions that run the system(paste()) call, we can make linting of sql files super easy for new colleagues.

Currently this blog post is not rendered quite correctly though! I guess the sqlfluff installation failed, I would love to have a look at the results as intended.