NYCPlanning / data-engineering

Primary repository for NYC DCP's Data Engineering team

format sql files in CPDB #249

Closed damonmcc closed 8 months ago

fvankrieken commented 8 months ago

I took this over from Damon.

Mostly just fluffing (sqlfluff formatting). One note, though: there's a slight logic change in projectcategorization.sql. The patterns are now loaded into a "seed" table from the data folder instead of living in a huge list of ORs in the WHERE clause.
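For anyone skimming, here is a minimal sketch of the seed-table idea, not the actual CPDB code. The table names, column names, and patterns (projects, category_patterns, '%BRIDGE%', and so on) are all made up for illustration, and it uses Python's built-in sqlite3 only so that it runs standalone.

```python
import sqlite3

# In-memory database with hypothetical tables; not the real CPDB schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE projects (id TEXT, description TEXT);
    INSERT INTO projects VALUES
        ('P1', 'Reconstruct bridge deck'),
        ('P2', 'New library furniture'),
        ('P3', 'Sewer main replacement');

    -- Hypothetical "seed" table. In practice it would be loaded from a
    -- file in the data folder rather than inserted inline like this.
    CREATE TABLE category_patterns (pattern TEXT, category TEXT);
    INSERT INTO category_patterns VALUES
        ('%BRIDGE%', 'Fixed Asset'),
        ('%SEWER%', 'Fixed Asset');
    """
)

# Before: every pattern is hard-coded as another OR in the WHERE clause.
or_rows = conn.execute(
    """
    SELECT id
    FROM projects
    WHERE upper(description) LIKE '%BRIDGE%'
        OR upper(description) LIKE '%SEWER%'
    ORDER BY id
    """
).fetchall()

# After: the patterns live in the seed table and are applied with a join.
# DISTINCT guards against a project matching more than one pattern.
seed_rows = conn.execute(
    """
    SELECT DISTINCT p.id
    FROM projects AS p
    INNER JOIN category_patterns AS c
        ON upper(p.description) LIKE c.pattern
    ORDER BY p.id
    """
).fetchall()

print(or_rows == seed_rows, seed_rows)
```

The point of the design is that adding or removing a pattern becomes a data change in the seed file instead of an edit to the query.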

fvankrieken commented 8 months ago

@damonmcc you can't "review" since this was your PR originally, but it should be good to go

fvankrieken commented 8 months ago

There are a couple of relevant changes. If you look at the commits, you can see which ones aren't just the result of running sqlfluff fix products/cpdb/sql, which is the final commit.

fvankrieken commented 8 months ago

Sorry y'all, I decided to make one more tweak (see commit 4). I still need to test it to make sure the new logic is sound.

fvankrieken commented 8 months ago

Cool - I think I get the gist. Wasn't sure what, if anything, besides formatting happened in commit 3 but I assume it was nothing too dramatic.

There was a GROUP BY happening after a lot of aggregation/joining that bugged me (it essentially grouped by every single field), so I wrote one CTE to get rid of it and do that aggregation one step earlier.
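Roughly, the shape of that change looks like the sketch below. This is an illustration under assumptions, not the real query: the projects and commitments tables and their columns are hypothetical, and it runs on Python's built-in sqlite3 just to be self-contained.

```python
import sqlite3

# In-memory database with hypothetical tables; not the real CPDB schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE projects (id TEXT, name TEXT, agency TEXT);
    CREATE TABLE commitments (project_id TEXT, amount REAL);
    INSERT INTO projects VALUES
        ('P1', 'Bridge', 'DOT'),
        ('P2', 'Library', 'DDC');
    INSERT INTO commitments VALUES
        ('P1', 100), ('P1', 50), ('P2', 30);
    """
)

# Before: join first, then GROUP BY every selected field to collapse rows.
late_rows = conn.execute(
    """
    SELECT p.id, p.name, p.agency, sum(c.amount) AS total
    FROM projects AS p
    LEFT JOIN commitments AS c ON p.id = c.project_id
    GROUP BY p.id, p.name, p.agency
    ORDER BY p.id
    """
).fetchall()

# After: aggregate one step earlier in a CTE, so the final join is
# one row per project and needs no GROUP BY at all.
early_rows = conn.execute(
    """
    WITH totals AS (
        SELECT project_id, sum(amount) AS total
        FROM commitments
        GROUP BY project_id
    )
    SELECT p.id, p.name, p.agency, t.total
    FROM projects AS p
    LEFT JOIN totals AS t ON p.id = t.project_id
    ORDER BY p.id
    """
).fetchall()

print(late_rows == early_rows)
```

Both queries return the same rows here; the CTE version just keeps the grouping key down to the one column that actually drives the aggregation.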

fvankrieken commented 8 months ago

Commit 4 is finalized now and working properly.

damonmcc commented 8 months ago

approved!