Closed vvcb closed 5 months ago
@katy-sadowski , picking this task up from the Teams conversation. Happy to do this but should we wait until we have decided on the project directory structure and the build up vs break down strategy that we are finalising. Else we may end up with multiple unmerged PRs that will result in conflicts. Sqlfluff is already part of my mega PR :wink: but happy to make a separate PR once we have settled on approach. Shouldn't take very long to do.
Thanks @vvcb ! Regarding directory structure - what are your thoughts on what I've put here (plus an Intermediate layer)? Any additional decisions to make on directories besides this?
Regarding build up vs break down - can you please elaborate on what you mean by that? Do you mean stuff like precommit hooks etc?
We can discuss it live next time we meet but I'm also happy to continue the convo here in the meantime :)
@katy-sadowski , agree with moving the entire project to the root directory like you have done in your PR (and how it is in mine as well).
Regarding sub folders within the models folder, there are many ways to 'skin a cat' (never understood why there is one way let alone many but not sure where this phrase has come from).
In my team, we started with source
, staging
, vocab
and omop
folders and then added a few more as the need arose. Within those folders, the models have a consistent naming convention.
https://omop-lsc.surge.sh/rxn_dev/index.html#!/overview/dbt_omop
All this can get very complicated very quickly if we are not careful. So, starting simple (and starting itself) is a good thing and we can revisit frequently early and less frequently later.
In life and tech, I find PEP 20 is a good anchor.
By break down, I meant my approach of starting with the fully built lineage. Build up is the first principles approach you described. The fully built lineage presents many surfaces for people with varying skills to play on - breaking down into component models, making the code dialect agnostic one model at a time, documentation, tagging, adding macros, etc. Building up may be a lot more challenging to work together as a team - but this may simply be a function of my ignorance of how or where we would start.
Thanks for the additional context! For directory structure, I think let's stick with the dbt standard for now since we don't have a strong reason (yet) to deviate. Looking to the future, I think this will also be the best starting point for a more generic ETL project.
Regarding break down / build up - I am also starting to think it'll be easier to having the "skeleton" of the full ETL to work with vs starting with a blank slate. Either way though I do think we should probably have targeted PRs for the bits of the "mega" PR we want to incorporate. I'm not very familiar with SQLFluff, but my thinking in adding it in now was that we could ensure all SQL committed from the very beginning is linted and consistent. Do you agree with that?
And then I'm thinking:
Thoughts?
The purpose of this task is to enforce consistent SQL syntax throughout the repo, amongst other benefits. See https://sqlfluff.com/