Closed allenday closed 1 year ago
I think having a repository wide standard and automatic linting in place would be good!
and the inconsistency of file formatting makes it harder than it would otherwise be.
Could you expand here what are some of the main issues that make it harder?
In practical terms I guess we should take it project by project, it will be quite the endeavor.
I have started a repo wide branch, and yes, knowing what I now know, we should go model by model
Hello, I'm a Data Engineer taking the first steps in the web3 world. Maybe I can help with this one, while I dive more in blockchain analytics. I think a good start point is to fix/enforce some basic rules, that don't break anything, such as leading/trailing commas and upper/lowercase reserved words. It's possible ignore more problematic files and run it as a pre-commit hook too. Can I help with it? Do you have any style guide or SQLFluff config files?
No style guide yet, but I was just discussing this with @dot2dotseurat last week, so great timing!
@lingzhong and I will catch up on this - I think he may have a starter. If not these are some good resources - we can strip down to what we need for the first pass and then add rules gradually.
https://www.sqlstyle.guide/ https://about.gitlab.com/handbook/business-technology/data-team/platform/sql-style-guide/
@cristianwebber I recommend that you select a single folder to work with, for example balancer/ethereum:
https://github.com/duneanalytics/spellbook/tree/main/models/balancer/ethereum
and using a restricted dbt build to get it working, along the lines of:
dbt run --select +balancer+ --exclude tag:gnosis+ --exclude tag:optimism+ --exclude tag:arbitrum+ --exclude tag:solana+ --exclude tag:bnb+ --exclude tag:avalanche_c+
but before this, a very good task is to first clean up all of the bulk SQL inserts and move them to seeds. I opened a separate issue here:
Opened a PR with some basic configuration. It will not break anything and will also not enforce the use. Let me know if that looks good and/or you want to change something in the configurations/PR. I linted the models as a example, but can easily revert the commit if necessary.
Looks pretty good thanks!
Some suggestions:
Add a style guide file that's human readable markdown. The purpose is to explain which sqlfluff rules are in force and why. It's also to explain other styles, such as underscore delimited aliases which aren't checkable by sqlfluff.
Update the build docs to document how the pre-commit hook affects the build, and how to invoke directly/manually as part of the dev process.
Remove any unneeded dependencies. Do we really need the databricks deps?
Sure, I will start working on the style guide, just the basics, but it will probably take a while.
I changed the CI to avoid running the linter, and that makes the PR being closed. Let me know if I should revert the commit.
That is required, because pre-commit hooks run in a different environment. Never found I way to get rid of that.
I reverted the commit touching the CI file. Should I open a new PR or is there any way to reopen that one?
Please open a clean PR
On Wed, May 17, 2023 at 4:48 PM Cristian Webber @.***> wrote:
I reverted the commit touching the CI file. Should I open a new PR or is there any way to reopen that one?
— Reply to this email directly, view it on GitHub https://github.com/duneanalytics/spellbook/issues/2187#issuecomment-1551000439, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADKOP2SADANSOM523LOOHLXGSGFPANCNFSM6AAAAAASQ4SOV4 . You are receiving this because you authored the thread.Message ID: @.***>
I opened a new PR for it: https://github.com/duneanalytics/spellbook/pull/3385. I also create a basic style guide, mostly copied frim gitlab style guide. Let me know if I am missing something.
linting is a backlog item that dune team will get to when other priorities clear up
I propose we lint the repo using sqlfluff.
To ensure consistency, I also propose that we add a github action to ensure future merged commits are also linted.
I'm asking for this because I have been modifying a number of models, etc to support BigQuery dialect, and the inconsistency of file formatting makes it harder than it would otherwise be.
This will be a major change and I don't have the ability to ensure all of the linted files pass tests - so I will need some help from the core team to ensure the PR performs correctly.
@soispoke what's the right way to go about this?