dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.95k stars 1.63k forks source link

Allow .sql.jinja file extension. #3484

Open Luttik opened 3 years ago

Luttik commented 3 years ago

Describe the feature

Provide the option to allow *.sql.jinja files as alternative to *.sql in all places where DBT uses jinja in SQL (I believe that is all places excluding the target folder).

This allows users to easily configure their IDE to handle both plain sql files as well as dbt sql(.jinja) files as intended. Currently (at least with pycharm) I have to choose between having all sql files interpreted as jinja or as plain sql.

Describe alternatives you've considered

I've looked into options to only apply jinja interpretation to the dbt/ part of my project via my IDE but that is not possible. Not only that I'd prefer a configuration that would have the IDE interpret files properly by default, or at least irregardless of project-specific configuration. This one simple change would ensure that.

Who will this benefit?

Everybody 😉

Are you interested in contributing this feature?

Sure

jtcohen6 commented 3 years ago

@Luttik Thanks for opening!

I take the heart of this question to be, How can we offer better syntax highlighting for dbt files that are a mix of Jinja + SQL? And maybe someday, fuller intellisense capabilities in a variety of text editors?

The biggest first step is to build a SQL file interpreter / syntax highlighter than can recognize Jinja as well. That's something we built into the atom-dbt plugin, though it's not something we've worked on for a while:

Screen Shot 2021-06-22 at 1 16 39 PM

That feels like the best resolution for the either-or dichotomy you're facing today:

Currently (at least with pycharm) I have to choose between having all sql files interpreted as jinja or as plain sql.

There are a few threads discussing how to do this, or something like it, in PyCharm today:

In a world where we had that advanced Jinja-SQL interpreter, I'm less convinced that we'd need a way to distinguish between "dbt SQL" files (.sql.jinja) and "non-dbt SQL" files (plain .sql). As a templating language, Jinja uses clear indicators ({% %} {{ }}) that really don't come up in standard SQL. Practically, is there a reason why you wouldn't want your other .sql files to have Jinja interpretation available, knowing it's likely unused outside of dbt?

Luttik commented 3 years ago

@jtcohen6 It goes a bit further than code highlighting and auto-completion (Pycharm actually does a pretty good job of that for SQL-based jinja). The thing is that DBT uses jinja files that render to SQL. Since they are not actually SQL they, for instance, cannot be executed against a DB.

  1. So just to reiterate DBT doesn't use SQL files directly they use Jinja files that render to SQL. 2. we do have decent jinja-SQL interpreters (I admit they are not ás good as pure SQL interpreters, but the IDE does perfectly well if you mark the DBT files as jinja). So due to point 1 it would be more correct to have files be called .sql.jinja and due to point 2 we would get best of both worlds if we do so.

It might also be good to reiterate that I do not want to break current behavior. But to provide the option to distinguish the file (or I'd argue, give them the correct and more verbose extension).

Also, note that I've been active in the JetBrains DBT thread for months. I consider these to be separate issues.

Luttik commented 3 years ago

(close was a misclick)

jtcohen6 commented 3 years ago

Thanks @Luttik, I really appreciate the clarifications. I just reread your comments in the JetBrains thread and found myself agreeing. (Thanks for being a driver in that conversation!) Some of it gets at nuances of PyCharm that I didn't previously appreciate, such as treating .jinja files as HTML vs. .sql files as executable against a database connection.

So just to reiterate DBT doesn't use SQL files directly they use Jinja files that render to SQL.

The third time you said this, it finally clicked for me. You're right :)

One other piece of information that I find persuasive: the Jinja maintainers, who previously did not endorse an official jinja file extension—preferring to just include Jinja in .html, .md, .xml, whatever—have since thrown their weight behind .jinja (docs, StackOverflow discussion):

Adding a .jinja extension, like user.html.jinja may make it easier for some IDEs or editor plugins, but is not required.

The only reason we shouldn't do this is if we had clear plans to stop using Jinja. I don't see that as a likelihood. We may reimplement certain functionality for performance reasons (see: experimental parser), but I think we're keeping Jinja syntax and most parts of the templating engine for good.

Some IDEs, such as the dbt Cloud Developer interface, may choose not to add support for .sql.jinja files. That's ok; as you say, this is not breaking behavior, only additive, and really intended for those IDEs which would benefit from the extra file extension.

I think the code change required here would actually be quite minimal, since that we recently consolidated file reading:

https://github.com/fishtown-analytics/dbt/blob/c794600242e734d870d4bf1c8fb4c0349ab961eb/core/dbt/parser/read_files.py#L103-L121

There's logic further down to handle .yml + .yaml. Perhaps read_files_for_parser could be rewritten to take a list for its extension argument, rather than needing to repeat the same code for .sql and .sql.jinja in each.

I'm going to mark this a good first issue, and welcome a PR for it.

Luttik commented 3 years ago

As noted in the pull-request this change might be a bit harder than anticipated. So If you'd like to see this feature I'd love your help.

Gregory108 commented 3 years ago

As @jtcohen6 suggested, I say here, that I also would find this feature quite useful! As I have not found this request, my similar, now closed, request asked for support not only of .sql.jinja, but also .sql.j2, because it is an autosave format in VSCode. I tried to understand, but did not found any difference between .jinja, .jinja2, .j2 extensions. So, I hope adding all of them should not become 3 separate problems.

Luttik commented 3 years ago

Yeah this should be quite an easy addition if I get the pull-request to work. To do this we'll have to unravel the extremely long method dbt.parser.manifest.ManifestLoader.load (at least I believe that is where issues reside) to see where in the process something goes wrong... IIRC it was a methods of ~140 lines which stitches many other routines together. I haven't had time to look at it since my last message.

Luttik commented 2 years ago

@jtcohen6 maybe replace good_first_issue with help_needed or something similar, since we've found that this is actually a hard problem to tackle given the dbt project structure.

jtcohen6 commented 2 years ago

@Luttik You're totally right. I'm going to start using help_needed in place of good_first_issue when something:

github-actions[bot] commented 2 years ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

github-actions[bot] commented 2 years ago

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.

jtcohen6 commented 2 years ago

Going to reopen this one, given renewed interest

github-actions[bot] commented 4 months ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

thatguysimon commented 4 months ago

Joining late to the party but I recently found myself really needing this as well, so hoping to bump this up a bit.