boristyukin closed this issue 4 years ago
How difficult would it be to implement? I know enough Python to be dangerous, but I am not a daily Python developer and only use it occasionally.
Hey @boristyukin - have you seen the docs on building a new adapter? If you're looking for inspiration, you can find some similar adapter plugins here:
There are broadly two things you need to do when building a new adapter:
I definitely recommend checking out the links above - they should give you a good feel for what's involved here! Please don't hesitate to let me know if you have any questions :)
thanks @drewbanin! looking now...
Hi @drewbanin, I work with @boristyukin and have been looking at dbt for a while with a view to implementing an adapter for Impala. I've reviewed the docs and some of the source, and I have a fundamental question: given dbt's current design and implementation, is schema a required attribute for dbt-supported databases (e.g. "schema.table_name")?
The folks who designed Impala opted not to implement it this way, and instead only support database.table_name. There's no notion of schema in the sense that you might have with postgres, for instance.
Before I head down a rabbit hole, I thought I'd check with you to see whether this would be possible. I tried hacking up some macros for Impala that basically ignored schema in favor of relation.identifier, and while a basic "select xxxx from yyyy" model works fine, anything more advanced starts throwing errors because a schema is expected.
Any thoughts?
Thanks!
Hey @ghaskell44! That's really cool, I think Impala is a great target for a dbt database plugin. dbt does generally assume that the databases it works with will have a notion of a database + schema, but I think there's a way to work around that.
We built a plugin for SparkSQL, which also does not have a proper notion of "schemas". On Spark, schema is just an alias for database.
We worked around this by making both the database and schema properties required in the Credentials contract, but using some clever logic to use the supplied schema value as the database (if a database config was not provided). The solution on Impala might look a little different, but I think you should just be able to supply a phony value for the schema.
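The schema-as-database fallback described above can be sketched in plain Python. This is a minimal illustration assuming a dataclass-based credentials object; ImpalaCredentials and its host field are hypothetical and are not dbt's actual Credentials base class or the Spark plugin's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImpalaCredentials:
    """Hypothetical credentials object; not dbt's real Credentials class."""

    host: str
    schema: str
    database: Optional[str] = None

    def __post_init__(self) -> None:
        # Impala has no separate schema level, so when no database is
        # configured, reuse the supplied schema value as the database name.
        if self.database is None:
            self.database = self.schema
        # Both properties must end up populated, mirroring the idea of
        # making them required in the credentials contract.
        if not self.database or not self.schema:
            raise ValueError("both database and schema must be non-empty")


creds = ImpalaCredentials(host="impala.example.com", schema="analytics")
print(creds.database)  # analytics
```

An explicitly supplied database still wins; the fallback only applies when it is omitted.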
You may also want to set the include_policy for the schema to False. This should cause dbt to render out Relations as <database>.<identifier> instead of <database>.<schema>.<identifier>.
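To show why the include policy changes the rendered name, here is a toy model of relation rendering. IncludePolicy and Relation below are simplified, hypothetical stand-ins that only mirror the idea; dbt's own relation classes carry more fields and behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncludePolicy:
    """Which parts of a relation name to render (hypothetical stand-in)."""

    database: bool = True
    schema: bool = True
    identifier: bool = True


@dataclass(frozen=True)
class Relation:
    database: str
    schema: str
    identifier: str
    include_policy: IncludePolicy = IncludePolicy()

    def render(self) -> str:
        # Keep only the name parts the policy includes, joined with dots.
        parts = [
            (self.include_policy.database, self.database),
            (self.include_policy.schema, self.schema),
            (self.include_policy.identifier, self.identifier),
        ]
        return ".".join(value for included, value in parts if included)


full = Relation("analytics", "analytics", "orders")
print(full.render())  # analytics.analytics.orders

no_schema = Relation("analytics", "analytics", "orders", IncludePolicy(schema=False))
print(no_schema.render())  # analytics.orders
```

Excluding either part yields a two-part name; since Impala treats database and schema as the same thing, which one is dropped is a design choice for the adapter.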
In general, feel free to peruse the Spark plugin and let me know if you have any questions! I think it should account for many of the implementation challenges that you'll see on Impala.
Thanks, @drewbanin! That worked great. I went ahead and set database to False in the include_policy just like Spark since Impala treats database and schema the same, then just used schema in the macros. I think the main thing I was missing was the include_policy but I also added the "clever logic" bit and removed dbname from the profile. My simple models that were failing before are now working correctly.
Thanks again for the pointers. If I get something working that looks full-featured, I'll put it up on GitHub.
closing this one - out of scope for core
@ghaskell44 were you able to get something working? If so, would love to link out to it in the documentation!
we had some roadblocks unfortunately and went a different route with a different tool that already supports Impala. sorry
Hi @boristyukin , Impala user here too, and interested in adopting dbt. May I ask what tool you ended up going for?
hey @ynouri, we ended up building a custom processor in NiFi that picks up SELECT statements from files and persists them into tables. It worked quite well for our needs. Unfortunately, we could not make Impala work with dbt.
Hi @boristyukin, yet another Impala user here. We are considering writing a dbt adapter for Impala, as we are looking to adopt dbt. I was wondering if you recall what challenges your team hit with it. Thanks!
sorry @tozka, I do not remember exactly what challenges we had, but after spending a week or so we gave up, because we also had NiFi and built something custom that worked great for us. I should say we are not daily Python developers, so you might have better luck. We built custom NiFi processors that would pick up queries defined as SELECT statements in files and persist them into tables. Obviously it was not as feature-rich as dbt, but it got the job done :) and we added some Impala-specific steps like an optional rebuild of stats, fast switching of production tables using LOAD INLINE, etc.
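The file-driven approach described above can be sketched outside NiFi in a few lines of Python. This is not the team's processor: naming the target table after the file and using drop-and-recreate are assumptions, and the function only generates statements rather than executing them. DROP TABLE IF EXISTS, CREATE TABLE ... AS SELECT, and COMPUTE STATS are standard Impala statements, with COMPUTE STATS standing in for the optional stats rebuild.

```python
import tempfile
from pathlib import Path


def build_statements(sql_file: Path, compute_stats: bool = False) -> list[str]:
    """Turn a file holding one SELECT into statements that persist it as a table.

    Assumption: the target table is named after the file stem
    (e.g. orders_daily.sql -> orders_daily).
    """
    select_sql = sql_file.read_text().strip().rstrip(";")
    table = sql_file.stem
    statements = [
        f"DROP TABLE IF EXISTS {table}",
        f"CREATE TABLE {table} AS {select_sql}",
    ]
    if compute_stats:
        # Optional Impala step: refresh table statistics for the planner.
        statements.append(f"COMPUTE STATS {table}")
    return statements


with tempfile.TemporaryDirectory() as tmp:
    model = Path(tmp) / "orders_daily.sql"
    model.write_text("SELECT order_date, count(*) AS n FROM orders GROUP BY order_date;\n")
    for stmt in build_statements(model, compute_stats=True):
        print(stmt)
```

A drop-and-recreate leaves the table briefly missing; the fast production-table switching mentioned above exists precisely to avoid that gap, and is not modeled here.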
Hi, we now have a working version of the dbt-impala adapter at https://github.com/cloudera/dbt-impala. Please try it out and let us know your feedback.
Support Apache Impala. Apache Impala is a widely distributed engine, used by thousands of enterprises around the world.