dbt-labs / dbt-spark

dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks
https://getdbt.com
Apache License 2.0

[ADAP-518] Convert information to a `dict` #751

Open Fokko opened 1 year ago

Fokko commented 1 year ago

Is this your first time submitting a feature request?

Describe the feature

In preparation for three-part identifiers of the form catalog.schema.table (https://github.com/dbt-labs/dbt-spark/issues/755), I would like to change the information attribute on SparkRelation into a dict:

https://github.com/dbt-labs/dbt-spark/blob/cb41ab049481bc458871d5c37fad47e59d6b759c/dbt/adapters/spark/relation.py#L36-L37
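
A minimal sketch of the idea, not the actual dbt-spark implementation: it assumes the information text consists of "Key: value" lines, and both the sample blob and the parse_information helper are hypothetical names introduced only for illustration. Nested output such as schema lines would need separate handling.

```python
from typing import Dict

# Hypothetical sample of the free-text blob; the exact layout is an assumption.
SAMPLE_INFORMATION = """\
Database: analytics
Table: orders
Owner: fokko
Type: MANAGED
Provider: delta
"""


def parse_information(information: str) -> Dict[str, str]:
    """Split 'Key: value' lines into a dict, skipping lines without a separator."""
    parsed: Dict[str, str] = {}
    for line in information.splitlines():
        # partition() splits on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            parsed[key.strip()] = value.strip()
    return parsed


info = parse_information(SAMPLE_INFORMATION)
```

With a dict, callers could look up info.get("Owner") or info.get("Provider") directly instead of running a separate regex per field.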

Describe alternatives you've considered

The current approach is hard to maintain because it relies on regexes to extract useful information from one big blob of text. I also noticed that the types are currently missing:

[screenshot not preserved]

Who will this benefit?

Mostly developers, because the current code is hard to maintain and hard to extend.

Are you interested in contributing this feature?

Yes!

Anything else?

I wanted to add the database to the configuration. In Spark, this is called a catalog: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.Catalog.html

Since Spark 3.0, Spark can discover tables and views from multiple catalogs, such as a Hive or Glue catalog. I would love to add this, but this refactor needs to be done first, and I also want to keep the PRs concise.
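
To illustrate what a three-part identifier could look like once a catalog is configurable, here is a rough sketch. The ThreePartIdentifier class and its render method are hypothetical and not part of dbt-spark; the backtick quoting and the fallback to two parts when no catalog is set are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThreePartIdentifier:
    """Hypothetical helper for illustration; not part of dbt-spark."""

    schema: str
    table: str
    catalog: Optional[str] = None  # assumption: dbt's "database" maps to a Spark catalog

    def render(self) -> str:
        # Drop the catalog when it is unset so existing two-part names keep working.
        parts = [self.catalog, self.schema, self.table]
        return ".".join(f"`{part}`" for part in parts if part)


two_part = ThreePartIdentifier(schema="analytics", table="orders").render()
three_part = ThreePartIdentifier(
    schema="analytics", table="orders", catalog="glue"
).render()
```

Keeping the catalog optional means relations without a configured catalog would still render as schema.table.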

dbeatty10 commented 1 year ago

Thanks for kicking this off @Fokko 🏆

I started a tasklist to track each of the refactor(s) + feature implementation(s) needed for three-part identifiers:

As you create more issues for this, just let me know and we'll add them to that tasklist.

Fokko commented 1 year ago

@dbeatty10 Thanks! Much appreciated

github-actions[bot] commented 1 year ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] commented 11 months ago

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

dbeatty10 commented 6 months ago

@Fokko I just noticed that this and https://github.com/dbt-labs/dbt-spark/issues/755 were closed as stale, so I'm re-opening them now.