databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0

[TABLE_OR_VIEW_ALREADY_EXISTS] when running create or replace views/tables in parallel #694

Open claudiazi opened 3 weeks ago

claudiazi commented 3 weeks ago

Describe the bug

When I run dbt run -s model for different models in parallel on a Databricks general compute cluster (the models already exist before these dbt commands run), I get the error TABLE_OR_VIEW_ALREADY_EXISTS. What remains after the error depends on the materialization: a table still exists, but a view no longer exists.

Steps To Reproduce

[Screenshot, 2024-06-05 at 19:50:14; image not available]
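As a rough illustration of the scenario described above (not the reporter's actual commands, which were only shown in the screenshot), parallel runs could be launched along these lines. The helper names `dbt_run_command`, `run_model`, `hit_already_exists`, and `run_in_parallel` are hypothetical, and the sketch assumes the `dbt` CLI is on the PATH and a working profile is configured.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Error class name as reported in the issue title.
ERROR_CLASS = "TABLE_OR_VIEW_ALREADY_EXISTS"


def dbt_run_command(model: str) -> list[str]:
    """Build the `dbt run -s <model>` command from the report."""
    return ["dbt", "run", "-s", model]


def run_model(model: str) -> tuple[int, str]:
    """Run one model and return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(dbt_run_command(model), capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def hit_already_exists(output: str) -> bool:
    """True if the captured output mentions the reported error class."""
    return ERROR_CLASS in output


def run_in_parallel(models: list[str]) -> dict[str, bool]:
    """Start one dbt process per model at once; report which ones hit the error."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        results = list(pool.map(run_model, models))
    return {m: hit_already_exists(out) for m, (_code, out) in zip(models, results)}
```

Calling `run_in_parallel(["model_a", "model_b"])` with two existing models (placeholder names) would be one way to check whether the error shows up under concurrency.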

Has anybody else run into the same situation? Thank you very much!

benc-db commented 3 weeks ago

Are you running from a single local directory against the cluster? As part of compilation, dbt writes files to your target folder, and I'm wondering if running multiple tasks in parallel is overwriting the results of compilation, like the manifest.json.
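One way to test this hypothesis is to give each parallel invocation its own target and log directory, so that no two runs write the same manifest.json. The sketch below only builds the command line. It assumes a recent dbt-core version that accepts the `--target-path` and `--log-path` flags on `dbt run`; the function name `isolated_dbt_command` and the `/tmp/dbt-runs` scratch directory are hypothetical.

```python
from pathlib import Path


def isolated_dbt_command(model: str, scratch_root: str = "/tmp/dbt-runs") -> list[str]:
    """Build a `dbt run` command whose artifacts go to a per-model directory.

    Assumption: the installed dbt-core accepts --target-path and --log-path here.
    """
    run_dir = Path(scratch_root) / model
    return [
        "dbt", "run", "-s", model,
        "--target-path", str(run_dir / "target"),  # compiled SQL, manifest.json
        "--log-path", str(run_dir / "logs"),       # dbt.log for this run only
    ]
```

If the error still appears with isolated directories, overwritten compilation artifacts are unlikely to be the cause. A side benefit is that each run keeps its own dbt.log.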

claudiazi commented 3 weeks ago

@benc-db This also happens when the models run in separate containers as individual batch jobs.

benc-db commented 3 weeks ago

Interesting, that is surprising. I'm fairly sure this should work, since I believe people do this sort of thing with Airflow, and I would have expected it to come up sooner if it didn't. Could you share the dbt.log files from a simple repro by email to ben.cassell@databricks.com?