Closed luanmorenomaciel closed 10 months ago
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
Is this a new bug in dbt-spark?
Current Behavior
Following the getting-started instructions, reading files and writing them out as Parquet does not work properly: the table metadata is written to storage, but the data itself is not.
Expected Behavior
Read files and write them as Parquet into another folder using the dbt-spark framework.
Steps To Reproduce
1 - Stand up the docker-compose stack available at https://github.com/dbt-labs/dbt-spark

```shell
docker-compose up -d
```
2 - Install package dependencies using packages.yml:

```yaml
packages:
```
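The package list was cut off above, but since step 6 calls the `stage_external_sources` macro, the project presumably depends on dbt-labs/dbt-external-tables. A minimal sketch of such a packages.yml follows; the version pin is an assumption, not taken from the original report:

```yaml
# packages.yml -- dbt-external-tables provides the stage_external_sources macro
# (the version range below is an assumption)
packages:
  - package: dbt-labs/dbt_external_tables
    version: [">=0.8.0", "<0.9.0"]
```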
3 - Add connectivity to profiles.yml with the following configuration:

```yaml
spark:
  target: dev
  outputs:
    dev:
      type: spark
      method: thrift
      host: localhost
      port: 10000
      schema: default
```
4 - Add the external table config according to https://github.com/dbt-labs/dbt-external-tables/blob/main/sample_sources/spark.yml:

```yaml
version: 2

sources:
```
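The source definition was also truncated. Given the `source('bronze', 'users')` reference in step 5, it presumably looks something like the sketch below; the schema, storage location, and file options are assumptions:

```yaml
# Hypothetical sketch of the external source definition
# (source and table names come from step 5; location and options are assumptions)
version: 2

sources:
  - name: bronze
    schema: default
    tables:
      - name: users
        external:
          location: '/path/to/bronze/users'  # assumed storage path
          using: parquet
```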
5 - Set up the dbt_project.yml file and build a simple SQL statement that persists data:

```yaml
name: 'spark'
version: '1.0.0'
config-version: 2

profile: 'spark'

model-paths: ["models"]

target-path: "target"
clean-targets:

models:
  +materialized: table
  +file_format: parquet
```

The model:

```sql
SELECT * FROM {{ source('bronze', 'users') }}
```
6 - Once I execute the command below, I get the success message, and the metadata shows up on the HMS server:

```shell
dbt -d run-operation stage_external_sources --vars "ext_full_refresh: true" --profiles-dir /Users/luanmorenomaciel/GitHub/owshq-dbt-core/dbt/
```
7 - Executing the command below, I also get a green status:

```shell
dbt -d run-operation stage_external_sources --vars "ext_full_refresh: true" --profiles-dir /Users/luanmorenomaciel/GitHub/owshq-dbt-core/dbt/
```
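Steps 6 and 7 show the same `stage_external_sources` command. To actually materialize the Parquet model from step 5, the usual step would be `dbt run` (a standard dbt CLI command, shown here as an assumption about the intended workflow rather than something from the original report):

```shell
# Build the models defined in the project (materializes the table as Parquet)
dbt run --profiles-dir /Users/luanmorenomaciel/GitHub/owshq-dbt-core/dbt/
```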
Relevant log output
Environment
Additional Context
I've been trying this for a week now and it does not seem to work properly. Any help would be greatly appreciated!