Open vakarisbk opened 1 year ago
Thanks for reaching out @vakarisbk ! And thanks for providing such a well-constructed example 🏆
Could you try this?
dbt build -s +test_null --full-refresh
dbt show --inline "select * from {{ ref('test_null') }} order by type" --limit 10
This worked for me, and here's the output that I got:
16:35:50 Running with dbt=1.6.5
16:35:50 Registered adapter: spark=1.6.2
16:35:50 Unable to do partial parsing because a project config has changed
16:35:51 Found 1 model, 1 seed, 0 sources, 0 exposures, 0 metrics, 398 macros, 0 groups, 0 semantic models
16:35:51
16:35:55 Concurrency: 1 threads (target='http')
16:35:55
16:35:55 1 of 2 START seed file dbt_dbeatty.seed_null ................................... [RUN]
16:36:03 1 of 2 OK loaded seed file dbt_dbeatty.seed_null ............................... [CREATE 7 in 7.76s]
16:36:03 2 of 2 START sql view model dbt_dbeatty.test_null .............................. [RUN]
16:36:05 2 of 2 OK created sql view model dbt_dbeatty.test_null ......................... [OK in 2.42s]
16:36:06
16:36:06 Finished running 1 seed, 1 view model in 0 hours 0 minutes and 15.45 seconds (15.45s).
16:36:06
16:36:06 Completed successfully
16:36:06
16:36:06 Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
16:36:08 Running with dbt=1.6.5
16:36:08 Registered adapter: spark=1.6.2
16:36:08 Found 1 model, 1 seed, 0 sources, 0 exposures, 0 metrics, 398 macros, 0 groups, 0 semantic models
16:36:08
16:36:11 Concurrency: 1 threads (target='http')
16:36:11
16:36:13 Previewing inline node:
| type | value1 | value1_type | value2 | value2_type |
| ------------ | ------ | ------------- | ------ | ------------- |
| doublequotes | | Actual NULL | | Actual NULL |
| empty | | Actual NULL | | Actual NULL |
| nothing | | Actual NULL | | Actual NULL |
| spaces | | Actual NULL | | Actual NULL |
| stringnone | None | String "None" | None | String "None" |
| stringnull | | Actual NULL | | Actual NULL |
| ticks | '' | Other | '' | Other |
Since this looks like it's working, I'm going to close this as "can't reproduce". But please let me know if you still think this is an issue and we can consider re-opening this.
@dbeatty10 thanks for your response.
I can see from your logs that you're using target=http
which I assume may be connection to a Databricks cluster. This issue only appears in the session
connection (local pyspark).
I just re-ran the tests and got my originally reported results back.
Thank you for that insight @vakarisbk -- will re-open this!
Nice eyes on http
vs. session
🦅
Using the example here, the following commands do give the expected output when using the ✅ http
connection method, but give unexpected output when using the ❌ session
method:
dbt build -s +test_null --full-refresh
dbt show --inline "select * from {{ ref('test_null') }} order by type" --limit 10
null
within the database.session
connection method is used.We are unable to prioritize this ourselves, so I'm going to lable this as "help wanted".
Is this a new bug in dbt-spark?
Current Behavior
Null values in csv seeds appear as string "None" or empty string in the materialized table when using Spark Session connection or the Spark Connect method in development on #899
Expected Behavior
Null values in seeds should be materialized as null values.
Steps To Reproduce
seed_null.csv
:test_null.sql
:Relevant log output
No response
Environment
Additional Context
No response