Closed aaronraff closed 3 years ago
Thanks for this reproduction case @aaronraff! I have an even simpler one:
select '0000000025'::varchar as my_varchar_column
base64 encoded: c2VsZWN0ICcwMDAwMDAwMDI1Jzo6dmFyY2hhciBhcyBteV92YXJjaGFyX2NvbHVtbg==
"table": {
"column_names": [
"my_varchar_column"
],
"rows": [
[
25.0
]
]
}
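For reference, the base64 string in the example above can be reproduced with Python's standard library. This is only a sketch of the encoding step, not of the full RPC request, and the variable names are illustrative.

```python
import base64

sql = "select '0000000025'::varchar as my_varchar_column"

# Encode the query text as base64, matching the encoded string quoted above.
encoded = base64.b64encode(sql.encode("utf-8")).decode("ascii")
print(encoded)

# Decoding round-trips back to the original query.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
```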
I think the issue here is with how the RPC server handles data types implicitly, rather than storing them explicitly alongside JSON results.
This is definitely an issue with agate. I dug a bit deeper into the example above. The function that turns the padded string into an integer is table_from_data_flat, called by execute → get_result_from_cursor:
Here's the simple reproduction case:
>>> import dbt.clients.agate_helper
>>> data = [{'my_varchar_column': '0000000025'}]
>>> column_names = ['my_varchar_column']
>>> agate_tbl = dbt.clients.agate_helper.table_from_data_flat(
... data,
... column_names
... )
>>> agate_tbl.print_table()
| my_varchar_column |
| ----------------- |
| 25 |
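To illustrate why the padded string above ends up as a number, here is a simplified sketch of value-based type inference. It is not agate's or dbt's actual implementation: infer_value and to_json_value are hypothetical helpers, and the float conversion is an assumption made only to mirror the 25.0 seen in the RPC response.

```python
import json
from decimal import Decimal, InvalidOperation


def infer_value(raw):
    """Guess a type from the string's content alone (hypothetical helper)."""
    try:
        # '0000000025' parses as a number, so the leading zeros are lost here.
        return Decimal(raw)
    except InvalidOperation:
        return raw


def to_json_value(value):
    # Decimal is not JSON serializable by default; converting to float is an
    # assumption used here to reproduce the 25.0 in the response.
    return float(value) if isinstance(value, Decimal) else value


inferred = infer_value("0000000025")
payload = json.dumps({"rows": [[to_json_value(inferred)]]})
print(payload)
```

Because the inference only looks at the value, nothing in this path knows the column was declared as a varchar.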
Agate does a lot of type inference under the hood. We enable user-supplied column type overrides for seeds, but I don't think that makes a lot of sense for one-off RPC queries. Really, we should be getting the data type from the database, though that may mean handling some low-level differences across cursors. Here's what cursor.description looks like for Postgres + Redshift:
(Column(name='my_varchar_column', type_code=1043),)
Versus Snowflake:
[('MY_VARCHAR_COLUMN', 2, None, 16777216, None, None, False)]
Whereas other databases, e.g. BigQuery, reimplement adapter.execute and use other methods to convert fetched results to an agate table. So the intervention needed may vary.
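A rough sketch of what "getting the data type from the database" could look like, using the two cursor.description shapes shown above. Everything here is hypothetical: Column is a stand-in namedtuple, text_columns and convert_row are illustrative helpers rather than dbt functions, and the type-code table is an assumption (1043 and 2 come from the descriptions above; 25 and 1042 are the Postgres OIDs for text and bpchar, added for illustration).

```python
from collections import namedtuple
from decimal import Decimal, InvalidOperation

# Stand-in for the Column objects shown in the Postgres description above.
Column = namedtuple("Column", ["name", "type_code"])

# Assumed per-adapter type codes that denote string columns.
TEXT_TYPE_CODES = {
    "postgres": {25, 1042, 1043},
    "snowflake": {2},
}


def text_columns(description, adapter_type):
    """Return names of columns the database itself reports as text."""
    codes = TEXT_TYPE_CODES[adapter_type]
    # Both shapes expose the name at index 0 and the type code at index 1.
    return {col[0] for col in description if col[1] in codes}


def convert_row(row, column_names, keep_as_text):
    """Infer numbers only for columns the database did not type as text."""
    out = []
    for name, value in zip(column_names, row):
        if name in keep_as_text or not isinstance(value, str):
            out.append(value)
            continue
        try:
            out.append(Decimal(value))
        except InvalidOperation:
            out.append(value)
    return out


pg_description = (Column(name="my_varchar_column", type_code=1043),)
sf_description = [("MY_VARCHAR_COLUMN", 2, None, 16777216, None, None, False)]

keep = text_columns(pg_description, "postgres")
print(convert_row(["0000000025"], ["my_varchar_column"], keep))
print(text_columns(sf_description, "snowflake"))
```

The per-adapter lookup table is where the "low-level differences across cursors" would have to be handled. Adapters that reimplement adapter.execute would need their own equivalent.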
Describe the bug
While using the RPC server, some values are returned as the wrong type. It looks like strings are being cast to floats somehow.
Steps To Reproduce
1. Start the RPC server (dbt rpc).
2. Make a request to the server. The SQL here is the base64 encoded version of:
3. Copy the request_token from the response.
4. Make another request with the following body:
5. Inspect the rows that are returned. You should see something like this:
6. If you run the same query in psql, you will get the following output:
Notice how pad_int_varchar is padded, since it is represented as a varchar and not an integer.
Expected behavior
I would expect all of the rows returned to be strings (since they were cast), not floats. As seen in step 5, the second to last row is not padded since it is being represented as a float.
Screenshots and log output
I don't have any other relevant logs or output, but I'm happy to add more of the RPC response to this issue if that is helpful!
System information
Which database are you using dbt with?
The output of dbt --version:
The operating system you're using:
macOS Catalina Version 10.15.7
The output of python --version:
Python 3.7.7
Additional context
N/A