Closed tglunde closed 2 years ago
This behaviour is dictated by Exasol Websocket protocol. It uses basic JSON for data transport, and it is not possible to represent big integers and decimals with fractions part in basic JSON without using strings.
Pyexasol does not transform any data by default and returns everything "as is".
However, you may change it by passing a custom function to fetch_mapper
connection option.
You may use this mapper or create your own.
Example: https://github.com/exasol/pyexasol/blob/master/examples/a04_fetch_mapper.py
Please note, some transformations are expensive and will noticeably reduce fetching performance, which might be a problem for large data sets.
Thanks for the answer. Since this is something out of the scope of pyexasol and I agree that adding this mapper to every fetch is probably an unnecessary overhead in the dbt-exasol adapter, I would rather try to overwrite the above-mentioned logic in SQLAdapter - where only for these specific cases (big integers and decimals with fraction) I would replace isinstance with using schema information to determine whether a cell is of type string. But one more question. Can you confirm that BIGINT and DECIMALs with fraction are the only datatypes that the Exasol Websocket Protocol is converting into strings? Just to make sure - do you have a link to a resource that describes the problem with Exasol Websocket protocol and "basic JSON" to use strings? Thx, Torsten
@tglunde Here is the specification of the websockets-api . However, I am not sure, if it states, how the types get encoded, but my guess, everything that can't be represented by JSON without loss, will be a string.
@tglunde I found actually a description of what values you can expect in the JSON.
<string | number | true | false | null>
And a number only has a limited precision in JSON, so if the Column needs more precision, the value will be a string. And, types like HASHTYPE or GEOMETRY are then also strings.
https://github.com/exasol/websocket-api/blob/master/docs/commands/fetchV1.md
Hi. I have implemented the workaround in dbt-exasol adapter, overwriting the get_result_from_cursor method, where I convert strings that have decimal or bigint into Decimal within python. Since this is only used for test results I am sure there will not be any use case with HASHTYPE or GEOMETRY coming up soon. (https://github.com/tglunde/dbt-exasol/pull/33/files# - in connections.py) Therefore we can close this issue here.
Implementing dbt-exasol I am following dbt-labs test suite to verify functionality of the dbt-exasol adapter.
In this test suite and also regarding a previous issue equality test dbt-utils, we have noticed a behaviour of the pyexasol, that I am not sure whether this is by design or accident.
When testing two relations to be equal, the adapter test is checking row counts and bi-directional EXCEPT query to realize the test:
Executing the test using test tables outputs the BIGINT column from the row_count_diff.difference CTE as follows:
In dbt standard adapters (SQLAdapter as base class),
isinstance(row[0])
is being used to figure that first column value is a string, because pyexasol delivers BIGINT and decimal with precisions > 18 in quotes / as strings. These standard equality tests then fail, because assertion on '0' failed because expected difference in row count is 0.To then test the pyexasol with different datatypes, I created the following test python script to confirm behavior of pyexasol:
Using a exasol 7.1.11 on my local machine, I am getting the following output for the print(row) statement:
This shows that INTEGER, DECIMAL(18,0) and FLOAT are represented as numbers - which leads to correct interpretations
isinstance
python method. But the DECIMAL(6,3), DECIMAL(19,0) and BIGINT numbers are being returned as strings and asserts will fail usingisinstance
.My question is why is pyexasol showing this behaviour - is it by design? The standard SQL adapter implementation from dbt-labs is assuming that
isinstance
is delivering correct datatypes to detect text columns. This is why with pyexasol BIGINT and some DECIMAL datatypes are being treated as strings by default.One solution would be to change the default SQLAdapter's behaviour to actually use schema information delivered by pyexasol instead of using the isinstance method. But I wanted to understand why pyexasol seems to be behaving differently than Postgres, Snowflake and Redhisft.
Regards, Torsten