The auto-generated sql-query for the upsert step does not carry over the size of a CHAR column. It seems to only work with VARCHAR columns. Example:
My Table definitions for the upsert step looks like:
CREATE TABLE IF NOT EXISTS staging.example (col1 VARCHAR(20), col2_buggy CHAR(2));
CREATE TABLE IF NOT EXISTS public.example (col1 VARCHAR(20), col2_buggy CHAR(2));
And the auto-generated sql-query (which is fortunately printed to stdout) for the upsert will look like :
CREATE TEMPORARY TABLE example_temp (col1 VARCHAR(20), col2_buggy CHAR);
INSERT INTO user_activity_temp (SELECT * FROM staging.user_activity LIMIT 10);
and so on...
col2 in the temporary table is thus generated with the default size (1) and carryover fails. Pipeline will fail giving me:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/dataduct/steps/executors/runner.py", line 100, in sql_runner
cursor.execute(sql_query)
psycopg2.InternalError: Value too long for character type
DETAIL:
-----------------------------------------------
error: Value too long for character type
code: 8001
context: Value too long for type character(1)
query: 8143
location: funcs_string.hpp:392
process: query0_25 [pid=31361]
Everything works fine if i change col2 ro VARCHAR as well.
Used: dataduct 0.4.0
The auto-generated sql-query for the
upsert
step does not carry over the size of a CHAR column. It seems to only work with VARCHAR columns. Example: My Table definitions for theupsert
step looks like:And the auto-generated sql-query (which is fortunately printed to
stdout
) for the upsert will look like :col2 in the temporary table is thus generated with the default size (1) and carryover fails. Pipeline will fail giving me:
Everything works fine if i change col2 ro VARCHAR as well.
EDIT: traced the problem here https://github.com/coursera/dataduct/blob/develop/dataduct/database/parsers/utils.py#L26-L28
CHAR
pattern definition does not includeWord(alphanums)