Closed abhijeetvg closed 2 months ago
Can you provide more details here?
Depending on the above answers, can we disable pushdown of fields with this type into the source engine to allow correct results even if it means reduced performance / overscan?
This is the case of postgres as well.
I observed though that if you add padding to the length of the column, queries with these where clauses work.
following works
select * from <some_test_table> where test_char = 'char2 ';
following does not work:
select * from <some_test_table> where test_char = 'char2';
I think this is a specific to Postgres / Redshift because we treat trailing whitespaces of Char columns different than for example Athena. For these systems trailing whitespace is always significant.
There are two ways to deal with this problem:
Probably it's important to keep in mind that trailing spaces might occur as well in VARCHAR columns.
Greetings from 2023, I am fairly certain this bug is still present in the latest Athena Connector. Please let me know if I need to cut a different issue
My Reproduction Steps:
1) Create a new redshift database, check the box to allow it to load the sample data. 2) Setup the Athena connector to connect to Redshift's sample_data_dev database.
3) Run the following query in Redshift:
SELECT count(*) as count, i_category
FROM tpcds.catalog_sales JOIN tpcds.item ON i_item_sk=cs_item_sk
JOIN tpcds.date_dim ON d_date_sk=cs_sold_date_sk
LEFT JOIN tpcds.catalog_returns ON (cs_order_number=cr_order_number
AND cs_item_sk=cr_item_sk)
WHERE i_category = 'Men'
GROUP BY i_category
order by 1
You'll get back 141762 results.
Run the same query above in Athena using the connector, and you'll get back zero results. The reason is because i_category is padding 47 extra spaces onto the value and Athena is not trimming them out. A query that does work is...
SELECT count(*) as count, i_category
FROM tpcds.catalog_sales JOIN tpcds.item ON i_item_sk=cs_item_sk
JOIN tpcds.date_dim ON d_date_sk=cs_sold_date_sk
LEFT JOIN tpcds.catalog_returns ON (cs_order_number=cr_order_number
AND cs_item_sk=cr_item_sk)
WHERE i_category = 'Men '
GROUP BY i_category
order by 1
or
SELECT count(*) as count, i_category
FROM tpcds.catalog_sales JOIN tpcds.item ON i_item_sk=cs_item_sk
JOIN tpcds.date_dim ON d_date_sk=cs_sold_date_sk
LEFT JOIN tpcds.catalog_returns ON (cs_order_number=cr_order_number
AND cs_item_sk=cr_item_sk)
WHERE i_category LIKE 'Men%'
GROUP BY i_category
order by 1
However neither option should have to be used. Athena should be trimming the whitespace natively in order to match Redshift's behavior.
This has been address in the latest release.
Execution with constraint on char column returns empty results.
Verified that correct value is set in the prepared statement.
Similar push down works in MySql.