Open osawyerr opened 1 year ago
Thanks for the report @osawyerr
TLDR is that DataFusion treats VARCHAR
and CHAR
the same (because they use the same underlying Arrow type)
🤔 I am not quite sure what to do here. Postgres has the notion of a max width character column CHAR
but arrow does not . DataFusion maps the SQL type CHAR
--> arrow type Utf8
Thus I am no sure we were be able to replicate the behavior of postgres in this instance.
For anyone following along, here is what postgres does
postgres=# create table example(name varchar(20));
CREATE TABLE
postgres=# insert into example values ('foo'), ('barrr');
INSERT 0 2
postgres=# select octet_length(name) from example;
octet_length
--------------
3
5
(2 rows)
postgres=# select octet_length(name::char(25)) from example;
octet_length
--------------
25
25
(2 rows)
postgres=# select octet_length(name::varchar(25)) from example;
octet_length
--------------
3
5
(2 rows)
postgres=#
@alamb There is also a related point RE how char
s are handled in joins and equality in postgres vs. varchar
as well. In postgres if I recall, the trailing spaces are ignored when comparing char
s but are taken into account when comparing varchar
s.
From your explanation, DataFusion will always take into account the trailing spaces if a "char
" is used in a join / equality. i.e. it will behave like a postgres varchar
.
i.e.
-- note the trailing spaces below
select * from example where name::char(25) = 'foo '::char(25)
Will return no results in DataFusion but in postgres the result will be returned.
Describe the bug
Hi there,
select octet_length(char)
doesn't behave as expected.In the example below - in Postgres (
text
column) the below SQL would return 25 as the result however in DataFusion (StringArray
column ) it just returns the original length of the values inn_name
.I'm not sure if this is because the cast to a
char(25)
is not behaving as expectedTo Reproduce
create a StringColumn in DF, populate it with some values and run the following SQL on it.
Expected behavior
Should return 25, but it returns the length of each of the values in the column.
Additional context
No response