apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Support casting from Utf8 --> FixedSizeBinary, Binary --> FixedSizeBinary #5530

Open alamb opened 1 year ago

alamb commented 1 year ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

A user reported on Discord that they were trying to use DataFusion to query FixedSizeBinary fields stored in Parquet.

Message link: https://discord.com/channels/885562378132000778/885562378132000781/1083347011497119775

Fully supporting FixedSizeBinary fields is probably a larger project, but as a first step it would be useful to be able to cast them to types that DataFusion supports better, such as Binary.
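As a rough illustration of why the FixedSizeBinary to Binary direction is the easy one, here is a minimal std-only Rust sketch. It models the two Arrow layouts with plain vectors instead of the arrow-rs array types: a FixedSizeBinary column stores every value back to back at a fixed width, while a Binary column stores a values buffer plus `i32` offsets. `FixedSizeBinaryCol`, `BinaryCol`, and `fixed_to_binary` are hypothetical names for illustration only, and null handling is omitted.

```rust
/// Hypothetical stand-in for an Arrow FixedSizeBinary column (nulls omitted):
/// `len` values of exactly `width` bytes each, stored contiguously.
struct FixedSizeBinaryCol {
    width: usize,
    len: usize,
    values: Vec<u8>,
}

/// Hypothetical stand-in for an Arrow Binary column (nulls omitted):
/// value `i` is `values[offsets[i]..offsets[i + 1]]`.
struct BinaryCol {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

impl BinaryCol {
    /// Returns the bytes of value `i`.
    fn value(&self, i: usize) -> &[u8] {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        &self.values[start..end]
    }
}

/// Converts fixed-width values to variable-width ones. The value bytes are
/// copied unchanged; value `i` simply starts at offset `i * width`.
fn fixed_to_binary(col: &FixedSizeBinaryCol) -> BinaryCol {
    // Binary uses i32 offsets, so the total size must fit in an i32.
    assert!(
        col.len * col.width <= i32::MAX as usize,
        "offsets would overflow i32"
    );
    let offsets = (0..=col.len).map(|i| (i * col.width) as i32).collect();
    BinaryCol {
        offsets,
        values: col.values.clone(),
    }
}

fn main() {
    // Two 3-byte values: [1, 2, 3] and [4, 5, 6].
    let fixed = FixedSizeBinaryCol {
        width: 3,
        len: 2,
        values: vec![1, 2, 3, 4, 5, 6],
    };
    let binary = fixed_to_binary(&fixed);
    assert_eq!(binary.offsets, vec![0, 3, 6]);
    assert_eq!(binary.value(1), &[4u8, 5, 6][..]);
    println!("offsets = {:?}", binary.offsets);
}
```

Because every fixed-width value is also a valid variable-length value, this direction cannot fail (short of offset overflow); only the offsets need to be generated.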

Describe the solution you'd like

@tustvold filed https://github.com/apache/arrow-rs/issues/3826 to track adding casts between Binary and FixedSizeBinary to arrow-rs.

Once that is available, I would expect this query, which currently fails, to work:

❯ select arrow_cast('002920a3044b3c9f56e797b8'::bytea, 'FixedSizeBinary(12)');
Error during planning: Cannot automatically convert Binary to FixedSizeBinary(12)
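The reverse direction, Binary to FixedSizeBinary, needs a per-value length check. The sketch below uses a hypothetical std-only helper, `binary_to_fixed`, not the arrow-rs cast kernel. It returns an error on the first value whose length differs from the target width; producing a null for such values instead is an equally plausible policy, and the choice here is an assumption.

```rust
/// Hypothetical helper (not the arrow-rs kernel): packs variable-length values
/// into one contiguous buffer of fixed-width values, failing if any value is
/// not exactly `width` bytes long.
fn binary_to_fixed(values: &[&[u8]], width: usize) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(values.len() * width);
    for (i, v) in values.iter().enumerate() {
        if v.len() != width {
            // Policy choice (assumption): reject instead of emitting a null.
            return Err(format!(
                "value {} has {} bytes, expected {}",
                i,
                v.len(),
                width
            ));
        }
        out.extend_from_slice(v);
    }
    Ok(out)
}

fn main() {
    // Every value is 3 bytes wide, so the conversion succeeds.
    let ok: [&[u8]; 2] = [&[1, 2, 3], &[4, 5, 6]];
    assert_eq!(binary_to_fixed(&ok, 3), Ok(vec![1, 2, 3, 4, 5, 6]));

    // The second value is only 2 bytes wide, so the conversion is rejected.
    let bad: [&[u8]; 2] = [&[1, 2, 3], &[4, 5]];
    let err = binary_to_fixed(&bad, 3).unwrap_err();
    println!("{}", err);
}
```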

Once there is direct support for casting between Utf8 and FixedSizeBinary, I would also expect this to work:

❯ select arrow_cast('002920a3044b3c9f56e797b8', 'FixedSizeBinary(12)');
Error during planning: Cannot automatically convert Utf8 to FixedSizeBinary(12)

Describe alternatives you've considered

Additional context

alamb commented 1 year ago

See the note from tustvold at https://github.com/apache/arrow-rs/issues/3826#issuecomment-1461991619: casting from Utf8 to FixedSizeBinary may be more confusing than helpful, which is worth considering.
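One plausible reading of that concern (our interpretation, not a quote from the linked comment): a string such as '002920a3044b3c9f56e797b8' looks like 12 hex-encoded bytes, but a cast that reinterprets the string's UTF-8 bytes sees 24 bytes. The std-only Rust sketch below contrasts the two readings; `hex_decode` is a hypothetical helper written for illustration, not an arrow-rs or DataFusion function.

```rust
/// Hypothetical helper: decodes an ASCII hex string into bytes.
/// Returns None on odd length or on any non-hex character.
fn hex_decode(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 {
        return None;
    }
    s.as_bytes()
        .chunks(2)
        .map(|pair| -> Option<u8> {
            let hi = (pair[0] as char).to_digit(16)?;
            let lo = (pair[1] as char).to_digit(16)?;
            Some((hi * 16 + lo) as u8)
        })
        .collect()
}

fn main() {
    let s = "002920a3044b3c9f56e797b8";

    // Reading 1: reinterpret the string's UTF-8 bytes (one byte per ASCII char).
    let raw = s.as_bytes();
    // Reading 2: treat the string as hex, two characters per byte.
    let decoded = hex_decode(s).expect("valid hex");

    println!("raw UTF-8 bytes: {}", raw.len());
    println!("hex-decoded bytes: {}", decoded.len());
    assert_eq!(raw.len(), 24);
    assert_eq!(decoded.len(), 12);
    assert_eq!(&decoded[..4], &[0x00u8, 0x29, 0x20, 0xa3][..]);
}
```

Only the hex reading fits FixedSizeBinary(12). The same question affects the first example: if `::bytea` keeps the literal's raw UTF-8 bytes, the value is 24 bytes long and a length-checked cast to FixedSizeBinary(12) would reject it.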