Paimon uses .toString to generate partition values, which is not accurate for some data types, such as date and binary. The Spark engine, for example, uses a Cast to convert a partition object to its string value. This PR therefore changes Paimon to use a cast to generate partition values.
It also adds a new config, partition.legacy-name, to switch back to the previous toString behavior. By default the legacy behavior (.toString) is used.
An example in which a binary-type partition column causes a failure:
CREATE TABLE pt (
  id BIGINT,
  c1 STRING
) USING paimon
PARTITIONED BY (day BINARY);
INSERT INTO TABLE pt VALUES (1, 'a', CAST('2021' AS BINARY));
SELECT * FROM pt;
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1) (192.168.0.102 executor driver): java.io.FileNotFoundException: File 'warehouse/default.db/pt/day=%5BB@4a045a11/bucket-0/data-91c064a3-a0a1-4042-9d5a-cc82a23af7ff-0.parquet' not found, Possible causes: 1.snapshot expires too fast, you can configure 'snapshot.time-retained' option with a larger value. 2.consumption is too slow, you can improve the performance of consumption (For example, increasing parallelism).
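The day=%5BB@4a045a11 segment in the path above is the URL-encoded form of [B@4a045a11, which is what Java's default toString returns for a byte[]. The following is a minimal sketch of the difference between the two strategies, not Paimon's actual implementation: the class PartitionValueDemo and the helpers legacyName and castName are hypothetical names, and the sketch assumes that a date partition value is held internally as an int counting days since the epoch and that a binary value is decoded as UTF-8 when cast to a string.

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;

public class PartitionValueDemo {

    // Legacy strategy: rely on Object.toString() of the raw partition object.
    static String legacyName(Object value) {
        return value.toString();
    }

    // Cast-like strategy (hypothetical helper, not a Paimon API):
    // convert the value according to its logical type.
    static String castName(Object value, String type) {
        switch (type) {
            case "binary":
                // Assumption: binary is decoded as UTF-8 when cast to string.
                return new String((byte[]) value, StandardCharsets.UTF_8);
            case "date":
                // Assumption: dates are stored as days since 1970-01-01.
                return LocalDate.ofEpochDay((Integer) value).toString();
            default:
                return value.toString();
        }
    }

    public static void main(String[] args) {
        byte[] day = "2021".getBytes(StandardCharsets.UTF_8);
        // Prints an identity string such as [B@4a045a11; the hash varies per run.
        System.out.println(legacyName(day));
        System.out.println(castName(day, "binary"));    // 2021

        int epochDay = (int) LocalDate.of(2021, 1, 1).toEpochDay();
        System.out.println(legacyName(epochDay));       // 18628
        System.out.println(castName(epochDay, "date")); // 2021-01-01
    }
}
```

Because the identity hash in [B@... differs between object instances, a path built from toString at write time is unlikely to match one built at read time, which would be consistent with the FileNotFoundException shown above.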
Tests
add test
API and Format
no
Documentation
added docs