Open jhchee opened 1 year ago
@jhchee The Spark SQL parser doesn't support this, so I'm not sure we can do anything on our end. All configs only come into play during the execution of the SQL.
As a workaround, you can run ALTER TABLE first and add the column before calling the merge.
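A minimal sketch of that workaround, assuming the test_insert3 table from the repro further down (widening the target schema first, so the merge no longer sees an unknown column):

```sql
-- Workaround sketch: add the new column to the target schema up front,
-- then run the merge as usual. Assumes table test_insert3 already exists.
ALTER TABLE test_insert3 ADD COLUMNS (new_col int);

MERGE INTO test_insert3 AS target
USING (SELECT 1 AS id, 'c' AS name, 1 AS new_col, current_timestamp AS updated_at) source
ON target.id = source.id
WHEN MATCHED THEN UPDATE SET target.new_col = source.new_col
WHEN NOT MATCHED THEN INSERT *;
```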
@ad1happy2go
Not sure this is really blocked by the Spark SQL parser. As an example, Delta Lake supports schema evolution in MERGE INTO (both for partial updates and for update/insert):
https://docs.delta.io/latest/delta-update.html#-merge-schema-evolution
It would be great to have something similar in Hudi. Currently, Hudi uses the target table schema during MERGE INTO (and drops incoming columns if the source schema is wider, for example).
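For comparison, the Delta Lake behavior linked above is gated by a session config; a rough sketch (the table names `events` and `updates` are made up for illustration):

```sql
-- Delta Lake (not Hudi): enable automatic schema evolution for MERGE,
-- per the linked docs. Columns present only in the source are then
-- added to the target table's schema during the merge.
SET spark.databricks.delta.schema.autoMerge.enabled = true;

MERGE INTO events AS target
USING updates AS source          -- 'updates' carries an extra column
ON target.id = source.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```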
@kazdy @jhchee You are correct, this should be supported for MERGE INTO. I confirmed that master also doesn't support it. Attaching the code below, which should work once this is implemented.
create table test_insert3 (
  id int,
  name string,
  updated_at timestamp
) using hudi
options (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'updated_at'
) location 'file:///tmp/test_insert3';

merge into test_insert3 as target
using (
  select 1 as id, 'c' as name, 1 as new_col, current_timestamp as updated_at
  union select 1 as id, 'd' as name, 1 as new_col, current_timestamp as updated_at
  union select 1 as id, 'e' as name, 1 as new_col, current_timestamp as updated_at
) source
on target.id = source.id
when matched then update set target.new_col = source.new_col
when not matched then insert *;
Created a JIRA to track this: https://issues.apache.org/jira/browse/HUDI-6483
Feel free to contribute.
Describe the problem you faced
I have created a table with 2 columns, namely userId and updatedAt. Now I'm passing a new column, nested, in the merge into command, but got an exception.
To Reproduce
Steps to reproduce the behavior:
.config("hoodie.schema.on.read.enable", "true")
doesn't help.
Expected behavior
The schema should evolve and detect that this is a new column.
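Under the expected behavior, a schema-evolving merge would widen the target table; a hypothetical check after the merge (table name taken from the repro above) might look like:

```sql
-- After a schema-evolving MERGE INTO, the column introduced by the
-- source should be part of the target table's schema.
DESCRIBE TABLE test_insert3;  -- new_col should now appear among the columns
```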
Environment Description
Hudi version : 0.12.2
Spark version : 3.3.1
Hive version : -
Hadoop version : -
Storage (HDFS/S3/GCS..) : -
Running on Docker? (yes/no) : -