Open londrake opened 3 months ago
I can confirm that this issue is valid. Here is why it can happen: the source plan has the column `source.new_col`, while the only column of the target table is `target.id`. The assignment key `target.new_col` is resolved against the target plan, where no column can match; it will not match anything, because the only existing `new_col` column is `source.new_col`. The fix, however, is not trivial and is risky. We could change the match logic to treat the first part of the column name as an alias, but this would fail in the following example:
Source schema: col1 int, col2 int, t struct<col1: int, col2: int>
Target schema: col3 int, col4 int
Query: source.alias('s').merge(target.alias('t')).update(Map('t.col2' -> 's.col2'))
What will 't.col2' match: the col2 column in the source table, or the nested field t.col2?
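The ambiguity above can be made concrete with a toy resolver (the object and method names here are hypothetical illustrations, not Delta's actual resolution logic): given the source schema from the example, the dotted name "t.col2" has two plausible interpretations once the first part may be an alias.

```scala
// Toy model of the resolution problem: should the leading "t" in "t.col2"
// be treated as a table alias, or as a top-level struct column?
object ResolutionAmbiguity {
  // Returns every plausible match for a dotted name against a schema.
  // topLevel: top-level column names; structs: struct column -> its field names.
  def candidates(name: String,
                 alias: String,
                 topLevel: Set[String],
                 structs: Map[String, Set[String]]): Seq[String] = {
    val parts = name.split('.')
    // Interpretation 1: "t" is the table alias, so "t.col2" means column col2.
    val asAliasMatch =
      if (parts.length == 2 && parts(0) == alias && topLevel(parts(1)))
        Seq(s"column ${parts(1)}")
      else Seq.empty
    // Interpretation 2: "t" is a struct column, so "t.col2" means field t.col2.
    val asStructMatch =
      if (parts.length == 2 && structs.get(parts(0)).exists(_(parts(1))))
        Seq(s"struct field ${parts(0)}.${parts(1)}")
      else Seq.empty
    asAliasMatch ++ asStructMatch
  }

  def main(args: Array[String]): Unit = {
    // Source schema from the example: col1 int, col2 int, t struct<col1: int, col2: int>
    val matches = candidates("t.col2", "t",
      Set("col1", "col2"), Map("t" -> Set("col1", "col2")))
    println(matches) // both interpretations apply, so the name is ambiguous
  }
}
```

Both interpretations survive, which is exactly why treating the first name part as an alias cannot be done blindly.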
I'll do some research to see how people run this kind of query and decide the next step.
Query: source.alias('s').merge(target.alias('t')).update(Map('t.col2' -> 's.col2'))
I think you meant:
Query: target.alias('t').merge(source.alias('s')).update(Map('t.col2' -> 's.col2'))
Well, in this case there is a clash between the alias for the Delta table and the struct column name. Making sure that the alias does not match the column name would probably solve the issue, but it sounds like a workaround.
Bug [Spark]: merge fails when using an alias with the autoMerge property enabled
Which Delta project/connector is this regarding?
Describe the problem
A merge operation with an insertExpr/updateExpr condition doesn't support using an alias when referencing the target table while the conf
"spark.databricks.delta.schema.autoMerge.enabled"
is enabled. Aliasing works fine when the parameter is off. Let's suppose the target table has the alias t and the source has the alias s, and that we define the Expr condition like that.
Supposing new_col is an additional column for the target table, a map such as
Map("t.new_col" -> "s.new_col")
raises the error.
Steps to reproduce
Run the code below, updating the variable passed into updateExpr/insertExpr to reproduce the issue. Note that `spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")` cannot be enabled at run time; it has to be set when the Spark session is initializing.
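Since the original repro code is not shown above, here is a minimal sketch along the lines the report describes. The table path, schemas, and merge condition are assumptions, not taken from the original report; running it requires a Spark session with the Delta Lake package on the classpath, and the autoMerge conf must be set when the session is built.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession

// Assumed setup: the autoMerge conf is set at session creation, not at run time.
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .config("spark.databricks.delta.schema.autoMerge.enabled", "true")
  .getOrCreate()
import spark.implicits._

// Hypothetical schemas: the target has only id, the source carries the extra new_col.
Seq(1, 2).toDF("id").write.format("delta").save("/tmp/target")
val source = Seq((1, "a"), (3, "b")).toDF("id", "new_col")

val goodColumnsMap = Map("new_col" -> "source.new_col")          // merge succeeds
val badColumnsMap  = Map("target.new_col" -> "source.new_col")   // raises the error

DeltaTable.forPath(spark, "/tmp/target").as("target")
  .merge(source.as("source"), "target.id = source.id")
  .whenMatched().updateExpr(badColumnsMap)   // swap in goodColumnsMap to see it succeed
  .whenNotMatched().insertExpr(Map("id" -> "source.id") ++ badColumnsMap)
  .execute()
```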
Observed results
When
val goodColumnsMap = Map("new_col" -> "source.new_col")
is given as the updateExpr/insertExpr condition, the merge runs smoothly as expected. When
val badColumnsMap = Map("target.new_col" -> "source.new_col")
is given as the updateExpr/insertExpr condition, an error will be raised by the merge op.
ERROR LOG
Expected results
I expect no difference in behavior between the two cases, so the merge should run smoothly in both.
Further details
Environment information
Willingness to contribute
The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?