Open tustvold opened 2 years ago
I tried changing this in apache/arrow-datafusion#2428, but it causes distributed_join_plan to fail with:
Error: DataFusionError(Plan("The left or right side of the join does not have all columns on \"on\": \nMissing on the left: {Column { name: \"l_orderkey\", index: 0 }}\nMissing on the right: {Column { name: \"o_orderkey\", index: 0 }}"))
I'm not familiar enough with this code to know what is going on here, but something doesn't feel right.
Describe the bug
ShuffleWriterExec::schema() returns the schema of the underlying plan. However, ShuffleWriterExec::execute returns a stream of RecordBatch containing metadata, and consequently a completely different schema.
To Reproduce
Use ShuffleWriterExec and compare the schema returned by schema() with the schema of the batches yielded by execute.
Expected behavior
ExecutionPlan::schema should return the same schema as the SendableRecordBatchStream yielded by ExecutionPlan::execute.
Additional context
There is a potentially valid question as to why we have the schema stored in so many places...
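The expected behavior can be written down as a simple invariant check. The following is only a minimal sketch with simplified stand-in types: Schema, BatchStream, Plan, MetadataWriter, PassThrough and schema_matches_stream are all hypothetical names, not the real arrow or DataFusion API, and the metadata column names are invented for illustration.

```rust
// Simplified stand-ins for the real arrow/DataFusion types (all hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<String>,
}

impl Schema {
    fn new(fields: &[&str]) -> Self {
        Schema {
            fields: fields.iter().map(|f| f.to_string()).collect(),
        }
    }
}

/// Stand-in for a record batch stream; only its schema matters here.
struct BatchStream {
    schema: Schema,
}

/// Stand-in for the plan trait, reduced to the two methods in question.
trait Plan {
    fn schema(&self) -> Schema;
    fn execute(&self) -> BatchStream;
}

/// Models the reported behavior: schema() reports the input schema,
/// while execute() yields batches that describe written output.
struct MetadataWriter {
    input_schema: Schema,
}

impl Plan for MetadataWriter {
    fn schema(&self) -> Schema {
        self.input_schema.clone()
    }

    fn execute(&self) -> BatchStream {
        // Assumption: these metadata column names are made up for the example.
        BatchStream {
            schema: Schema::new(&["partition", "path"]),
        }
    }
}

/// Models the expected behavior: both methods agree on the schema.
struct PassThrough {
    input_schema: Schema,
}

impl Plan for PassThrough {
    fn schema(&self) -> Schema {
        self.input_schema.clone()
    }

    fn execute(&self) -> BatchStream {
        BatchStream {
            schema: self.input_schema.clone(),
        }
    }
}

/// The invariant from "Expected behavior": the declared schema must equal
/// the schema of the stream produced by execute().
fn schema_matches_stream(plan: &dyn Plan) -> bool {
    plan.schema() == plan.execute().schema
}
```

In this sketch, schema_matches_stream returns false for MetadataWriter and true for PassThrough. A check of this shape could in principle run against any operator in a test, though the real traits involve async streams and partitions that are omitted here.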