Two things are going on in this PR:
1. Update FromArrowRecordBatch to handle the case where a RecordBatch has a StructArray in it. We'll flatten out the StructArray into a regular DataFrame.
2. Update the Arrow dependency to the latest version. This will prevent accidental "API not found" errors at runtime in the dotnet-spark repo.
Once this goes in, I'll open another PR to update the version number for MDA.
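To make the StructArray flattening described above concrete, here is a minimal, language-neutral sketch in Python. It is not the C# change in this PR and does not use the Arrow or MDA APIs. A record batch is modeled as a plain dict of columns, a struct column as a nested dict of child columns, and the helper name flatten_struct_columns is hypothetical. The parent_child column naming is an assumption based on the Struct_childColumnName pattern mentioned in the discussion, and only one level of nesting is handled.

```python
def flatten_struct_columns(batch):
    """Return a new column mapping with each struct column replaced by its children.

    `batch` maps column names to either a list of values (a regular column)
    or a dict of child columns (a stand-in for a struct column).
    """
    flat = {}
    for name, column in batch.items():
        if isinstance(column, dict):
            # Struct column: promote each child to a top-level column.
            # The "<parent>_<child>" naming is an assumption for illustration.
            for child_name, child_values in column.items():
                flat[f"{name}_{child_name}"] = child_values
        else:
            # Regular column: carried over unchanged.
            flat[name] = column
    return flat


# Hypothetical batch with one regular column and one struct column.
batch = {
    "id": [1, 2, 3],
    "point": {"x": [0.5, 1.5, 2.5], "y": [10, 20, 30]},
}
flat = flatten_struct_columns(batch)
```

With this input, the flattened result has three top-level columns of equal length, and the struct column itself no longer appears.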
OLD
The following methods on DataFrameColumn are being made public:
GetArrowField
GetMaxRecordBatchLength
ToArrowArray
These 3 methods are the ones we need to support Spark 3.0.
There is an argument to be made here that these APIs should remain protected. The alternative we have here is to update just the existing DataFrame.ToArrowRecordBatches() method to return a Spark 3.0 compatible RecordBatch. Because dotnet-spark's dependencies on MDA are specified as exact versions, this should work and no backend changes would be needed on the dotnet-spark side! I'm inclined to update DataFrame.ToArrowRecordBatches() personally, but I don't mind making these 3 methods public either.
Now that we've determined that Spark is unlikely to need this new API, I think we can keep the Struct_childColumnName. Other than that, this PR should be good to go in.