Open sephib opened 2 months ago
Hello! This might be a pyhudi error -- cc @xushiyan from the Hudi team for any thoughts
We are currently awaiting the Hudi team's implementation of hudi-rs, which would give us more robust support for Hudi.
Just adding additional context
It seems to be an Avro vs. Arrow issue.
When trying to use hudi-rs we also get an error:
ArrowInvalid: Schema at index X was different
This is an example of what we are running:
from hudi import HudiTable # pip install hudi
import pyarrow as pa
hudi_path = "s3://path/to/hudi/table"
hudi_table = HudiTable(hudi_path)
records = hudi_table.read_snapshot()
arrow_table = pa.Table.from_batches(records)
The schemas of the returned record batches are not all the same.
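To narrow down which batches disagree and how, a small helper along these lines may be useful. This is only a sketch and not part of hudi-rs or Daft: `diff_schemas` and the column names are hypothetical, and each schema is represented as a plain ordered list of `(column_name, type_string)` pairs so the comparison does not depend on any library.

```python
from typing import Any, Dict, Sequence, Tuple

# A schema here is an ordered list of (column_name, type_string) pairs.
Schema = Sequence[Tuple[str, str]]


def diff_schemas(schemas: Sequence[Schema]) -> Dict[int, Dict[str, Any]]:
    """Compare every schema against the first one and report the differences.

    Hypothetical helper for illustration. Returns a dict keyed by the index
    of each schema that differs from schemas[0].
    """
    if not schemas:
        return {}
    reference = list(schemas[0])
    ref_types = dict(reference)
    report: Dict[int, Dict[str, Any]] = {}
    for index, schema in enumerate(schemas[1:], start=1):
        schema = list(schema)
        if schema == reference:
            continue
        types = dict(schema)
        # Columns present in both, in reference order and in this schema's order.
        common_ref_order = [name for name, _ in reference if name in types]
        common_own_order = [name for name, _ in schema if name in ref_types]
        report[index] = {
            "missing": [name for name in ref_types if name not in types],
            "extra": [name for name in types if name not in ref_types],
            "type_changed": [
                name for name in common_ref_order if ref_types[name] != types[name]
            ],
            "reordered": common_ref_order != common_own_order,
        }
    return report


# Made-up schemas: "new" has two string columns added between existing ones.
old = [("id", "int64"), ("name", "string"), ("ts", "timestamp[us]")]
new = [
    ("id", "int64"),
    ("new_col_a", "string"),
    ("new_col_b", "string"),
    ("name", "string"),
    ("ts", "timestamp[us]"),
]
report = diff_schemas([new, old, new])
print(report)
```

Assuming `read_snapshot()` returns pyarrow record batches, as the snippet above implies, the input for each batch could be built with `list(zip(batch.schema.names, map(str, batch.schema.types)))`.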
Ok yeah, this might be a Hudi issue in general then. Do you mind filing an issue against hudi-rs and linking that issue here please, @sephib?
Describe the bug
General Description
While reading data from a Hudi table with daft.read_hudi() we get an error caused by a mismatch of the columns.
Setup
Avro schema (.avsc file) and adding two new string columns in between our existing columns schema. For new data, these columns started populating correctly, while for existing rows, these new columns are null.
When running
In both cases the type it is trying to use is the one from before the table was altered.
When reading a Hudi table that has not been altered there is no problem.
We are using
Any suggestions?