apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Failed to deserialize HoodieAvroIndexRecord #11734

Open TheR1sing3un opened 2 months ago

TheR1sing3un commented 2 months ago

Describe the problem you faced

When we serialize a HoodieAvroIndexedRecord and then deserialize it, deserialization fails. (The original issue attached a screenshot of the failure here.)

To Reproduce

Steps to reproduce the behavior:

  1. Add a test case to TestSerializationUtils (attached as a screenshot in the original issue).
  2. Run the test.
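
The test case itself was only attached as an image, so its exact contents are not recoverable. The general shape of such a check is a serialize-then-deserialize round trip followed by an equality assertion. The sketch below illustrates that shape with plain JDK serialization and a hypothetical `DemoRecord` class; it does not use Hudi's SerializationUtils or HoodieAvroIndexedRecord, and all names in it are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Objects;

// Hypothetical stand-in for the record under test; not a Hudi class.
final class DemoRecord implements Serializable {
  private static final long serialVersionUID = 1L;
  final String key;
  final long value;

  DemoRecord(String key, long value) {
    this.key = key;
    this.value = value;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof DemoRecord)) {
      return false;
    }
    DemoRecord other = (DemoRecord) o;
    return key.equals(other.key) && value == other.value;
  }

  @Override
  public int hashCode() {
    return Objects.hash(key, value);
  }
}

final class RoundTripCheck {
  // Serialize an object to bytes with JDK serialization.
  static byte[] serialize(Object obj) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(obj);
    }
    return bytes.toByteArray();
  }

  // Rebuild an object from bytes produced by serialize().
  static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    }
  }

  // True when the record survives a serialize/deserialize round trip unchanged.
  static boolean roundTripPreserves(DemoRecord original) {
    try {
      return original.equals(deserialize(serialize(original)));
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("round trip failed", e);
    }
  }

  public static void main(String[] args) {
    if (!roundTripPreserves(new DemoRecord("key-1", 42L))) {
      throw new AssertionError("round trip changed the record");
    }
    System.out.println("round trip ok");
  }
}
```

A real test in TestSerializationUtils would presumably swap in the project's own serialization helper and record type; the assertion pattern stays the same.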

Expected behavior

The record deserializes successfully.

danny0405 commented 2 months ago

Should we serialize HoodieAvroIndexedRecord directly? It is currently only a wrapper around the nested indexed record; we should unwrap it first and then serialize.
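
As a rough illustration of the "unwrap first, then serialize" idea, the sketch below serializes only the nested payload and rebuilds the wrapper on the way back. `Payload`, `DemoWrapper`, and `UnwrapThenSerialize` are hypothetical stand-ins, not Hudi or Avro classes, and the byte encoding is deliberately trivial.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical nested payload that knows how to encode itself; not an Avro class.
final class Payload {
  final String text;

  Payload(String text) {
    this.text = text;
  }

  byte[] toBytes() {
    return text.getBytes(StandardCharsets.UTF_8);
  }

  static Payload fromBytes(byte[] bytes) {
    return new Payload(new String(bytes, StandardCharsets.UTF_8));
  }
}

// Hypothetical in-memory wrapper (adapter); it is never serialized itself.
final class DemoWrapper {
  private final Payload data;

  DemoWrapper(Payload data) {
    this.data = data;
  }

  Payload getData() {
    return data;
  }
}

final class UnwrapThenSerialize {
  // Serialize only the nested payload, not the wrapper.
  static byte[] serialize(DemoWrapper wrapper) {
    return wrapper.getData().toBytes();
  }

  // Deserialize the payload and wrap it again.
  static DemoWrapper deserialize(byte[] bytes) {
    return new DemoWrapper(Payload.fromBytes(bytes));
  }

  public static void main(String[] args) {
    DemoWrapper copy = deserialize(serialize(new DemoWrapper(new Payload("hello"))));
    System.out.println(copy.getData().text);
  }
}
```

With this approach the wrapper needs no serialization logic of its own, which matches the view that it is only an adapter.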

TheR1sing3un commented 2 months ago

> Should we serialize HoodieAvroIndexedRecord directly? It is currently only a wrapper around the nested indexed record; we should unwrap it first and then serialize.

Thanks for your reply~ Yes, no code path serializes HoodieAvroIndexedRecord directly right now, but for the sake of future extensibility, and because its ser/deser interfaces are already implemented, I think we should make sure this serialization logic actually works.

danny0405 commented 2 months ago

> but for the sake of future extensibility

I don't see the need; it is just a temporary in-memory wrapper (adapter) for a Hoodie record.

TheR1sing3un commented 2 months ago

> > but for the sake of future extensibility
>
> I don't see the need; it is just a temporary in-memory wrapper (adapter) for a Hoodie record.

If we are really never going to serialize it, we can throw a HoodieNotSupportedException in the ser/deser methods~ It doesn't seem reasonable to keep ser/deser methods that always fail.
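
A minimal sketch of the fail-fast alternative: the serialization hooks of an in-memory-only wrapper throw immediately with a clear message instead of failing later during deserialization. `TransientWrapper` and its method names are hypothetical, and `UnsupportedOperationException` stands in for HoodieNotSupportedException so the example compiles without Hudi on the classpath.

```java
// Hypothetical in-memory wrapper; not a Hudi class.
final class TransientWrapper {
  private final Object data;

  TransientWrapper(Object data) {
    this.data = data;
  }

  Object getData() {
    return data;
  }

  // Serialization hook: reject the call instead of producing bytes that cannot be read back.
  byte[] writePayload() {
    throw new UnsupportedOperationException(
        "TransientWrapper is in-memory only; unwrap getData() before serializing");
  }

  // Deserialization hook: rejected for the same reason.
  static TransientWrapper readPayload(byte[] bytes) {
    throw new UnsupportedOperationException(
        "TransientWrapper is in-memory only; deserialize the payload and wrap it instead");
  }

  public static void main(String[] args) {
    try {
      new TransientWrapper("payload").writePayload();
    } catch (UnsupportedOperationException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The design choice here is that a caller who serializes the wrapper by mistake gets an explicit error at the call site rather than an obscure failure on the reading side.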

danny0405 commented 2 months ago

> If we are really never going to serialize it, we can throw a HoodieNotSupportedException in the ser/deser methods

Yeah. Can you confirm whether Flink can serialize it correctly? Do we always have to register those Avro ser/de classes, or is that specific to Spark?