apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0
2.65k stars 1.41k forks source link

ProtoSchemaConverter renders invalid schema for oneof in unwrap mode #3039

Open aka-peter opened 3 weeks ago

aka-peter commented 3 weeks ago

Describe the bug, including details regarding any error messages, version, and platform.

When unwrap is enabled all fields of the TestProto3.OneOfTestMessage will be required.

message TestProto3.OneOfTestMessage {
  required int32 first = 1;
  required int32 second = 2;
}

https://github.com/apache/parquet-java/blob/73a4430af6c40f8eb246ad4911eb6d103c9a2abe/parquet-protobuf/src/test/resources/TestProto3.proto#L116

This will never work but tests are missing for unwrap of TestProto3.OneOfTestMessage

The required repetition is added here: https://github.com/apache/parquet-java/blob/73a4430af6c40f8eb246ad4911eb6d103c9a2abe/parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java#L278

This could be caught early by adding withValidation(true) here: https://github.com/apache/parquet-java/blob/73a4430af6c40f8eb246ad4911eb6d103c9a2abe/parquet-protobuf/src/test/java/org/apache/parquet/proto/TestUtils.java#L221

If validation is disabled it will fail when trying to read one of the non-existing fields in the oneof.

Is there really a need for setting all primitive fields in unwrap to required?

Component(s)

No response