apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

ParquetMetadata.toPrettyJSON throws exception on file read when LOG.isDebugEnabled() #2904

Closed asfimport closed 1 month ago

asfimport commented 1 month ago

Observed on the latest 1.14.x commit, c241170d9bc2cd8415b04e06ecea40ed3d80f64d.

When debug logging is enabled, tests that instantiate a ParquetFileReader fail with:

 

 


java.lang.RuntimeException: com.fasterxml.jackson.databind.exc.InvalidDefinitionException: No serializer found for class org.apache.parquet.schema.LogicalTypeAnnotation$StringLogicalTypeAnnotation and no properties discovered to create BeanSerializer (to avoid exception, disable SerializationFeature.FAIL_ON_EMPTY_BEANS) (through reference chain: org.apache.parquet.hadoop.metadata.ParquetMetadata["fileMetaData"]->org.apache.parquet.hadoop.metadata.FileMetaData["schema"]->org.apache.parquet.schema.MessageType["fields"]->java.util.ArrayList[24]->org.apache.parquet.schema.PrimitiveType["logicalTypeAnnotation"])
at org.apache.parquet.hadoop.metadata.ParquetMetadata.toJSON(ParquetMetadata.java:68)
at org.apache.parquet.hadoop.metadata.ParquetMetadata.toPrettyJSON(ParquetMetadata.java:48)
at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1592)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:629)
at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:902)
at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:698)
at org.apache.parquet.hadoop.ColumnIndexValidator.checkContractViolations(ColumnIndexValidator.java:556)
at org.apache.parquet.statistics.TestColumnIndexes.testColumnIndexes(TestColumnIndexes.java:348)
Caused by: com.fasterxml.jackson.databind.exc.InvalidDefinitionException: No serializer found for class org.apache.parquet.schema.LogicalTypeAnnotation$StringLogicalTypeAnnotation and no properties discovered to create BeanSerializer (to avoid exception, disable SerializationFeature.FAIL_ON_EMPTY_BEANS) (through reference chain: org.apache.parquet.hadoop.metadata.ParquetMetadata["fileMetaData"]->org.apache.parquet.hadoop.metadata.FileMetaData["schema"]->org.apache.parquet.schema.MessageType["fields"]->java.util.ArrayList[24]->org.apache.parquet.schema.PrimitiveType["logicalTypeAnnotation"])
at com.fasterxml.jackson.databind.exc.InvalidDefinitionException.from(InvalidDefinitionException.java:77)
at com.fasterxml.jackson.databind.SerializerProvider.reportBadDefinition(SerializerProvider.java:1330)
at com.fasterxml.jackson.databind.DatabindContext.reportBadDefinition(DatabindContext.java:414)
at com.fasterxml.jackson.databind.ser.impl.UnknownSerializer.failForEmpty(UnknownSerializer.java:53)
at com.fasterxml.jackson.databind.ser.impl.UnknownSerializer.serialize(UnknownSerializer.java:30)
at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:732)
at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:770)
at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:183)

(note, this seems to be happening when the schema doesn't contain a logical type, which makes me suspect some Jackson configuration to handle null values is needed?)
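The failure mode named in the exception message can be reproduced outside Parquet. The following is a minimal sketch, assuming jackson-databind 2.x is on the classpath; `Marker` and `Field` are hypothetical stand-ins for a property-less annotation type and the object holding it, not Parquet classes. It only shows what the `FAIL_ON_EMPTY_BEANS` hint in the message refers to, and is not necessarily the fix applied in the project.

```java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;

public class EmptyBeanDemo {
    // Hypothetical type with no fields or getters, so Jackson discovers no properties.
    public static class Marker {}

    // Hypothetical holder whose single public field references the empty type.
    public static class Field {
        public Marker logicalTypeAnnotation = new Marker();
    }

    // Default ObjectMapper: FAIL_ON_EMPTY_BEANS is enabled, so this throws.
    static String strict() throws JsonProcessingException {
        return new ObjectMapper().writeValueAsString(new Field());
    }

    // With the feature disabled, the empty type is written as an empty JSON object.
    static String lenient() throws JsonProcessingException {
        ObjectMapper mapper = new ObjectMapper();
        mapper.disable(SerializationFeature.FAIL_ON_EMPTY_BEANS);
        return mapper.writeValueAsString(new Field());
    }

    public static void main(String[] args) throws Exception {
        try {
            strict();
            System.out.println("unexpected: strict serialization succeeded");
        } catch (JsonProcessingException e) {
            System.out.println("strict failed: " + e.getClass().getSimpleName());
        }
        System.out.println(lenient()); // prints {"logicalTypeAnnotation":{}}
    }
}
```

Disabling the feature hides the error rather than producing a meaningful rendering of the type; a custom serializer or excluding the property would be alternative approaches.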

 

I also see a few exceptions related to encryption:


2024-05-07 14:37:12 ERROR TestPropertiesDrivenEncryption - ENCRYPT_COLUMNS_AND_FOOTER_CTR - DECRYPT_WITH_KEY_RETRIEVER Error: Didn't expect an exception, but got [com.fasterxml.jackson.databind.exc.InvalidDefinitionException: No serializer found for class org.apache.parquet.crypto.keytools.FileKeyUnwrapper and no properties discovered to create BeanSerializer (to avoid exception, disable SerializationFeature.FAIL_ON_EMPTY_BEANS) (through reference chain: org.apache.parquet.hadoop.metadata.FileMetaData["fileDecryptor"]->org.apache.parquet.crypto.InternalFileDecryptor["decryptionProperties"]->org.apache.parquet.crypto.FileDecryptionProperties["keyRetriever"])]
java.lang.RuntimeException: com.fasterxml.jackson.databind.exc.InvalidDefinitionException: No serializer found for class org.apache.parquet.crypto.keytools.FileKeyUnwrapper and no properties discovered to create BeanSerializer (to avoid exception, disable SerializationFeature.FAIL_ON_EMPTY_BEANS) (through reference chain: org.apache.parquet.hadoop.metadata.FileMetaData["fileDecryptor"]-

To reproduce, enable debug logging or just comment out the if (LOG.isDebugEnabled()) check in ParquetMetadataConverter, as I did here: https://github.com/apache/parquet-mr/compare/master...clairemcginty:parquet-mr:repro-avro-metadata-print-bug?expand=1

Reporter: Claire McGinty / @clairemcginty Assignee: Michel Davit / @RustedBones


Note: This issue was originally created as PARQUET-2468. Please see the migration documentation for further details.

asfimport commented 1 month ago

Claire McGinty / @clairemcginty: Seeing this on 1.14.0, too. cc @wgtmac, maybe you could take a quick look? I'm concerned since this debug statement is triggered on every call to ParquetFileReader#init.
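To illustrate the concern, here is a minimal sketch, using only java.util.logging, of a debug statement whose rendering failure does not abort the surrounding read. `SafeDebugLog`, `renderSafely` and `readFooter` are hypothetical names for illustration, not Parquet APIs, and this is not necessarily how the project resolved the issue.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

public class SafeDebugLog {
    private static final Logger LOG = Logger.getLogger(SafeDebugLog.class.getName());

    // Runs the renderer and returns a placeholder instead of propagating a rendering failure.
    static String renderSafely(Supplier<String> renderer) {
        try {
            return renderer.get();
        } catch (RuntimeException e) {
            return "<unprintable: " + e.getMessage() + ">";
        }
    }

    // Hypothetical stand-in for a footer read: logs at debug level, then returns the value.
    static String readFooter(String footer, Supplier<String> prettyPrinter) {
        if (LOG.isLoggable(Level.FINE)) {
            // The pretty-printer is only invoked when debug logging is on,
            // and a failure inside it cannot escape renderSafely.
            LOG.fine(renderSafely(prettyPrinter));
        }
        return footer;
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.FINE); // enable debug-level logging for this logger
        String result = readFooter("footer-bytes", () -> {
            throw new RuntimeException("No serializer found");
        });
        System.out.println(result); // prints footer-bytes
    }
}
```

The trade-off is that a broken pretty-printer then goes unnoticed unless someone reads the debug output, so a test that exercises the debug path is still worthwhile.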

asfimport commented 1 month ago

Gang Wu / @wgtmac: Is this a new issue from the 1.14.0 release?

asfimport commented 1 month ago

Claire McGinty / @clairemcginty: Seems to be; downgrading to 1.13.1 fixes the issue.

asfimport commented 1 month ago

Gang Wu / @wgtmac: Thanks for the confirmation! I will take a look later. This week is too busy for me.

asfimport commented 1 month ago

Michel Davit / @RustedBones: This commit is the cause of the regression: https://github.com/apache/parquet-mr/pull/1144.

asfimport commented 1 month ago

PJ Fanning: We found this issue in Apache Pekko testing too.

 

https://github.com/apache/pekko-connectors/actions/runs/9144385163/job/25142164449?pr=651

 

https://github.com/apache/pekko-connectors/pull/651

asfimport commented 1 month ago

Willi Raschkowski: Seeing toPrettyJSON fail in Spark CI as well: https://github.com/apache/spark/pull/46447#issuecomment-2122440281

asfimport commented 1 month ago

Fokko Driesprong / @Fokko: Thanks everyone for chiming in here. I've created a new release for 1.14.1.

asfimport commented 4 weeks ago

Willi Raschkowski: @Fokko, @wgtmac, thank you very much.

Do you plan on expediting the release of 1.14.1, given the reports here with 1.14.0?

asfimport commented 4 weeks ago

Gang Wu / @wgtmac: I have sent an email to dev ML for discussion: https://lists.apache.org/thread/h589zkwnpo592gc9n17v0y7qvd9jv1z4

asfimport commented 4 weeks ago

Willi Raschkowski: Thank you very much!

asfimport commented 2 weeks ago

Willi Raschkowski: @wgtmac, I see you cut 1.14.1 yesterday. Thanks, again.

asfimport commented 2 weeks ago

Gang Wu / @wgtmac: [~rshkv] No problem. Let me know if there is anything else I can help with.