Open asfimport opened 7 years ago
Deepak Majeti / @majetideepak: @wesm, @xhochy can you verify this issue on your side? Thanks!
Wes McKinney / @wesm: What version of parquet-tools and Hive? I'm looking into it
Deepak Majeti / @majetideepak: I tested with parquet-tools-1.9.0 and Hive 1.2
Wes McKinney / @wesm: I'm able to read files written by parquet-cpp with the cat command in parquet-tools 1.5.0 and 1.9.0. Any way to reproduce?
Deepak Majeti / @majetideepak: Can you cat this file?
Deepak Majeti / @majetideepak: I get the following error. It goes away if I do not set the field_id in the SchemaElement.
$ java -jar parquet-tools-1.9.0.jar cat parquet_cpp_example.parquet
Could not read footer: java.lang.RuntimeException: shaded.parquet.org.codehaus.jackson.map.JsonMappingException: No serializer found for class org.apache.parquet.schema.Type$ID and no properties discovered to create BeanSerializer (to avoid exception, disable SerializationConfig.Feature.FAIL_ON_EMPTY_BEANS) ) (through reference chain: org.apache.parquet.hadoop.metadata.ParquetMetadata["fileMetaData"]>org.apache.parquet.hadoop.metadata.FileMetaData["schema"]>org.apache.parquet.schema.MessageType["fields"]>java.util.ArrayList[0]>org.apache.parquet.schema.PrimitiveType["id"])
Deepak Majeti / @majetideepak: I don't think fieldIds are implemented in parquet-mr either. A grep on the codebase does NOT show them being set.
Wes McKinney / @wesm: I don't get that error in my environment, but the output isn't useful either:
$ java -jar target/parquet-tools-1.9.0.jar cat parquet_cpp_example.parquet
org/apache/hadoop/fs/Path
I'm OK with nixing the field_id field in parquet-cpp to make this go away. Do you want to do that, or should I quickly write a patch?
Deepak Majeti / @majetideepak: Nixing sounds good. If you are at it, please write a patch. Thanks!
Wes McKinney / @wesm: I included this in my patch for PARQUET-842: https://github.com/apache/parquet-cpp/pull/226
Uwe Korn / @xhochy: This problem was recently on the ML and @julienledem suggested:
This looks like a bug in parquet-tools when printing the schema to the console. Possibly adding a @JsonValue annotation to intValue() [1] in Type would fix it. [1] https://github.com/apache/parquet-mr/blob/89e0607cf6470dda1a6a47b46abf37468df4e50f/parquet-column/src/main/java/org/apache/parquet/schema/Type.java#L48
This suggests that it is really a parquet-tools problem and not a parquet-cpp one. Still, the Impala problem with these fields persists.
Wes McKinney / @wesm: I'll create a separate JIRA about debugging the Impala issue
I could not read files written by parquet-cpp from parquet-tools and Hive. Setting field ids in the schema metadata seems to be the problem. We should make setting the field_id optional.
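A minimal sketch of what "optional" could mean here, assuming the writer only marks field_id as present when the caller actually supplies one. `SchemaElementSketch`, `has_field_id`, and `ToSchemaElement` are hypothetical names made up for illustration; they are not the Thrift-generated SchemaElement or any parquet-cpp API.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the Thrift-generated SchemaElement; only the
// parts relevant to field_id are modeled here.
struct SchemaElementSketch {
  int32_t field_id = 0;
  bool has_field_id = false;  // assumption: mirrors an "is set" flag for an optional field
};

// Hypothetical helper: copy the id into the element only when the caller
// supplied one, so schemas built without ids carry no field_id at all.
inline SchemaElementSketch ToSchemaElement(std::optional<int32_t> field_id) {
  SchemaElementSketch element;
  if (field_id.has_value()) {
    element.field_id = *field_id;
    element.has_field_id = true;
  }
  return element;
}
```

With this shape, `ToSchemaElement(std::nullopt)` leaves the field unset, which corresponds to the case reported above where the parquet-tools error goes away.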
Reporter: Deepak Majeti / @majetideepak
Original Issue Attachments:
Note: This issue was originally created as PARQUET-838. Please see the migration documentation for further details.