JuliaIO / Parquet.jl

Julia implementation of Parquet columnar file format reader

Corrupt Parquet Schema when loading parquet file into Databricks #174

Open characat0 opened 11 months ago

characat0 commented 11 months ago

While loading a file produced with Parquet.jl into Databricks, I ran into the following error:

Corrupt Parquet Schema: Only one of num_children and type should be set in SchemaElement

According to the SchemaElement definition in parquet.thrift (https://github.com/apache/parquet-format/blob/4701809cb65373b4404b46b6f01110d020f4d1c8/src/main/thrift/parquet.thrift#L437):

  /** Nested fields.  Since thrift does not support nested fields,
   * the nesting is flattened to a single list by a depth-first traversal.
   * The children count is used to construct the nested relationship.
   * This field is not set when the element is a primitive type
   */
  5: optional i32 num_children;

the field num_children should not be set if the element is a leaf node (a primitive type), so it looks like Parquet.jl is writing it for leaf elements.
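
To make the rule concrete, here is a minimal Python sketch of the check that a strict reader seems to apply. It is not Databricks' or Parquet.jl's code: the `SchemaElement` dataclass and `find_invalid_elements` helper are hypothetical stand-ins, only three of the Thrift fields are modelled, and `num_children=0` on the leaf is an assumed value chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SchemaElement:
    """Hypothetical stand-in for the Thrift SchemaElement (three fields only)."""
    name: str
    type: Optional[str] = None          # physical type, set on leaf nodes
    num_children: Optional[int] = None  # child count, set on group nodes only


def find_invalid_elements(schema: List[SchemaElement]) -> List[str]:
    """Return the names of elements that set both `type` and `num_children`."""
    return [
        el.name
        for el in schema
        if el.type is not None and el.num_children is not None
    ]


# Flattened schema in depth-first order: a root group with two leaf columns.
schema = [
    SchemaElement(name="schema", num_children=2),
    SchemaElement(name="id", type="INT64"),
    # Assumed failure pattern: a leaf that also carries num_children.
    SchemaElement(name="label", type="BYTE_ARRAY", num_children=0),
]

invalid = find_invalid_elements(schema)
print(invalid)  # ['label']
```

Under these assumptions, leaving `num_children` unset (rather than writing 0) on the `label` element would make the list come back empty.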