When parquet-tools is run on a Parquet file that contains null records, the null columns are omitted from the output.
Example:
scala> case class Foo(a: Int, b: String)
defined class Foo
scala> org.apache.spark.sql.SparkSession.builder.getOrCreate.createDataset((0 to 1000).map(x => Foo(1,null))).write.parquet("/tmp/foobar/")
Actual:
Expected:
Reporter: Nicholas Rushton
PRs and other links:
Note: This issue was originally created as PARQUET-1408. Please see the migration documentation for further details.