apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

parquet-tools SimpleRecord does not display empty fields #2224

Open asfimport opened 6 years ago

asfimport commented 6 years ago

When parquet-tools prints a Parquet file that contains null values, the columns whose value is null are omitted from the output.

Example:


scala> case class Foo(a: Int, b: String)
defined class Foo

scala> org.apache.spark.sql.SparkSession.builder.getOrCreate.createDataset((0 to 1000).map(x => Foo(1,null))).write.parquet("/tmp/foobar/")

Actual:


☁  parquet-tools [master] ⚡  java -jar target/parquet-tools-1.10.1-SNAPSHOT.jar cat -j /tmp/foobar/part-00000-436a4d37-d82a-4771-8e7e-e4d428464675-c000.snappy.parquet | head -n5
{"a":1}
{"a":1}
{"a":1}
{"a":1}
{"a":1}

Expected:


☁  parquet-tools [master] ⚡  java -jar target/parquet-tools-1.10.1-SNAPSHOT.jar cat -j /tmp/foobar/part-00000-436a4d37-d82a-4771-8e7e-e4d428464675-c000.snappy.parquet | head -n5
{"a":1,"b":null}
{"a":1,"b":null}
{"a":1,"b":null}
{"a":1,"b":null}
{"a":1,"b":null}

Reporter: Nicholas Rushton

PRs and other links:

Note: This issue was originally created as PARQUET-1408. Please see the migration documentation for further details.

asfimport commented 6 years ago

Nicholas Rushton: https://github.com/apache/parquet-mr/pull/518

asfimport commented 5 years ago

Gabor Szadovszky / @gszadovszky: As this issue is minor and is not a regression since 1.10.0, I am removing the 1.11.0 target.