Open asfimport opened 4 years ago
Gabor Szadovszky / @gszadovszky: I'm working on a general concept of allowing configuration to be set for specific columns. See PARQUET-1784 for details. What do you think of having the mentioned configuration as follows?
conf.setBoolean("parquet.bloom.filter.enabled", false); // Might not be required, as this is the default
conf.setBoolean("parquet.bloom.filter.enabled#content", true); // Might not be necessary, since setting the expected NDV already enables the filter for this column
conf.setBoolean("parquet.bloom.filter.enabled#line", true); // Might not be necessary, since setting the expected NDV already enables the filter for this column
conf.setLong("parquet.bloom.filter.expected.ndv#content", 1000);
conf.setLong("parquet.bloom.filter.expected.ndv#line", 200);
This might require more typing, but it is clearer and less error-prone.
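To make the proposed key scheme concrete, here is a minimal sketch of how a column-specific lookup with a fallback to the global key could work. It uses a plain Map instead of the Hadoop Configuration so that it stays self-contained. The class ColumnConfigLookup and its get method are hypothetical names for illustration only and are not part of parquet-mr.

```java
import java.util.HashMap;
import java.util.Map;

class ColumnConfigLookup {

    // Hypothetical helper: prefer the column-specific key "key#column",
    // then the global key, then the supplied default value.
    static String get(Map<String, String> conf, String key, String column, String defaultValue) {
        String specific = conf.get(key + "#" + column);
        if (specific != null) {
            return specific;
        }
        return conf.getOrDefault(key, defaultValue);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("parquet.bloom.filter.enabled", "false");
        conf.put("parquet.bloom.filter.enabled#content", "true");
        conf.put("parquet.bloom.filter.expected.ndv#content", "1000");

        // Column "content" has its own setting, so the global value is overridden.
        System.out.println(get(conf, "parquet.bloom.filter.enabled", "content", "false")); // true
        // Column "other" has no specific setting, so the global value applies.
        System.out.println(get(conf, "parquet.bloom.filter.enabled", "other", "false")); // false
    }
}
```

With this shape, a column without its own entry simply inherits the global setting, which is the main reason the per-column form is harder to get wrong than two parallel comma-separated lists.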
Walid Gara / @garawalid: I left you a comment inside PARQUET-1784. I think it's better to keep the discussion there.
In the bloom filter feature, when I pass the expected numbers of distinct values (NDVs) through the configuration, I get null values instead of 1000 and 200.
The issue comes from reading the expected NDVs with Long.getLong(expectedNDVs[i]) (https://github.com/apache/parquet-mr/blob/a737141a571e3cb6cee2c252dc4406e26e6c1177/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java#L251). Long.getLong looks up a JVM system property with the given name and returns null when no such property exists; it does not parse the string itself.
It's possible to fix it by parsing the string with Long.parseLong(expectedNDVs[i]).
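The difference between the two calls can be reproduced outside parquet-mr. The sketch below is a standalone illustration, not the actual ParquetOutputFormat code: the class and method names are hypothetical, and it assumes the value arrives as a comma-separated string such as "1000,200" and that no JVM system properties named "1000" or "200" are defined.

```java
import java.util.Arrays;

class ExpectedNdvParsing {

    // Mirrors the reported behavior: Long.getLong treats each token as the
    // NAME of a system property, so it returns null when none is defined.
    static Long[] viaGetLong(String commaSeparated) {
        String[] parts = commaSeparated.split(",");
        Long[] result = new Long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            result[i] = Long.getLong(parts[i].trim());
        }
        return result;
    }

    // The suggested fix: Long.parseLong parses the token itself as a number.
    static long[] viaParseLong(String commaSeparated) {
        String[] parts = commaSeparated.split(",");
        long[] result = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            result[i] = Long.parseLong(parts[i].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String expectedNdvs = "1000,200";
        // Prints [null, null] under the assumption stated above.
        System.out.println(Arrays.toString(viaGetLong(expectedNdvs)));
        // Prints [1000, 200].
        System.out.println(Arrays.toString(viaParseLong(expectedNdvs)));
    }
}
```

Note that Long.parseLong throws NumberFormatException on malformed input, so a real fix may also want to report which configuration value was invalid.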
Reporter: Walid Gara / @garawalid Assignee: Walid Gara / @garawalid
Note: This issue was originally created as PARQUET-1787. Please see the migration documentation for further details.