Currently, the parquet check-stats command only supports checking BINARY type columns. I would like it to serve as a general data corruption detection command, so this PR adds several options to it.
[ ] My PR adds the following unit tests OR does not need testing for this extremely good reason:
Commits
[ ] My commits all reference GitHub issues in their subject lines. In addition, my commits follow the guidelines
from "How to write a good git commit message":
Subject is separated from body by a blank line
Subject is limited to 50 characters (not including GitHub issue reference)
Subject does not end with a period
Subject uses the imperative mood ("add", not "adding")
Body wraps at 72 characters
Body explains "what" and "why", not "how"
Style
[ ] My contribution adheres to the code style guidelines and Spotless passes.
To apply the necessary changes, run mvn spotless:apply -Pvector-plugins
Documentation
[ ] In case of new functionality, my PR adds documentation that describes how to use it.
All public functions and classes in the PR contain Javadoc that explains what they do
Issue: https://github.com/apache/parquet-java/issues/1382