Open asfimport opened 7 years ago
Ryan Blue / @rdblue: What's the use case for this? I don't think we support it in the Java version either. Curious about whether that's something we should require in the format.
Wes McKinney / @wesm: I see. I wasn't sure whether some Parquet implementations were writing data to this field without us providing a way to access it (the Thrift structs are not publicly exposed in parquet-cpp).
Rahul Kumar Challapalli: Thanks for reporting this Jira, @wesm. @rdblue My use case is simple enough. I want to store the min and max for a single column, which is sorted, at the row-group level and probably at the page level as well. Am I missing an obvious way to do this?
Wes McKinney / @wesm: Ah, you want to use the built-in statistics for that rather than key-value metadata.
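To illustrate why per-row-group min/max statistics cover the sorted-column use case, here is a minimal standard-library sketch. It does not use the parquet-cpp API: `RowGroupStats`, `BuildStats`, and `CandidateGroups` are hypothetical names invented for this illustration, and the in-memory vector merely stands in for a column split into row groups.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the min/max statistics kept per row group.
// This is NOT a parquet-cpp type.
struct RowGroupStats {
  std::size_t first_row;  // index of the first value in the group
  std::size_t num_rows;   // number of values in the group
  int64_t min;            // smallest value in the group
  int64_t max;            // largest value in the group
};

// Split `values` into groups of at most `group_size` rows and record
// the min and max of each group.
std::vector<RowGroupStats> BuildStats(const std::vector<int64_t>& values,
                                      std::size_t group_size) {
  assert(group_size > 0);
  std::vector<RowGroupStats> stats;
  for (std::size_t start = 0; start < values.size(); start += group_size) {
    const std::size_t end = std::min(start + group_size, values.size());
    const auto mm = std::minmax_element(values.begin() + start,
                                        values.begin() + end);
    stats.push_back({start, end - start, *mm.first, *mm.second});
  }
  return stats;
}

// Return the indices of groups whose [min, max] range may contain
// `target`. Every other group can be skipped without reading its data.
std::vector<std::size_t> CandidateGroups(
    const std::vector<RowGroupStats>& stats, int64_t target) {
  std::vector<std::size_t> out;
  for (std::size_t i = 0; i < stats.size(); ++i) {
    if (stats[i].min <= target && target <= stats[i].max) {
      out.push_back(i);
    }
  }
  return out;
}
```

When the column is sorted, the ranges of different groups overlap at most at their endpoints, so a point lookup typically touches a single group.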
Rahul Kumar Challapalli: @wesm Thank you, I knew something simple like this should have been there. Now, how are these statistics populated? I would like to either programmatically set them (for the min/max case) or provide a comparator. Also, I am using the Arrow abstraction over the Parquet readers and writers. It would be helpful if you could point me to code or tests that write and read statistics.
Key-value metadata is already available at the file level:
https://github.com/apache/parquet-cpp/blob/master/src/parquet/file/metadata.h#L177
but not at the ColumnChunk level.
Reporter: Wes McKinney / @wesm
Note: This issue was originally created as PARQUET-1107. Please see the migration documentation for further details.