apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

PARQUET-2171: (followup) add read metrics and hadoop conf integration for vector io reader #1330

Closed parthchandra closed 5 months ago

parthchandra commented 5 months ago

This is a follow-up with minor fixes and additions for the vector IO based file reader.

Documentation

Existing documentation is sufficient

parthchandra commented 5 months ago

@wgtmac, @steveloughran Some minor additions to the vector IO based file reader. This adds the read metrics that were already added in the serial reader path. It also makes the default construction of the read options read the vector IO setting from the Hadoop conf. Please take a look.
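To illustrate the second change, here is a minimal, self-contained sketch of a read-options builder whose default comes from a configuration lookup and can still be overridden explicitly. It is not the parquet-java implementation: `ReadOptionsSketch`, its builder, and the plain `Map` standing in for a Hadoop `Configuration` are all hypothetical, and both the key name `parquet.hadoop.vectored.io.enabled` and the `false` default are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a read-options class; NOT the real parquet-java API.
final class ReadOptionsSketch {
    // Assumed configuration key and default, used here for illustration only.
    static final String VECTORED_IO_KEY = "parquet.hadoop.vectored.io.enabled";
    static final boolean VECTORED_IO_DEFAULT = false;

    private final boolean useVectoredIo;

    private ReadOptionsSketch(boolean useVectoredIo) {
        this.useVectoredIo = useVectoredIo;
    }

    boolean useVectoredIo() {
        return useVectoredIo;
    }

    // The Map plays the role of a Hadoop Configuration in this sketch.
    static Builder builder(Map<String, String> conf) {
        return new Builder(conf);
    }

    static final class Builder {
        private boolean useVectoredIo;

        // Default construction reads the setting from the conf,
        // falling back to the built-in default when the key is absent.
        Builder(Map<String, String> conf) {
            String raw = conf.get(VECTORED_IO_KEY);
            this.useVectoredIo = (raw == null)
                    ? VECTORED_IO_DEFAULT
                    : Boolean.parseBoolean(raw.trim());
        }

        // An explicit call still overrides whatever the conf said.
        Builder withUseVectoredIo(boolean value) {
            this.useVectoredIo = value;
            return this;
        }

        ReadOptionsSketch build() {
            return new ReadOptionsSketch(useVectoredIo);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(VECTORED_IO_KEY, "true");

        // Setting taken from the conf.
        System.out.println(builder(conf).build().useVectoredIo());
        // Explicit override wins over the conf.
        System.out.println(builder(conf).withUseVectoredIo(false).build().useVectoredIo());
        // No key set: falls back to the assumed default.
        System.out.println(builder(new HashMap<>()).build().useVectoredIo());
    }
}
```

The point of seeding the builder from the conf is that callers who never touch the builder still pick up a cluster-wide setting, while code that sets the option explicitly keeps full control.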

parthchandra commented 5 months ago

Thank you @wgtmac !

steveloughran commented 5 months ago

Looks great. If there's another 1.14.0 RC, will this go into it?

Note that we create lots and lots of IOStatistics; for vector reads we include the number of bytes read and discarded along with all the other timings. My WiP to make that accessible via reflection will help, but it would still need work in Parquet to aggregate. See https://github.com/apache/hadoop/pull/6686. You can have all the stats as a piece of JSON if that helps; then the Parquet library just needs its own copy of the stats class to parse it...
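As a rough illustration of the aggregation step mentioned above, the sketch below sums named counters collected from several streams into one set of totals. It assumes the statistics have already been parsed into plain maps; the class `IoStatsAggregatorSketch` and the counter names `bytes_read` and `bytes_discarded` are hypothetical and are not taken from the Hadoop or Parquet code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; not part of Hadoop or parquet-java.
final class IoStatsAggregatorSketch {

    // Sums counters that share a name across all per-stream maps.
    // A TreeMap keeps the result ordered by counter name.
    static Map<String, Long> aggregate(List<Map<String, Long>> perStream) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> counters : perStream) {
            for (Map.Entry<String, Long> e : counters.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Counter names are made up for this example.
        Map<String, Long> first = new TreeMap<>();
        first.put("bytes_read", 4096L);
        first.put("bytes_discarded", 128L);

        Map<String, Long> second = new TreeMap<>();
        second.put("bytes_read", 1024L);

        // prints {bytes_discarded=128, bytes_read=5120}
        System.out.println(aggregate(Arrays.asList(first, second)));
    }
}
```

Counters that appear in only some streams are carried through unchanged, so the totals cover the union of all counter names.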

wgtmac commented 4 months ago

I think this is already included in the 1.14.0 RC0/RC1.