apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

Add usage documentation for the Java library #2914

Open asfimport opened 3 weeks ago

asfimport commented 3 weeks ago

The Java Parquet library has no usage documentation beyond the sparse information in the README. All I could find were a few old (roughly 10-year-old) third-party tutorials scattered around the internet, all using the hadoop module. I spent a full work day sifting through the API docs and searching online, trying to piece something together. Ultimately, I gave up on working with Parquet files in Java because alternative file formats are better documented, and I felt that using parquet-mr would be a huge hassle to maintain in the future. The library seems reasonably well maintained and comprehensive, but the lack of documentation is a huge barrier to using it, which I think turns away a lot of developers like me.

I kindly request that usage documentation be written covering all the major aspects of the library, with pointers to the relevant API classes and methods for the more nitty-gritty use cases.

I may be misunderstanding the purpose of this library. If so, is there a different Java Parquet library that is recommended for higher-level Parquet file I/O?

Reporter: Isaac Nygaard

Note: This issue was originally created as PARQUET-2490. Please see the migration documentation for further details.