apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

Out of the box support for LocalOutputFile with ParquetWriter? #2938

Open victornoel opened 4 days ago

victornoel commented 4 days ago

Hi,

It seems that even though it is possible to use ParquetReader with LocalInputFile out of the box, it is not possible to use ParquetWriter with LocalOutputFile.

This forces people to implement their own Builder (as it is abstract), which in turn requires adding Hadoop dependencies to the classpath.

It would be great if ParquetWriter worked with LocalOutputFile out of the box. Cheers!

victornoel commented 4 days ago

Note that I found https://github.com/apache/parquet-java/issues/2473 and https://github.com/apache/parquet-java/pull/1111: both seem to contain comments saying that it is not working as expected, maybe even for reading…

victornoel commented 4 days ago

I also found https://github.com/apache/parquet-java/issues/1497, which seems to be similar to my issue.

wgtmac commented 2 days ago

IIUC, there are still some gaps before the Hadoop dependency can be removed entirely. At the very least, I have to depend on hadoop-client-api to make the build happy.

cc @amousavigourabi @Fokko for advice.

victornoel commented 2 days ago

@wgtmac unfortunately I can't even "just" use hadoop-client-api, because it's not functional by itself: it relies on shaded classes that are not actually included in the dependencies/classpath, so it fails at runtime.

Also, my issue is not a question: a feature is missing if I want to use LocalOutputFile with ParquetWriter.
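
For anyone trying to reproduce the runtime failure described above (the build succeeds against hadoop-client-api, but classes are missing when the program runs), a small classpath probe can help confirm which classes are actually absent. This is only a minimal sketch using the JDK's own `Class.forName`; the class name `ClasspathProbe` and the helper `isOnClasspath` are made up for illustration, and the default class name probed is just an example, so pass the name from your own `NoClassDefFoundError` instead.

```java
import java.util.List;

public class ClasspathProbe {

    /** Returns true if the named class can be located by this class's class loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError also covers NoClassDefFoundError for missing transitive classes
            return false;
        }
    }

    public static void main(String[] args) {
        // Example default only (an assumption); replace with the class named in your stack trace.
        List<String> names = args.length > 0
                ? List.of(args)
                : List.of("org.apache.hadoop.conf.Configuration");
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it with the same classpath as the failing application and pass the fully qualified class names to check as arguments.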

wgtmac commented 1 day ago

Thanks for the clarification! I agree with you; I have run into the same issue. It seems that removing the Hadoop dependency is only partially implemented. I need more time to investigate this topic. If you have any ideas for resolving this, please feel free to open a PR.