apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

PARQUET-2465: Fall back to HadoopConfig #1339

Closed. Fokko closed this 2 months ago.

Fokko commented 2 months ago

We see that this causes 1.14 to be incompatible with previous releases.

Fokko commented 2 months ago

Good to go. @wgtmac @shangxinli @amousavigourabi @vinooganesh, let me know what you think.

vinooganesh commented 2 months ago

👍 This looks good to me, but do we actually want to mark the Hadoop methods as deprecated if we assume that parquet-mr 1.x will always rely on Hadoop? Or is there a plan to drop the Hadoop dependency in future 1.x releases?

Fokko commented 2 months ago

You can still use the Hadoop config, but you'll need to wrap it in a HadoopParquetConfiguration: https://github.com/apache/parquet-mr/blob/68609198c4fecaa0e8fb1bcaa2c8a353030de962/parquet-hadoop/src/main/java/org/apache/parquet/conf/HadoopParquetConfiguration.java#L42-L44
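
The wrapping described above is essentially an adapter: a Parquet-side configuration object that forwards every lookup to the Hadoop object it holds. The sketch below is dependency-free so it compiles on its own. SimpleHadoopConf and ParquetConf are hypothetical, simplified stand-ins for org.apache.hadoop.conf.Configuration and org.apache.parquet.conf.ParquetConfiguration, and WrappedHadoopConf only mirrors the idea of HadoopParquetConfiguration; it is not that class's actual source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration (not the real class).
class SimpleHadoopConf {
    private final Map<String, String> props = new HashMap<>();

    void set(String key, String value) {
        props.put(key, value);
    }

    String get(String key) {
        return props.get(key);
    }
}

// Hypothetical, trimmed-down stand-in for a Hadoop-free configuration interface.
interface ParquetConf {
    void set(String key, String value);

    String get(String key);

    // Returns defaultValue when the key is absent.
    default String get(String key, String defaultValue) {
        String value = get(key);
        return value != null ? value : defaultValue;
    }
}

// Adapter sketch: nothing is copied; each call is forwarded to the wrapped
// Hadoop-style object, so changes made through either view are visible in both.
class WrappedHadoopConf implements ParquetConf {
    private final SimpleHadoopConf hadoopConf;

    WrappedHadoopConf(SimpleHadoopConf hadoopConf) {
        this.hadoopConf = hadoopConf;
    }

    @Override
    public void set(String key, String value) {
        hadoopConf.set(key, value);
    }

    @Override
    public String get(String key) {
        return hadoopConf.get(key);
    }

    // Hands back the original object for code paths that still expect it.
    SimpleHadoopConf unwrap() {
        return hadoopConf;
    }
}
```

With the real classes, the call site would look roughly like new HadoopParquetConfiguration(hadoopConf), matching the constructor linked above. Delegating instead of copying keeps a single source of truth, which matters when older code keeps mutating the Hadoop object after it has been wrapped.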

amousavigourabi commented 2 months ago

I'd like to note that we have other APIs that are deprecated because they will be dropped in 2.0, without any plan to remove them in 1.x releases (see org.apache.parquet.avro.AvroParquetReader#builder(Path) for an example), so this is consistent with the rest of the project.
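
The convention described above, marking an entry point @Deprecated as a signal for 2.0 while keeping it fully working throughout 1.x, can be sketched as an old overload that simply delegates to its replacement. All names below (SourceFile, ReaderFactory, open) are hypothetical and only illustrate the shape; they are not the signatures of AvroParquetReader#builder(Path) or of the methods touched by this PR.

```java
import java.util.Objects;

// Hypothetical replacement input abstraction (a single-method, functional interface).
interface SourceFile {
    String location();
}

// Hypothetical factory exposing an old and a new entry point.
final class ReaderFactory {
    private ReaderFactory() {}

    /**
     * Old entry point, kept working for the whole 1.x line.
     *
     * @deprecated will be removed in the next major release; use {@link #open(SourceFile)}.
     */
    @Deprecated
    static String open(String path) {
        Objects.requireNonNull(path, "path");
        // Delegate rather than duplicate logic, so both entry points behave identically.
        return open(() -> path);
    }

    // New entry point that the deprecated overload forwards to.
    static String open(SourceFile file) {
        return "reader for " + file.location();
    }
}
```

Because the deprecated overload is a thin forwarder, carrying it until 2.0 costs almost nothing, and callers only see a compiler deprecation warning rather than a break.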

vinooganesh commented 2 months ago

Sounds great, thanks @Fokko and @amousavigourabi!