apache / orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
https://orc.apache.org/
Apache License 2.0

Bump parquet.version from 1.13.1 to 1.14.0 in /java #1933

Closed · dependabot[bot] closed this 4 months ago

dependabot[bot] commented 4 months ago

Bumps `parquet.version` from 1.13.1 to 1.14.0.

Updates `org.apache.parquet:parquet-hadoop` from 1.13.1 to 1.14.0.
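Both of the updated artifacts (parquet-hadoop and parquet-avro, detailed below) are pinned through the single `parquet.version` Maven property under /java, so Dependabot's one-property change updates them together. The pom below is a minimal sketch of that shape, not ORC's actual build file: the `org.example:parquet-version-bump-demo` coordinates are placeholders, and the real build under /java has its own modules, properties, and plugins.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal illustrative pom with placeholder coordinates; ORC's real build
     under /java declares a parquet.version property but is laid out differently. -->
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>parquet-version-bump-demo</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <properties>
    <!-- The single property this PR bumps: 1.13.1 to 1.14.0 -->
    <parquet.version>1.14.0</parquet.version>
  </properties>

  <dependencies>
    <!-- Both Parquet artifacts reference the shared property, so the
         one-line property change updates them together. -->
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-hadoop</artifactId>
      <version>${parquet.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-avro</artifactId>
      <version>${parquet.version}</version>
    </dependency>
  </dependencies>
</project>
```

Because parquet-hadoop and parquet-avro are released together from the same Parquet repository, pinning both through one shared property keeps their versions from drifting apart.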

Changelog

Sourced from org.apache.parquet:parquet-hadoop's changelog.

Version 1.14.0

Release Notes - Parquet - Version 1.14.0

Bug

  • PARQUET-2260 - Bloom filter bytes size shouldn't be larger than maxBytes size in the configuration
  • PARQUET-2266 - Fix support for files without ColumnIndexes
  • PARQUET-2276 - ParquetReader reads do not work with Hadoop version 2.8.5
  • PARQUET-2300 - Update jackson-core 2.13.4 to a version without CVE PRISMA-2023-0067
  • PARQUET-2325 - Fix parquet-cli's dictionary subcommand to work with FIXED_LEN_BYTE_ARRAY
  • PARQUET-2329 - Fix wrong help messages of parquet-cli subcommands
  • PARQUET-2330 - Fix convert-csv to show the correct position of the invalid record
  • PARQUET-2332 - Fix unexpectedly disabled tests to be executed
  • PARQUET-2336 - Add caching key to CodecFactory
  • PARQUET-2342 - Parquet writer produced a corrupted file due to page value count overflow
  • PARQUET-2343 - Fixes NPE when rewriting file with multiple rowgroups
  • PARQUET-2348 - Recompression/Re-encrypt should rewrite bloomfilter
  • PARQUET-2354 - Apparent race condition in CharsetValidator
  • PARQUET-2363 - ParquetRewriter should encrypt the V2 page header
  • PARQUET-2365 - Fixes NPE when rewriting column without column index
  • PARQUET-2408 - Fix license header in .gitattributes
  • PARQUET-2420 - ThriftParquetWriter converts thrift byte to int32 without adding logical type
  • PARQUET-2429 - Direct buffer churn in NonBlockedDecompressor
  • PARQUET-2438 - Fixes minMaxSize for BinaryColumnIndexBuilder
  • PARQUET-2442 - Remove Parquet Site from parquet-mr
  • PARQUET-2448 - parquet-avro does not support nested logical-type for avro <= 1.8
  • PARQUET-2449 - Writing using LocalOutputFile creates a large buffer
  • PARQUET-2450 - ParquetAvroReader throws exception projecting a single field of a repeated record type
  • PARQUET-2456 - avro schema conversion may fail with name conflict when using fixed types
  • PARQUET-2457 - Missing maven-scala-plugin version
  • PARQUET-2458 - Java compiler should use release instead of source/target
  • PARQUET-2465 - Fall back to Hadoop Configuration

New Feature

Improvement

  • PARQUET-1629 - Page-level CRC checksum verification for DataPageV2
  • PARQUET-1822 - Parquet without Hadoop dependencies
  • PARQUET-1942 - Bump Apache Arrow 2.0.0
  • PARQUET-2060 - Parquet corruption can cause infinite loop with Snappy
  • PARQUET-2212 - Add ByteBuffer api for decryptors to allow direct memory to be decrypted
  • PARQUET-2254 - Build a BloomFilter with a more precise size
  • PARQUET-2263 - Upgrade maven-shade-plugin to 3.4.1
  • PARQUET-2265 - AvroParquetWriter should default to data supplier model from Configuration

... (truncated)

Commits
  • fe91794 [maven-release-plugin] prepare release apache-parquet-1.14.0-rc1
  • bb8c72d PARQUET-2465: Fall back to HadoopConfig (#1339) (#1342)
  • 0d43773 [maven-release-plugin] prepare for next development iteration
  • af07402 [maven-release-plugin] prepare release apache-parquet-1.14.0-rc0
  • cde9a63 Update release note for 1.14.0 (#1336)
  • 337d082 PARQUET-2171: (followup) add read metrics and hadoop conf integration for vec...
  • ce02431 Bump org.apache.maven.plugins:maven-shade-plugin from 3.5.2 to 3.5.3 (#1332)
  • cc22b56 Bump net.alchim31.maven:scala-maven-plugin from 4.8.1 to 4.9.0 (#1331)
  • 23c788d PARQUET-2463: Bump japicmp to 0.21.0 (#1329)
  • 09445b5 PARQUET-2451: Add BYTE_STREAM_SPLIT support for FIXED_LEN_BYTE_ARRAY, INT32 a...
  • Additional commits viewable in compare view


Updates `org.apache.parquet:parquet-avro` from 1.13.1 to 1.14.0.

Changelog and commits: identical to the parquet-hadoop entries above, since both artifacts ship together in the Parquet 1.14.0 release.


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
cxzl25 commented 4 months ago

Waiting on [SPARK-48177][BUILD] Upgrade Apache Parquet to 1.14.0.

dongjoon-hyun commented 4 months ago

Ya, that's correct. It's blocked by the unit test failures on the Spark side.

dongjoon-hyun commented 4 months ago

Let's close this for now. We can handle it later manually.

dependabot[bot] commented 4 months ago

OK, I won't notify you again about this release, but will get in touch when a new version is available. You can also ignore all major, minor, or patch releases for a dependency by adding an ignore condition with the desired update_types to your config file.

If you change your mind, just re-open this PR and I'll resolve any conflicts on it.