VIDA-NYU / ache

ACHE is a web crawler for domain-specific search.
http://ache.readthedocs.io
Apache License 2.0

Bump rocksdbjni from 6.25.3 to 7.9.2 #323

Closed dependabot[bot] closed 1 year ago

dependabot[bot] commented 1 year ago

Bumps rocksdbjni from 6.25.3 to 7.9.2.
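The bump from 6.25.3 to 7.9.2 changes the first version component (6 to 7), so it counts as a major-version update. The following is a minimal sketch of that classification, assuming plain `MAJOR.MINOR.PATCH` version strings; the `VersionBump` class and its methods are hypothetical helpers written for illustration and are not part of Dependabot or rocksdbjni.

```java
import java.util.Arrays;

// Hypothetical helper: classifies a version bump by the first component that differs.
final class VersionBump {
    enum Kind { MAJOR, MINOR, PATCH, NONE }

    // Parses "6.25.3" into {6, 25, 3}; assumes exactly three numeric components.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected MAJOR.MINOR.PATCH: " + version);
        }
        return Arrays.stream(parts).mapToInt(Integer::parseInt).toArray();
    }

    // Compares the components left to right and reports the first difference.
    static Kind classify(String from, String to) {
        int[] a = parse(from);
        int[] b = parse(to);
        if (a[0] != b[0]) return Kind.MAJOR;
        if (a[1] != b[1]) return Kind.MINOR;
        if (a[2] != b[2]) return Kind.PATCH;
        return Kind.NONE;
    }

    public static void main(String[] args) {
        // The versions named in this PR.
        System.out.println(classify("6.25.3", "7.9.2")); // prints MAJOR
    }
}
```

A major bump like this one is the kind most likely to carry behavior and API changes, which is why the release notes quoted here are worth reading before merging.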

Release notes

Sourced from rocksdbjni's releases.

RocksDB 7.9.2

7.9.2 (2022-12-21)

Bug Fixes

  • Fixed a heap use after free bug in async scan prefetching when the scan thread and another thread try to read and load the same seek block into cache.

7.9.1 (2022-12-08)

Bug Fixes

  • Fixed a regression in iterator where range tombstones after iterate_upper_bound are processed.
  • Fixed a memory leak in MultiGet with async_io read option, caused by IO errors during table file open.

Behavior changes

  • Make best-efforts recovery verify SST unique ID before Version construction (#10962)

7.9.0 (2022-11-21)

Performance Improvements

  • Fixed an iterator performance regression for delete range users when scanning through a consecutive sequence of range tombstones (#10877).

Bug Fixes

  • Fixed a memory corruption error in scans when async_io is enabled. The corruption occurred when an IOError during a read left one buffer empty while the other buffer, already in the middle of an async read, was submitted for reading again.
  • Fix failed memtable flush retry bug that could cause wrongly ordered updates, which would surface to writers as Status::Corruption in case of force_consistency_checks=true (default). It affects use cases that enable both parallel flush (max_background_flushes > 1 or max_background_jobs >= 8) and non-default memtable count (max_write_buffer_number > 2).
  • Fixed an issue where the READ_NUM_MERGE_OPERANDS ticker was not updated when the base key-value or tombstone was read from an SST file.
  • Fixed a memory safety bug when using a SecondaryCache with block_cache_compressed. block_cache_compressed no longer attempts to use SecondaryCache features.
  • Fixed a regression in scan for async_io. During seek, valid buffers were getting cleared causing a regression.
  • Tiered Storage: fixed excessive keys written to penultimate level in non-debug builds.
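To judge whether a deployment falls under the memtable flush retry bug described above, the stated condition can be written as a predicate. This is only a sketch: the thresholds are copied from the release note, while the `FlushRetryBugCheck` class, its method names, and its parameter names are hypothetical and do not come from the RocksDB API. The three integers are assumed to be the effective values of `max_background_flushes`, `max_background_jobs`, and `max_write_buffer_number` in your own configuration.

```java
// Hypothetical helper mirroring the condition stated in the release note.
final class FlushRetryBugCheck {

    // "Parallel flush" per the note: max_background_flushes > 1 or max_background_jobs >= 8.
    static boolean usesParallelFlush(int maxBackgroundFlushes, int maxBackgroundJobs) {
        return maxBackgroundFlushes > 1 || maxBackgroundJobs >= 8;
    }

    // Affected only if parallel flush is combined with max_write_buffer_number > 2.
    static boolean isAffected(int maxBackgroundFlushes,
                              int maxBackgroundJobs,
                              int maxWriteBufferNumber) {
        return usesParallelFlush(maxBackgroundFlushes, maxBackgroundJobs)
                && maxWriteBufferNumber > 2;
    }

    public static void main(String[] args) {
        // Example values chosen for illustration only.
        System.out.println(isAffected(2, 2, 3)); // prints true
        System.out.println(isAffected(1, 2, 4)); // prints false
    }
}
```

Both parts of the condition must hold, so a configuration with parallel flush but at most two write buffers is outside the described scope.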

New Features

  • Add basic support for user-defined timestamp to Merge (#10819).
  • Add stats for ReadAsync time spent and async read errors.
  • Basic support for the wide-column data model is now available. Wide-column entities can be stored using the PutEntity API, and retrieved using GetEntity and the new columns API of iterator. For compatibility, the classic APIs Get and MultiGet, as well as iterator's value API return the value of the anonymous default column of wide-column entities; also, GetEntity and iterator's columns return any plain key-values in the form of an entity which only has the anonymous default column. Merge (and GetMergeOperands) currently also apply to the default column; any other columns of entities are unaffected by Merge operations. Note that some features like compaction filters, transactions, user-defined timestamps, and the SST file writer do not yet support wide-column entities; also, there is currently no MultiGet-like API to retrieve multiple entities at once. We plan to gradually close the above gaps and also implement new features like column-level operations (e.g. updating or querying only certain columns of an entity).
  • Marked HyperClockCache as a production-ready alternative to LRUCache for the block cache. HyperClockCache greatly improves hot-path CPU efficiency under high parallel load or high contention, with some documented caveats and limitations. As much as 4.5x higher ops/sec vs. LRUCache has been seen in db_bench under high parallel load.
  • Add periodic diagnostics to info_log (LOG file) for HyperClockCache block cache if performance is degraded by bad estimated_entry_charge option.
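The compatibility rules for wide-column entities described above can be illustrated with a toy in-memory model. This sketch does not use or imitate the rocksdbjni API: `WideColumnToyStore` and all of its methods are hypothetical, the empty string standing in for the anonymous default column is an assumption, and returning `null` for a missing key or missing default column is a choice of this toy rather than documented RocksDB behavior.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: maps each key to its columns (column name -> value).
final class WideColumnToyStore {
    // Assumption: the anonymous default column is represented by the empty name.
    static final String DEFAULT_COLUMN = "";

    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // Classic put: stores a plain key-value, modeled as a single default column.
    void put(String key, String value) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put(DEFAULT_COLUMN, value);
        rows.put(key, columns);
    }

    // Wide-column put: stores all given columns for the key.
    void putEntity(String key, Map<String, String> columns) {
        rows.put(key, new LinkedHashMap<>(columns));
    }

    // Classic get: returns only the default column's value (null if absent in this toy).
    String get(String key) {
        Map<String, String> columns = rows.get(key);
        return columns == null ? null : columns.get(DEFAULT_COLUMN);
    }

    // Entity get: a plain key-value comes back as an entity with only the default column.
    Map<String, String> getEntity(String key) {
        Map<String, String> columns = rows.get(key);
        return columns == null ? null : Collections.unmodifiableMap(columns);
    }
}
```

In this model, reading an entity through `get` yields just its default column, and reading a plain value through `getEntity` yields a one-column entity, which is the two-way compatibility the release note describes.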

Public API Changes

  • Marked block_cache_compressed as a deprecated feature. Use SecondaryCache instead.
  • Added a SecondaryCache::InsertSaved() API, with default implementation depending on Insert(). Some implementations might need to add a custom implementation of InsertSaved(). (Details in API comments.)

RocksDB 7.8.3

7.8.3 (2022-11-29)

  • Revert an internal change in 7.8.0 associated with some memory usage churn.

7.8.2 (2022-11-27)

Behavior changes

  • Make best-efforts recovery verify SST unique ID before Version construction (#10962)
  • Fix failed memtable flush retry bug that could cause wrongly ordered updates, which would surface to writers as Status::Corruption in case of force_consistency_checks=true (default). It affects use cases that enable both parallel flush (max_background_flushes > 1 or max_background_jobs >= 8) and non-default memtable count (max_write_buffer_number > 2).
  • Tiered Storage: fixed excessive keys written to penultimate level in non-debug builds.

Bug Fixes

  • Fixed a regression in scan for async_io. During seek, valid buffers were getting cleared causing a regression.
  • Fixed a performance regression in iterator where range tombstones after iterate_upper_bound are processed.

... (truncated)

Changelog

Sourced from rocksdbjni's changelog.

7.9.2 (12/21/2022)

Bug Fixes

  • Fixed a heap use after free bug in async scan prefetching when the scan thread and another thread try to read and load the same seek block into cache.

7.9.1 (12/8/2022)

Bug Fixes

  • Fixed a regression in iterator where range tombstones after iterate_upper_bound are processed.
  • Fixed a memory leak in MultiGet with async_io read option, caused by IO errors during table file open.

Behavior changes

  • Make best-efforts recovery verify SST unique ID before Version construction (#10962)

7.9.0 (11/21/2022)

Performance Improvements

  • Fixed an iterator performance regression for delete range users when scanning through a consecutive sequence of range tombstones (#10877).

Bug Fixes

  • Fixed a memory corruption error in scans when async_io is enabled. The corruption occurred when an IOError during a read left one buffer empty while the other buffer, already in the middle of an async read, was submitted for reading again.
  • Fix failed memtable flush retry bug that could cause wrongly ordered updates, which would surface to writers as Status::Corruption in case of force_consistency_checks=true (default). It affects use cases that enable both parallel flush (max_background_flushes > 1 or max_background_jobs >= 8) and non-default memtable count (max_write_buffer_number > 2).
  • Fixed an issue where the READ_NUM_MERGE_OPERANDS ticker was not updated when the base key-value or tombstone was read from an SST file.
  • Fixed a memory safety bug when using a SecondaryCache with block_cache_compressed. block_cache_compressed no longer attempts to use SecondaryCache features.
  • Fixed a regression in scan for async_io. During seek, valid buffers were getting cleared causing a regression.
  • Tiered Storage: fixed excessive keys written to penultimate level in non-debug builds.

New Features

  • Add basic support for user-defined timestamp to Merge (#10819).
  • Add stats for ReadAsync time spent and async read errors.
  • Basic support for the wide-column data model is now available. Wide-column entities can be stored using the PutEntity API, and retrieved using GetEntity and the new columns API of iterator. For compatibility, the classic APIs Get and MultiGet, as well as iterator's value API return the value of the anonymous default column of wide-column entities; also, GetEntity and iterator's columns return any plain key-values in the form of an entity which only has the anonymous default column. Merge (and GetMergeOperands) currently also apply to the default column; any other columns of entities are unaffected by Merge operations. Note that some features like compaction filters, transactions, user-defined timestamps, and the SST file writer do not yet support wide-column entities; also, there is currently no MultiGet-like API to retrieve multiple entities at once. We plan to gradually close the above gaps and also implement new features like column-level operations (e.g. updating or querying only certain columns of an entity).
  • Marked HyperClockCache as a production-ready alternative to LRUCache for the block cache. HyperClockCache greatly improves hot-path CPU efficiency under high parallel load or high contention, with some documented caveats and limitations. As much as 4.5x higher ops/sec vs. LRUCache has been seen in db_bench under high parallel load.
  • Add periodic diagnostics to info_log (LOG file) for HyperClockCache block cache if performance is degraded by bad estimated_entry_charge option.

Public API Changes

  • Marked block_cache_compressed as a deprecated feature. Use SecondaryCache instead.
  • Added a SecondaryCache::InsertSaved() API, with default implementation depending on Insert(). Some implementations might need to add a custom implementation of InsertSaved(). (Details in API comments.)

7.8.0 (10/22/2022)

New Features

  • DeleteRange() now supports user-defined timestamp.
  • Provide support for async_io with tailing iterators when ReadOptions.tailing is enabled during scans.
  • Tiered Storage: allow data moving up from the last level to the penultimate level if the input level is penultimate level or above.
  • Added DB::Properties::kFastBlockCacheEntryStats, which is similar to DB::Properties::kBlockCacheEntryStats, except returns cached (stale) values in more cases to reduce overhead.
  • FIFO compaction now supports migrating from a multi-level DB via DB::Open(). During the migration phase, the FIFO compaction picker will:
      • pick the SST file with the smallest starting key in the bottom-most non-empty level.
      • Note that during the migration phase, the file purge order will only be an approximation of "FIFO", as files in lower levels might sometimes contain newer keys than files in upper levels.
  • Added an option ignore_max_compaction_bytes_for_input to ignore max_compaction_bytes limit when adding files to be compacted from input level. This should help reduce write amplification. The option is enabled by default.
  • Tiered Storage: allow data moving up from the last level even if it's a last level only compaction, as long as the penultimate level is empty.
  • Add a new option IOOptions.do_not_recurse that can be used by underlying file systems to skip recursing through sub directories and list only files in GetChildren API.
  • Add option preserve_internal_time_seconds to preserve the time information for the latest data, which can be used to determine the age of data when preclude_last_level_data_seconds is enabled. The time information is attached to the SST in the table property rocksdb.seqno.time.map, which can be parsed by the ldb or sst_dump tools.

Bug Fixes

... (truncated)

Commits
  • 444b3f4 Add a unit test for PR 11049
  • 23bec22 Bump version to 7.9.2
  • 208310a Fix async prefetch heap use after free (#11049)
  • afa9203 Update version.h to 7.9.1
  • a5e9481 Update HISTORY.md for 7.9.1
  • 742035d Fix table cache leak in MultiGet with async_io (#10997)
  • 62f3e58 Revert "Improve / refactor anonymous mmap capabilities (#10810)"
  • be728f8 Revert "Fix include of windows.h in mmap.h (#10885)"
  • 3ff86f3 Revert PR 10777 "Fix FIFO causing overlapping seqnos in L0 files due to overl...
  • ab9741a Merge pull request #10984 from cbi42/7.9-iterate-upperbound
  • Additional commits viewable in compare view



Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependabot[bot] commented 1 year ago

Superseded by #336.