OpenAtomFoundation / pika

Pika is a Redis-Compatible database developed by Qihoo's infrastructure team.
BSD 3-Clause "New" or "Revised" License
5.85k stars 1.19k forks source link

refactor: geo related tcl tests #2753

Closed saz97 closed 3 months ago

saz97 commented 3 months ago

Fixed several geo bugs:

  1. Modified error messages to be consistent with Redis. For example, for the same error, Pika returns ERR Invalid argument, while Redis returns WRONGTYPE Operation against a key holding the wrong kind of value.

  2. When using the GEORADIUS command, Pika's default sort value is Unsort, whereas Redis's default sort value is Asc.

  3. When the store and storedist options are enabled in the GEO command, Pika does not ensure data consistency between the storage layer and the cache layer.

  4. When the store and storedist options are enabled in the GEO command, Pika appends the new results to the target key, while Redis replaces the existing data in the target key with the new data.

  5. There is a logical error in Pika when finding the search boundaries (using geohashBoundingBox).

  6. There is a logical error in Pika when calculating the distance between two points (using geohashGetDistance).

  7. There is a logical error in Pika when validating the step's validity.

Result display: Pika can pass all geo TCL test cases except for the unsupported geo commands. image

修改了若干关于geo的bug

  1. 修改了与 Redis 不一致的报错内容。例如,对于相同的错误,Pika 报错为 ERR Invalid argument,而 Redis 报错为 WRONGTYPE Operation against a key holding the wrong kind of value

  2. 使用 GEORADIUS 命令时,Pika 的默认排序值为 Unsort,而 Redis 的默认排序值为 Asc

  3. 在 GEO 命令的 storestoredist 选项启动时,Pika 没有保证存储层和缓存层的数据一致性。

  4. 在 GEO 命令的 storestoredist 选项启动时,Pika 将新的结果追加到目标键中,而 Redis 则是用新数据替换目标键中的现有数据。

  5. 在查找搜索边界时(geohashBoundingBox),Pika 的逻辑有误。

  6. 在判断两点距离时(geohashGetDistance),Pika 的逻辑有误。

  7. 在检验 step 的有效性时,Pika 的逻辑有误。

Summary by CodeRabbit

coderabbitai[bot] commented 3 months ago

Walkthrough

The changes span several source files and configuration updates, focusing on enhancing error handling, refining geohash calculations, and adjusting configuration settings for performance and stability. Key modifications involve improved error checks on rocksdb::Status, optimized distance and geohash logic, and updated sync settings in the configuration file. Collectively, these changes aim to improve robustness and efficiency.

Changes

File(s) Change Summary
src/pika_geo.cc Updated handling of rocksdb::Status. Adjusted error-handling in GeoAddCmd::Do(), GeoPosCmd::Do(), etc.
src/pika_geohash_helper.cc Refactored geohash and distance calculations, simplifying and improving logic for step adjustments.
src/storage/src/redis_zsets.cc Corrected indentation, added error handling for ZRange and ZScore methods.
tests/assets/default.conf Modified sync thread settings, added parameters for control over WAL size, rsync settings, and background jobs.
tests/assets/default.conf (new changesets) Adjusted comment example in rename-command, minimal change without logic impact.

🌾📝 From deep in code, where bytes align, RocksDB whispers, functions refine. Geohash paths now more clear, With gentler errors, never fear. A tweak in config, sync anew, Performance soars, behold the view! 🌿🐇


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share - [X](https://twitter.com/intent/tweet?text=I%20just%20used%20%40coderabbitai%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20the%20proprietary%20code.%20Check%20it%20out%3A&url=https%3A//coderabbit.ai) - [Mastodon](https://mastodon.social/share?text=I%20just%20used%20%40coderabbitai%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20the%20proprietary%20code.%20Check%20it%20out%3A%20https%3A%2F%2Fcoderabbit.ai) - [Reddit](https://www.reddit.com/submit?title=Great%20tool%20for%20code%20review%20-%20CodeRabbit&text=I%20just%20used%20CodeRabbit%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20proprietary%20code.%20Check%20it%20out%3A%20https%3A//coderabbit.ai) - [LinkedIn](https://www.linkedin.com/sharing/share-offsite/?url=https%3A%2F%2Fcoderabbit.ai&mini=true&title=Great%20tool%20for%20code%20review%20-%20CodeRabbit&summary=I%20just%20used%20CodeRabbit%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20proprietary%20code)
Tips ### Chat There are 3 ways to chat with [CodeRabbit](https://coderabbit.ai): - Review comments: Directly reply to a review comment made by CodeRabbit. Example: - `I pushed a fix in commit .` - `Generate unit testing code for this file.` - `Open a follow-up GitHub issue for this discussion.` - Files and specific lines of code (under the "Files changed" tab): Tag `@coderabbitai` in a new review comment at the desired location with your query. Examples: - `@coderabbitai generate unit testing code for this file.` - `@coderabbitai modularize this function.` - PR comments: Tag `@coderabbitai` in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples: - `@coderabbitai generate interesting stats about this repository and render them as a table.` - `@coderabbitai show all the console.log statements in this repository.` - `@coderabbitai read src/utils.ts and generate unit testing code.` - `@coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.` - `@coderabbitai help me debug CodeRabbit configuration file.` Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. ### CodeRabbit Commands (invoked as PR comments) - `@coderabbitai pause` to pause the reviews on a PR. - `@coderabbitai resume` to resume the paused reviews. - `@coderabbitai review` to trigger an incremental review. This is useful when automatic reviews are disabled for the repository. - `@coderabbitai full review` to do a full review from scratch and review all the files again. - `@coderabbitai summary` to regenerate the summary of the PR. - `@coderabbitai resolve` resolve all the CodeRabbit review comments. - `@coderabbitai configuration` to show the current CodeRabbit configuration for the repository. - `@coderabbitai help` to get help. Additionally, you can add `@coderabbitai ignore` anywhere in the PR description to prevent this PR from being reviewed. ### CodeRabbit Configration File (`.coderabbit.yaml`) - You can programmatically configure CodeRabbit by adding a `.coderabbit.yaml` file to the root of your repository. - Please see the [configuration documentation](https://docs.coderabbit.ai/guides/configure-coderabbit) for more information. - If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: `# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json` ### Documentation and Community - Visit our [Documentation](https://coderabbit.ai/docs) for detailed information on how to use CodeRabbit. - Join our [Discord Community](https://discord.com/invite/GsXnASn26c) to get help, request features, and share feedback. - Follow us on [X/Twitter](https://twitter.com/coderabbitai) for updates and announcements.
saz97 commented 3 months ago

@CodiumAI-Agent review

CodiumAI-Agent commented 3 months ago

PR Reviewer Guide 🔍

⏱️ Estimated effort to review [1-5] 4
🧪 Relevant tests Yes
🔒 Security concerns No
⚡ Key issues to review Possible Bug:
The PR introduces a change in the error handling logic where the condition checks for s.IsInvalidArgument() instead of s_.IsInvalidArgument(). This change is consistent across multiple files and functions. Ensure that the variable s is correctly capturing the status and that this change is intended and tested.
Data Consistency:
The changes in GeoRadiusCmd::Do() and related functions to handle the store and storedist options by deleting existing keys and adding new ones need careful review to ensure data consistency is maintained as expected.
Logical Error Fixes:
The PR mentions fixes for logical errors in functions like geohashBoundingBox and geohashGetDistance. Review these changes to ensure the new logic correctly addresses the issues without introducing new bugs.
Performance Concerns:
The modifications in sorting logic within GeoRadiusCmd::Do() could have performance implications depending on how the sorting conditions are structured and the data size. It's crucial to benchmark these changes.
saz97 commented 3 months ago

@CodiumAI-Agent improve

CodiumAI-Agent commented 3 months ago

PR Code Suggestions ✨

CategorySuggestion                                                                                                                                    Score
Possible issue
Add error handling for invalid geographical coordinates ___ **Consider adding error handling for the geo_distance function to manage cases where the
input values are not valid geographical coordinates.** [tests/unit/type/geo.tcl [51]](https://github.com/OpenAtomFoundation/pika/pull/2753/files#diff-f19f97831168e4ce4a303dafe51e672a64f8a62dbf8a40dcd164565ae591f95aR51-R51) ```diff +if {![string is double -strict $lon] || ![string is double -strict $lat] || ![string is double -strict $search_lon] || ![string is double -strict $search_lat]} { + error "Invalid input: coordinates must be numeric" +} set distance [geo_distance $lon $lat $search_lon $search_lat] ```
Suggestion importance[1-10]: 9 Why: Adding error handling for invalid geographical coordinates is crucial for robustness and preventing potential runtime errors.
9
Enforce the valid range for sync-binlog-thread-num to prevent misconfigurations ___ **To prevent potential misconfigurations, consider adding a range check for
sync-binlog-thread-num to ensure it falls within the valid range as described. This could
be implemented in the startup scripts or within the application logic that reads this
configuration.** [tests/assets/default.conf [46]](https://github.com/OpenAtomFoundation/pika/pull/2753/files#diff-368052a10bd7bbc06037a29195afadda767a5777746d659145153ae429998697R46-R46) ```diff -# Valid range of sync-binlog-thread-num is [1, databases], the final value of it is Min(sync-binlog-thread-num, databases) -sync-binlog-thread-num : 1 +# Ensure sync-binlog-thread-num is within the valid range [1, databases] +sync-binlog-thread-num : max(1, min(sync-binlog-thread-num, databases)) ```
Suggestion importance[1-10]: 9 Why: This suggestion provides a robust solution to ensure `sync-binlog-thread-num` falls within the valid range, which is crucial for preventing misconfigurations and ensuring the system operates correctly.
9
Add validation to ensure sync-binlog-thread-num does not exceed databases ___ **Consider adding a validation or a warning mechanism to ensure that the value of
sync-binlog-thread-num does not exceed the value of databases. This is to enforce the
constraint mentioned in the comment and prevent misconfiguration.** [tests/assets/default.conf [46]](https://github.com/OpenAtomFoundation/pika/pull/2753/files#diff-368052a10bd7bbc06037a29195afadda767a5777746d659145153ae429998697R46-R46) ```diff -sync-binlog-thread-num : 1 +# Ensure sync-binlog-thread-num does not exceed the number of databases +sync-binlog-thread-num : min(1, databases) ```
Suggestion importance[1-10]: 8 Why: This suggestion addresses a potential misconfiguration issue by ensuring that `sync-binlog-thread-num` does not exceed the number of databases, which is a significant improvement for maintaining correct configurations.
8
Possible bug
Ensure status check before processing in GeoDistCmd::Do() ___ **In the GeoDistCmd::Do() method, consider adding a check for s.ok() before proceeding with
decoding the geohash. This ensures that the status is verified before attempting
operations that depend on the successful retrieval of the score.** [src/pika_geo.cc [168-170]](https://github.com/OpenAtomFoundation/pika/pull/2753/files#diff-7cd4c5c2968ce34090b73e38fec6d9265edb8173775149ee92c7b97ccc244261R168-R170) ```diff -if (s.ok()) { - GeoHashBits hash = {.bits = static_cast(first_score), .step = GEO_STEP_MAX}; - geohashDecodeToLongLatWGS84(hash, first_xy); +if (!s.ok()) { + res_.SetRes(CmdRes::kErrOther, s.ToString()); + return; } +GeoHashBits hash = {.bits = static_cast(first_score), .step = GEO_STEP_MAX}; +geohashDecodeToLongLatWGS84(hash, first_xy); ```
Suggestion importance[1-10]: 8 Why: This suggestion correctly identifies a potential issue where the status should be checked before proceeding with operations that depend on it. This improves the robustness of the method.
8
Maintainability
Replace magic number with a constant for better readability ___ **Replace the magic number 6372797.560856 with a well-named constant to improve code
readability and maintainability.** [tests/unit/type/geo.tcl [13-14]](https://github.com/OpenAtomFoundation/pika/pull/2753/files#diff-f19f97831168e4ce4a303dafe51e672a64f8a62dbf8a40dcd164565ae591f95aR13-R14) ```diff -expr {2.0 * 6372797.560856 * \ +set earth_radius_m 6372797.560856 +expr {2.0 * $earth_radius_m * \ asin(sqrt($u * $u + cos($lat1r) * cos($lat2r) * $v * $v))} ```
Suggestion importance[1-10]: 8 Why: Replacing the magic number with a constant improves code readability and maintainability, making it clear what the number represents.
8
Enhancement
Improve error handling in the GetAllNeighbors function to cover all non-ok statuses ___ **For the GetAllNeighbors function, consider handling the case where
db->storage()->ZRangebyscore returns a status other than NotFound or ok. This would ensure
that all error statuses are appropriately handled, preventing potential unhandled
exceptions or errors.** [src/pika_geo.cc [317]](https://github.com/OpenAtomFoundation/pika/pull/2753/files#diff-7cd4c5c2968ce34090b73e38fec6d9265edb8173775149ee92c7b97ccc244261R317-R317) ```diff -if (!s.ok() && !s.IsNotFound()) { - res.SetRes(CmdRes::kErrOther, s.ToString()); - return; +if (!s.ok()) { + if (s.IsNotFound()) { + // Handle not found case, possibly continue or return specific error + continue; + } else { + res.SetRes(CmdRes::kErrOther, s.ToString()); + return; + } } ```
Suggestion importance[1-10]: 7 Why: The suggestion is valid and enhances error handling by ensuring all non-ok statuses are appropriately managed. However, the existing code already handles `s.IsInvalidArgument()`, so the improvement is minor.
7
Use pow function for squaring values ___ **Use the built-in pow function for squaring to enhance readability and potentially improve
performance.** [tests/unit/type/geo.tcl [11-14]](https://github.com/OpenAtomFoundation/pika/pull/2753/files#diff-f19f97831168e4ce4a303dafe51e672a64f8a62dbf8a40dcd164565ae591f95aR11-R14) ```diff -expr {sin(($lon2r - $lon1r) / 2)} -expr {sin(($lat2r - $lat1r) / 2)} +set v [expr {sin(($lon2r - $lon1r) / 2)}] +set u [expr {sin(($lat2r - $lat1r) / 2)}] expr {2.0 * 6372797.560856 * \ - asin(sqrt($u * $u + cos($lat1r) * cos($lat2r) * $v * $v))} + asin(sqrt(pow($u, 2) + cos($lat1r) * cos($lat2r) * pow($v, 2)))} ```
Suggestion importance[1-10]: 7 Why: Using the `pow` function for squaring values enhances readability and may improve performance, though the improvement is minor.
7
Clarify the dynamic change support comments for rsync settings ___ **Update the comment to clarify the dynamic change support for rsync-timeout-ms and
throttle-bytes-per-second to avoid confusion about where and how these settings can be
dynamically adjusted.** [tests/assets/default.conf [485-488]](https://github.com/OpenAtomFoundation/pika/pull/2753/files#diff-368052a10bd7bbc06037a29195afadda767a5777746d659145153ae429998697R485-R488) ```diff -# [Dynamic Change Supported] send command 'config set throttle-bytes-per-second new_value' to SLAVE NODE can dynamically adjust rsync rate during full sync(use config rewrite can persist the changes). -# [Dynamic Change Supported] similar to throttle-bytes-per-second, rsync-timeout-ms can be dynamically changed by configset command +# [Dynamic Change Supported] Use 'config set throttle-bytes-per-second new_value' on SLAVE NODE to dynamically adjust rsync rate during full sync. Use 'config rewrite' to persist changes. +# [Dynamic Change Supported] Use 'config set rsync-timeout-ms new_value' on SLAVE NODE to dynamically adjust rsync timeout during full sync. ```
Suggestion importance[1-10]: 7 Why: The suggestion improves the clarity of comments regarding dynamic changes to `rsync-timeout-ms` and `throttle-bytes-per-second`, which enhances the readability and usability of the configuration file.
7
Modify the GeoRadiusCmd::DoInitial() method to set sorting dynamically ___ **In the GeoRadiusCmd::DoInitial() method, ensure that the range_.sort is set based on a
condition or configuration rather than being hardcoded to Asc. This would allow for
dynamic sorting based on user input or other conditions.** [src/pika_geo.cc [446]](https://github.com/OpenAtomFoundation/pika/pull/2753/files#diff-7cd4c5c2968ce34090b73e38fec6d9265edb8173775149ee92c7b97ccc244261R446-R446) ```diff -range_.sort = Asc; +if (userInputSpecifiesAscending) { + range_.sort = Asc; +} else { + range_.sort = Desc; +} ```
Suggestion importance[1-10]: 6 Why: The suggestion improves flexibility by allowing dynamic sorting based on conditions. However, it assumes the existence of a variable or condition (`userInputSpecifiesAscending`) that is not present in the provided code.
6
Simplify list comparison using set operations ___ **Refactor the compare_lists function to use built-in Tcl commands for set operations, which
can simplify the code and potentially improve performance.** [tests/unit/type/geo.tcl [29-43]](https://github.com/OpenAtomFoundation/pika/pull/2753/files#diff-f19f97831168e4ce4a303dafe51e672a64f8a62dbf8a40dcd164565ae591f95aR29-R43) ```diff proc compare_lists {List1 List2} { - set DiffList {} - foreach Item $List1 { - if {[lsearch -exact $List2 $Item] == -1} { - lappend DiffList $Item - } - } - foreach Item $List2 { - if {[lsearch -exact $List1 $Item] == -1} { - if {[lsearch -exact $DiffList $Item] == -1} { - lappend DiffList $Item - } - } - } - return $DiffList + return [lsort [set difference [set union $List1 $List2] [set intersection $List1 $List2]]] } ```
Suggestion importance[1-10]: 6 Why: Simplifying the list comparison using set operations can improve readability and performance, but the improvement is relatively minor.
6
Best practice
Add a default value comment for sync-binlog-thread-num ___ **Add a default value for sync-binlog-thread-num in the configuration file to ensure there
is a fallback and clear starting point when setting up new configurations.** [tests/assets/default.conf [46]](https://github.com/OpenAtomFoundation/pika/pull/2753/files#diff-368052a10bd7bbc06037a29195afadda767a5777746d659145153ae429998697R46-R46) ```diff -sync-binlog-thread-num : 1 +# Default value for sync-binlog-thread-num, adjust as necessary based on the number of databases +sync-binlog-thread-num : 1 # Default value ```
Suggestion importance[1-10]: 5 Why: Adding a comment about the default value for `sync-binlog-thread-num` is a good practice for clarity, but it is a minor improvement compared to functional changes.
5