apache / incubator-pegasus

Apache Pegasus - A horizontally scalable, strongly consistent and high-performance key-value store
https://pegasus.apache.org/
Apache License 2.0

Release 2.4.0 #1032

Closed foreverneverer closed 2 years ago

foreverneverer commented 2 years ago

New Module

Starting with this version, more project modules join the Apache Pegasus project. This version includes the following projects:

New architecture

In this version, we removed the shared log to improve Pegasus performance. Related pull requests are as follows:

New Feature

Replica-factor update

Support for a flexible replica count. Previously, the replica factor was immutable once a table was created. In the current version, users can dynamically adjust the replica factor of a specified table. Related pull requests are as follows:

Read Request Limiter

Previously, only a write limiter was supported. This version adds support for limiting read requests:

Jemalloc Support

Build Feature

We have placed some restrictions on the compilation environment and added support for macOS and aarch64:

New BatchGetAPI

Previously, batchGet was implemented on top of singleGet. The latest version aggregates the individual requests before sending them, which improves performance:
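As a rough illustration of the aggregation idea, the sketch below groups keys by target partition and issues one call per group instead of one call per key. It assumes that requests are grouped by partition; the hash function, `PARTITION_COUNT`, and the `send_batch` callback are all hypothetical and are not the real client API.

```python
from collections import defaultdict
from zlib import crc32

PARTITION_COUNT = 8  # hypothetical; a real table defines its own partition count


def partition_of(hash_key: bytes, partition_count: int = PARTITION_COUNT) -> int:
    # Illustrative hash only; the real client's hashing may differ.
    return crc32(hash_key) % partition_count


def group_by_partition(keys, partition_count=PARTITION_COUNT):
    """Group (hash_key, sort_key) pairs by the partition that would serve them."""
    groups = defaultdict(list)
    for hash_key, sort_key in keys:
        groups[partition_of(hash_key, partition_count)].append((hash_key, sort_key))
    return dict(groups)


def batch_get(keys, send_batch, partition_count=PARTITION_COUNT):
    """Issue one send_batch(partition, keys) call per partition and merge the results.

    send_batch is a hypothetical transport callback that returns a dict
    mapping each requested key to its value.
    """
    results = {}
    for pidx, group in group_by_partition(keys, partition_count).items():
        results.update(send_batch(pidx, group))
    return results
```

With this shape, the number of round trips is bounded by the number of distinct partitions touched rather than by the number of keys.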

Feature enhancement

Bulkload

We improved the bulk load feature to reduce the I/O load of downloading and ingesting. In addition, we offer better interfaces and failure-handling logic. Related pull requests are as follows:

Duplication

Previously, duplication had some shortcomings: it depended on a remote filesystem to sync the checkpoint, and the synchronization of plog data sent only a single mutation per RPC. This version addresses both problems (for the detailed design, see https://github.com/apache/incubator-pegasus/issues/892). Related pull requests are as follows:
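To make the batching change concrete, here is a minimal sketch of grouping mutations by accumulated size before each send. It borrows the `duplicate_log_batch_bytes` option listed under Config-Update, where 0 means no batching. The flush rule (send once the accumulated size reaches the threshold) and the function name are assumptions for illustration, not the actual implementation.

```python
def batch_mutations(mutations, batch_bytes):
    """Yield lists of mutations; each yielded list stands for one RPC.

    batch_bytes mirrors duplicate_log_batch_bytes: 0 means no batching,
    so every mutation is sent on its own. The flush rule is an assumption.
    """
    if batch_bytes <= 0:
        for m in mutations:
            yield [m]
        return

    batch, size = [], 0
    for m in mutations:
        batch.append(m)
        size += len(m)
        if size >= batch_bytes:
            # Threshold reached: hand the batch to the sender and start over.
            yield batch
            batch, size = [], 0
    if batch:
        # Flush whatever is left at the end of the log segment.
        yield batch
```

For example, four mutations of 4, 4, 4 and 2 bytes with `batch_bytes=8` produce two RPCs instead of four.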

PerfCounter

In this version, we support a new metrics implementation to optimize performance:

Manual Compaction

Learn with NFS

To reduce the I/O load caused by data migration while still ensuring the migration rate, data transmission now supports disk-level speed limits:
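One common way to enforce a per-disk limit is a token bucket per disk, sketched below. This is only an illustration of the idea: the class names are hypothetical, the one-second burst size is arbitrary, and treating a rate of 0 as "unlimited" is an assumption about options such as `max_copy_rate_megabytes_per_disk`, not something stated in these notes.

```python
import time


class TokenBucket:
    """Token bucket that refills at rate_bytes_per_sec, capped at one second of burst."""

    def __init__(self, rate_bytes_per_sec, clock=time.monotonic):
        self.rate = rate_bytes_per_sec
        self.clock = clock
        self.tokens = rate_bytes_per_sec
        self.last = clock()

    def try_consume(self, nbytes):
        now = self.clock()
        # Refill in proportion to the elapsed time, never above the cap.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


class PerDiskLimiter:
    """Keeps one independent bucket per disk so a busy disk cannot starve the others."""

    def __init__(self, rate_mb_per_disk, clock=time.monotonic):
        self.rate = rate_mb_per_disk * 1024 * 1024
        self.clock = clock
        self.buckets = {}

    def try_consume(self, disk, nbytes):
        if self.rate <= 0:
            # Assumption: a rate of 0 disables the limit.
            return True
        if disk not in self.buckets:
            self.buckets[disk] = TokenBucket(self.rate, self.clock)
        return self.buckets[disk].try_consume(nbytes)
```

A caller that gets `False` would wait and retry; keeping the buckets separate is what makes the limit disk-level rather than node-level.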

Latency Tracer

The latest latency tracer supports perf-counters and fixes some bugs:

Other important changes

Java Client

Go Client

Python Client

Admin Cli

Pegasus Docker

Code Refactor

Common

Performance

This benchmark was run on new machines. To make the comparison more reasonable, we also re-ran the benchmark against Pegasus Server 2.3:

Pegasus Server 2.3

| Case | Clients and threads | R:W | R-QPS | R-Avg | R-P99 | W-QPS | W-Avg | W-P99 |
|---|---|---|---|---|---|---|---|---|
| Write Only | 3 clients * 15 threads | 0:1 | - | - | - | 48805 | 919 | 2124 |
| Read Only | 3 clients * 50 threads | 1:0 | 370068 | 402 | 988 | - | - | - |
| Read Write | 3 clients * 30 threads | 1:1 | 50762 | 532 | 5859 | 50759 | 1233 | 4162 |
| Read Write | 3 clients * 15 threads | 1:3 | 14471 | 443 | 3869 | 43425 | 884 | 1899 |
| Read Write | 3 clients * 15 threads | 1:30 | 1583 | 473 | 3432 | 47551 | 928 | 2066 |
| Read Write | 3 clients * 30 threads | 3:1 | 119093 | 406 | 3367 | 39693 | 1035 | 2581 |
| Read Write | 3 clients * 50 threads | 30:1 | 322904 | 435 | 1034 | 10762 | 882 | 1392 |

Pegasus Server 2.4

| Case | Clients and threads | R:W | R-QPS | R-Avg | R-P99 | W-QPS | W-Avg | W-P99 |
|---|---|---|---|---|---|---|---|---|
| Write Only | 3 clients * 15 threads | 0:1 | - | - | - | 56953 | 787 | 1786 |
| Read Only | 3 clients * 50 threads | 1:0 | 360642 | 413 | 984 | - | - | - |
| Read Write | 3 clients * 30 threads | 1:1 | 62572 | 464 | 5274 | 62561 | 985 | 3764 |
| Read Write | 3 clients * 15 threads | 1:3 | 16844 | 372 | 3980 | 50527 | 762 | 1551 |
| Read Write | 3 clients * 15 threads | 1:30 | 1861 | 381 | 3557 | 55816 | 790 | 1688 |
| Read Write | 3 clients * 30 threads | 3:1 | 140484 | 351 | 3277 | 46822 | 856 | 2044 |
| Read Write | 3 clients * 50 threads | 30:1 | 336106 | 419 | 1221 | 11203 | 763 | 1276 |
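The two tables above can be compared row by row. The short script below computes the relative QPS change from 2.3 to 2.4 for each R:W ratio, using only the R-QPS and W-QPS figures copied from those tables. The helper names are made up for this illustration.

```python
# (R:W ratio, R-QPS 2.3, W-QPS 2.3, R-QPS 2.4, W-QPS 2.4); None means "not measured".
ROWS = [
    ("0:1", None, 48805, None, 56953),
    ("1:0", 370068, None, 360642, None),
    ("1:1", 50762, 50759, 62572, 62561),
    ("1:3", 14471, 43425, 16844, 50527),
    ("1:30", 1583, 47551, 1861, 55816),
    ("3:1", 119093, 39693, 140484, 46822),
    ("30:1", 322904, 10762, 336106, 11203),
]


def pct_change(old, new):
    """Relative change from old to new in percent, or None if either value is missing."""
    if old is None or new is None:
        return None
    return (new - old) / old * 100.0


def qps_changes(rows=ROWS):
    """Map each R:W ratio to (read QPS change %, write QPS change %)."""
    return {
        ratio: (pct_change(r23, r24), pct_change(w23, w24))
        for ratio, r23, w23, r24, w24 in rows
    }


def fmt(value):
    return "-" if value is None else f"{value:+.1f}%"


if __name__ == "__main__":
    for ratio, (read, write) in qps_changes().items():
        print(f"{ratio:>5}  R-QPS {fmt(read):>7}  W-QPS {fmt(write):>7}")
```

On these numbers, write-only throughput is about 17% higher in 2.4, while read-only throughput is about 2.5% lower.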

Config-Update

+ [pegasus.server]
+ rocksdb_max_log_file_size = 8388608
+ rocksdb_log_file_time_to_roll = 86400
+ rocksdb_keep_log_file_num = 32

+ [replication]
+ plog_force_flush = false

- mutation_2pc_min_replica_count = 2
+ mutation_2pc_min_replica_count = 0 # 0 means its value is based on the table's max replica count

+ enable_direct_io = false # Whether to enable direct I/O when downloading files from HDFS, default false
+ direct_io_buffer_pages = 64 # Number of pages to use for the direct I/O buffer, default 64, which is recommended based on my test.
+ max_concurrent_manual_emergency_checkpointing_count = 10

+ enable_latency_tracer_report = false
+ latency_tracer_counter_name_prefix = trace_latency

+ hdfs_read_limit_rate_mb_per_sec = 200
+ hdfs_write_limit_rate_mb_per_sec = 200

+ duplicate_log_batch_bytes = 0 # 0 means no batch before sending

+ [nfs]
- max_copy_rate_megabytes = 500
+ max_copy_rate_megabytes_per_disk = 0
- max_send_rate_megabytes = 500
+ max_send_rate_megabytes_per_disk = 0

+ [meta_server]
+ max_reserved_dropped_replicas = 0
+ bulk_load_verify_before_ingest = false
+ bulk_load_node_max_ingesting_count = 4
+ bulk_load_node_min_disk_count = 1
+ enable_concurrent_bulk_load = false
+ max_allowed_replica_count = 5
+ min_allowed_replica_count = 1

+ [task.LPC_WRITE_REPLICATION_LOG_SHARED]
+ enable_trace = true # true marks the task for latency tracing when global tracing is enabled

Contributors

acelyc111 cauchy1988 empiredan foreverneverer GehaFearless GiantKing happydongyaoyao hycdong levy5307 lidingshengHHU neverchanje padmejin Smityz totalo WHBANG xxmazha ZhongChaoqiang

empiredan commented 2 years ago

Several problems have been found according to the checklist for Incubator release:

Thus some PRs have been committed to fix these problems as follows:

These PRs should be cherry-picked to v2.4 to meet the requirements for Incubator release.

foreverneverer commented 2 years ago

Since we have cherry-picked more commits into v2.4, some of which involve the cmake module, I re-ran the benchmark as suggested by @acelyc111. The results are as follows:

Pegasus Server 2.4

| Case | Clients and threads | R:W | R-QPS | R-Avg | R-P99 | W-QPS | W-Avg | W-P99 |
|---|---|---|---|---|---|---|---|---|
| Write Only | 3 clients * 15 threads | 0:1 | - | - | - | 55490 | 808 | 3540 |
| Read Only | 3 clients * 50 threads | 1:0 | 361112 | 414 | 997 | - | - | - |
| Read Write | 3 clients * 30 threads | 1:1 | 63581 | 469 | 5447 | 63580 | 939 | 4959 |
| Read Write | 3 clients * 15 threads | 1:3 | 16559 | 396 | 4228 | 49664 | 769 | 3987 |
| Read Write | 3 clients * 15 threads | 1:30 | 1730 | 413 | 3669 | 51966 | 849 | 4735 |
| Read Write | 3 clients * 30 threads | 3:1 | 135091 | 376 | 3007 | 45304 | 842 | 4753 |
| Read Write | 3 clients * 50 threads | 30:1 | 319519 | 444 | 1442 | 10643 | 819 | 2691 |

For some reason, I could not run it under centos7 5.4.54-2.0.4.std7c.el7.x86_64, which may also cause some differences from the previous results. I will retest some previous versions before the official release.

acelyc111 commented 2 years ago

Some more license issues have been resolved:

acelyc111 commented 2 years ago

https://github.com/apache/incubator-pegasus/releases/tag/v2.4.0