apache / rocketmq

Apache RocketMQ is a cloud native messaging and streaming platform, making it simple to build event-driven applications.
https://rocketmq.apache.org/
Apache License 2.0

MessageStore IndexService performance improve #262

Closed lindzh closed 6 years ago

lindzh commented 6 years ago

The issue tracker is ONLY used for bug reports and feature requests. Please check whether the same report already exists before you raise a new one.

Alternately (especially if your communication is not a bug report), you can send mail to our mailing lists. We welcome any friendly suggestions, bug fixes, collaboration and other improvements.

Please ensure that your bug report is clear and complete. Otherwise, we may be unable to understand it or to reproduce it, either of which would prevent us from fixing the bug. We strongly recommend that the report (bug report or feature request) include some hints like the following:

FEATURE REQUEST

  1. Please describe the feature you are requesting. https://issues.apache.org/jira/projects/ROCKETMQ/issues/ROCKETMQ-267?filter=allopenissues When the page cache is used for both the commit log and the index service, send requests see higher latency. Replacing the index access with RandomAccessFile reads and writes gives a good improvement for sends.

  2. Provide any additional detail on your proposed use case for this feature.

Performance comparison data: the old version uses the memory-mapped page cache for index reads and writes, while the new version uses RandomAccessFile reads and writes. The [with direct io write] run further down uses direct I/O instead.
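To make the two access styles being compared concrete, here is a minimal sketch of writing one index slot through a memory-mapped buffer and through RandomAccessFile. It is not RocketMQ's IndexFile code: the class name `IndexWriteSketch`, the 4-byte `SLOT_SIZE`, and the method names are assumptions made only for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class IndexWriteSketch {
    // Hypothetical slot size; not taken from RocketMQ's IndexFile layout.
    static final int SLOT_SIZE = 4;

    // Old style: map the file into memory and store into the mapped region.
    static void writeMapped(Path file, int slot, int value, int slotCount) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping READ_WRITE extends the file to the mapped size if it is shorter.
            MappedByteBuffer buf =
                ch.map(FileChannel.MapMode.READ_WRITE, 0, (long) slotCount * SLOT_SIZE);
            buf.putInt(slot * SLOT_SIZE, value); // absolute put into the mapped pages
            buf.force();                         // ask the OS to write dirty pages out
        }
    }

    // New style: explicit seek + write through RandomAccessFile.
    static void writeRandomAccess(Path file, int slot, int value) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek((long) slot * SLOT_SIZE);
            raf.writeInt(value); // big-endian, same byte order as the buffer default
        }
    }

    // Read a slot back, used here only to check both write paths.
    static int readSlot(Path file, int slot) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek((long) slot * SLOT_SIZE);
            return raf.readInt();
        }
    }
}
```

Both methods end up with the same bytes in the file; what differs is how the write reaches the kernel (a store into mapped pages versus a write system call), which is the difference the benchmark below is trying to measure.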

24C96G  Linux rs6f15396.et2sqa 2.6.32-220.23.2.el6.x86_64 #1 SMP Mon Jan 28 17:12:52 CST 2013 x86_64 x86_64 x86_64 GNU/Linux

[old-version]
[<=0ms]:13222248 [0~10ms]:199848 [10~50ms]:5 [50~100ms]:8 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:10213863 [0~10ms]:195070 [10~50ms]:104 [50~100ms]:88 [100~200ms]:8 [200~500ms]:4 [500ms~1s]:4 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:10025096 [0~10ms]:197015 [10~50ms]:192 [50~100ms]:96 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:10218673 [0~10ms]:201210 [10~50ms]:151 [50~100ms]:48 [100~200ms]:4 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:10401378 [0~10ms]:198450 [10~50ms]:172 [50~100ms]:72 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:10498649 [0~10ms]:200961 [10~50ms]:124 [50~100ms]:60 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:10565265 [0~10ms]:204717 [10~50ms]:24 [50~100ms]:64 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:10228801 [0~10ms]:194281 [10~50ms]:150 [50~100ms]:142 [100~200ms]:0 [200~500ms]:4 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:9844542 [0~10ms]:196146 [10~50ms]:176 [50~100ms]:124 [100~200ms]:4 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:9586154 [0~10ms]:187757 [10~50ms]:344 [50~100ms]:120 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:4 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
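The histogram lines are easier to compare when reduced to a total and a count of slow sends. A minimal sketch of that reduction follows, assuming each line is one reporting interval and that "slow" means every bucket above 10 ms; the class and method names are hypothetical and not part of RocketMQ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LatencyHistogram {
    // Matches one "[bucket]:count" pair, e.g. "[10~50ms]:104".
    private static final Pattern BUCKET = Pattern.compile("\\[([^\\]]+)\\]:(\\d+)");

    // Parse a histogram line into bucket label -> count, keeping bucket order.
    static Map<String, Long> parse(String line) {
        Map<String, Long> buckets = new LinkedHashMap<>();
        Matcher m = BUCKET.matcher(line);
        while (m.find()) {
            buckets.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return buckets;
    }

    // Sum of all bucket counts.
    static long total(Map<String, Long> buckets) {
        long sum = 0;
        for (long count : buckets.values()) {
            sum += count;
        }
        return sum;
    }

    // Count of sends outside the two fastest buckets, i.e. slower than 10 ms.
    static long slowerThan10ms(Map<String, Long> buckets) {
        long sum = 0;
        for (Map.Entry<String, Long> e : buckets.entrySet()) {
            String label = e.getKey();
            if (!label.equals("<=0ms") && !label.equals("0~10ms")) {
                sum += e.getValue();
            }
        }
        return sum;
    }
}
```

Applied to the second old-version line, this gives 208 sends slower than 10 ms out of 10409141 in total.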

Time           ---cpu-- ---mem-- ---tcp-- -----traffic---- --sda--- --sdb--- --sdc--- --sdd--- --sde--- --sdf--- --sdg--- --sdh--- --sdi--- --sdj--- --sdk--- --sdl--- --sdm---  ---load-
29/03/18-16:52   0.15     8.93    23.27     1.5K    1.0K     1.73     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      0.05
29/03/18-16:53  11.46    17.92    20.13     1.4K  931.00    49.53     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      2.64
29/03/18-16:54  21.07    17.92    19.80     1.6K    1.2K    99.64     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      5.11
29/03/18-16:55  20.92    17.92     2.34     1.3K    1.4K    99.71     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      6.15
29/03/18-16:56  20.74    17.93     0.43     1.4K    1.4K   100.00     0.05     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      7.03
29/03/18-16:57  21.39    17.94     1.09     1.4K    1.5K    99.80     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      7.26
29/03/18-16:58  20.69    17.95     0.47     1.3K    1.3K    99.83     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      7.27
29/03/18-16:59  20.88    17.97     0.49     1.3K    1.2K    99.80     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      7.17
29/03/18-17:00  20.92    17.98     0.46     1.3K    1.4K    99.48     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      7.12
29/03/18-17:01  20.71    17.98     0.47     1.3K    1.4K    99.03     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      7.00
29/03/18-17:02  17.42    17.98     0.70     1.7K   10.2K    85.76     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      6.41
29/03/18-17:03   3.91    17.98     0.40     1.5K    3.0K    14.05     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      3.16
29/03/18-17:04   4.08    17.98     1.05     1.5K    1.5K    14.09     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      2.22

[device performance]

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00 12300.00    0.00  237.00     0.00    48.97   423.16     1.25    5.18   4.19  99.30
sda               0.00 12864.00    0.00  266.00     0.00    51.29   394.89     1.31    4.95   3.73  99.20
sda               0.00 13762.00    0.00  263.00     0.00    54.78   426.59     1.46    5.58   3.78  99.50
sda               0.00 13545.00    0.00  242.00     0.00    53.85   455.74     1.31    5.45   4.09  98.90

[new-version]
[<=0ms]:13974308 [0~10ms]:199764 [10~50ms]:7 [50~100ms]:4 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:14625146 [0~10ms]:195935 [10~50ms]:0 [50~100ms]:12 [100~200ms]:4 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:13871723 [0~10ms]:186522 [10~50ms]:0 [50~100ms]:120 [100~200ms]:20 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:13451393 [0~10ms]:185738 [10~50ms]:0 [50~100ms]:184 [100~200ms]:4 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:13424794 [0~10ms]:185283 [10~50ms]:0 [50~100ms]:168 [100~200ms]:20 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:13889762 [0~10ms]:190237 [10~50ms]:0 [50~100ms]:92 [100~200ms]:12 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:13953812 [0~10ms]:192415 [10~50ms]:0 [50~100ms]:52 [100~200ms]:16 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:13867076 [0~10ms]:191798 [10~50ms]:0 [50~100ms]:68 [100~200ms]:12 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:12797578 [0~10ms]:177276 [10~50ms]:23 [50~100ms]:265 [100~200ms]:28 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:4323033 [0~10ms]:59581 [10~50ms]:12 [50~100ms]:48 [100~200ms]:28 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0

Time           ---cpu-- ---mem-- ---tcp-- -----traffic---- --sda--- --sdb--- --sdc--- --sdd--- --sde--- --sdf--- --sdg--- --sdh--- --sdi--- --sdj--- --sdk--- --sdl--- --sdm---  ---load-
29/03/18-17:26   0.38    22.26     0.14     3.4K    4.6K     1.60     0.51     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      0.19
29/03/18-17:27   0.37    22.35     0.11   130.1K    8.1K     1.84     0.50     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      0.10
29/03/18-17:28   0.28    22.35     0.14     3.7K    6.5K     1.95     0.61     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      0.04
29/03/18-17:29  14.59    31.34     0.21     4.0K    5.0K    63.04     0.60     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      3.03
29/03/18-17:30  21.45    31.29     0.13     3.9K    5.0K    99.25     0.58     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      5.35
29/03/18-17:31  20.79    31.30     0.13     6.6K    7.7K   100.00     0.61     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      6.48
29/03/18-17:32  20.75    31.31     0.22    38.2K    6.7K    99.67     0.54     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      6.51
29/03/18-17:33  20.70    31.31     0.14     3.9K    5.0K    99.70     0.61     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      6.66
29/03/18-17:34  20.70    31.32     0.14     4.0K    4.9K    99.73     0.59     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      6.47
29/03/18-17:35  20.85    31.34     7.01     4.5K    5.1K    99.61     0.59     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      6.67
29/03/18-17:36  20.92    31.34    20.98    35.9K   10.1K    98.28     3.31     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      6.78
29/03/18-17:37  20.90    31.33    14.69    95.9K   14.1K    98.75     5.21     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      7.10
29/03/18-17:38  20.19    31.31    21.47    79.3K   16.9K    99.55     6.30     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      7.10
29/03/18-17:39   8.10    22.31    21.38    57.9K   16.8K    39.35     7.67     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00      4.06

[device performance]

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00 15422.00    0.00  325.00     0.00    59.09   372.33     1.59    4.98   3.06  99.30
sda               0.00 12941.00    0.00  322.00     0.00    54.23   344.92     1.94    5.89   3.09  99.60
sda               0.00 13319.00    0.00  317.00     0.00    53.27   344.15     2.11    6.51   3.14  99.40

with direct io write [direct io fix](https://github.com/lindzh/rocketmq/blob/index_direct_io/store/src/main/java/org/apache/rocketmq/store/index/IndexFile.java)

[Device perf]
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00 14543.00 1271.00 1411.00     4.96    54.57    45.46    13.17    5.05   0.37 100.00
sda               0.00 13314.00 1438.00 1579.00     5.62    56.45    42.13    12.05    3.92   0.33  99.90
sda               0.00 10633.00 1207.00 1364.00     4.71    60.58    52.01    13.48    5.63   0.39 100.00
sda               0.00 15780.00 1379.00 1527.00     5.39    57.97    44.65    12.06    4.20   0.34 100.00
sda               0.00 11500.00 1874.00 2030.00     7.32    62.47    36.61    12.73    3.28   0.26 100.00
sda               0.00 12055.00 1421.00 1554.00     5.55    52.41    39.90    11.12    3.75   0.34 100.00
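One way to read the device tables is to derive the average request size from the throughput and request-rate columns. The sketch below does that arithmetic; the class and method names are hypothetical, and it assumes the MB/s columns count 1024 KiB per MB, which is consistent with the avgrq-sz column (reported in 512-byte sectors) in the rows above.

```java
final class IostatSketch {
    // Average request size in KiB from iostat's r/s, w/s, rMB/s and wMB/s columns.
    static double avgRequestKiB(double readsPerSec, double writesPerSec,
                                double readMBps, double writeMBps) {
        double requestsPerSec = readsPerSec + writesPerSec;
        if (requestsPerSec == 0) {
            return 0.0; // idle device: no requests to average over
        }
        return (readMBps + writeMBps) * 1024.0 / requestsPerSec;
    }
}
```

For the first old-version row (w/s 237.00, wMB/s 48.97) this gives about 211.6 KiB per request, matching avgrq-sz 423.16 sectors. For the first direct I/O row (r/s 1271.00, w/s 1411.00, rMB/s 4.96, wMB/s 54.57) it gives about 22.7 KiB, matching avgrq-sz 45.46. In other words, the direct I/O run issues many more, much smaller requests and also adds reads, at a similar write throughput.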

[write perf]
[<=0ms]:13615826 [0~10ms]:198861 [10~50ms]:4 [50~100ms]:12 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:14935568 [0~10ms]:197179 [10~50ms]:6 [50~100ms]:8 [100~200ms]:4 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:14880945 [0~10ms]:197794 [10~50ms]:0 [50~100ms]:0 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:14754942 [0~10ms]:197580 [10~50ms]:0 [50~100ms]:4 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:14633515 [0~10ms]:193760 [10~50ms]:16 [50~100ms]:16 [100~200ms]:24 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:14254780 [0~10ms]:187715 [10~50ms]:12 [50~100ms]:24 [100~200ms]:64 [200~500ms]:8 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:14730283 [0~10ms]:194008 [10~50ms]:12 [50~100ms]:4 [100~200ms]:20 [200~500ms]:4 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0
[<=0ms]:6990788 [0~10ms]:93172 [10~50ms]:12 [50~100ms]:20 [100~200ms]:24 [200~500ms]:4 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0

  3. Indicate the importance of this issue to you (blocker, must-have, should-have, nice-to-have). Are you currently using any workarounds to address this issue?

should-have

  4. If there are sub-tasks, use -[] for each sub-task and create a corresponding issue to map to it:
    • no sub tasks

zhouxinyu commented 6 years ago

> Performance improve data, old version is using pagecache for Index read and write while new version is using randomAccessFile read and write.

RandomAccessFile reads and writes also go through the page cache.

Also, all store files should use MappedFile.
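The point about the page cache can be illustrated with plain JDK calls. In the minimal sketch below, a RandomAccessFile opened in "rw" mode performs an ordinary buffered write, and the data is only pushed to the device when `FileChannel.force` is called. Opening the file in "rwd" mode asks for every content update to be written synchronously. Neither variant is direct I/O, so both still pass through the page cache. The class and method names are hypothetical and this is not RocketMQ code.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

final class FlushSketch {
    // Buffered write, then an explicit request to flush file content to the device.
    static void writeThenForce(Path file, long offset, long value) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(offset);
            raf.writeLong(value);
            raf.getChannel().force(false); // false: content only, not file metadata
        }
    }

    // "rwd" mode: each content update is written synchronously to the storage device.
    static void writeSynchronous(Path file, long offset, long value) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rwd")) {
            raf.seek(offset);
            raf.writeLong(value);
        }
    }

    // Read a value back, used here only to check both write paths.
    static long readLong(Path file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            return raf.readLong();
        }
    }
}
```

Any latency difference between the mapped and RandomAccessFile versions would therefore come from how writes enter the page cache and when they are flushed, not from bypassing it.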

lindzh commented 6 years ago

@zhouxinyu Yeah, I was wondering whether this change gives a good improvement. In the end I changed IndexFile to use direct I/O reads and writes, and it reached 14935568 in one minute.

vongosling commented 6 years ago

I will close the PR. If you run into the same question, please let me know.