CentaurusInfra / regionless-storage-service

A geo-distributed regionless metadata storage service
Apache License 2.0

0729 test run #36

Closed: h-w-chen closed this 2 years ago

h-w-chen commented 2 years ago

test config

redis persistence is disabled (save "")

export NUM_OF_SI=6
export RKV_ROOT_DISK_VOLUME=100
export SI_ROOT_DISK_VOLUME=100
export SI_INSTANCE_TYPE=t2.xlarge
export RKV_INSTANCE_TYPE=t2.2xlarge
export JAEGER_INSTANCE_TYPE=t2.2xlarge
export JAEGER_ROOT_DISK_VOLUME=200
export YCSB_INSTANCE_TYPE=t2.2xlarge
export YCSB_ROOT_DISK_VOLUME=40

records: 5M k-v pairs; payload: 5KB value

test procedure

load test only; workloada settings:

threadcount=4
fieldlength=160   # intended 500; because of the encoding, a length of 160 yields a payload of about 500
recordcount=5000000
operationcount=5000000
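
As a rough cross-check of the numbers above, the raw payload size can be estimated from the record count and the intended field payload. This is only a sketch: `fieldcount` is not set in this issue, so the YCSB core workload default of 10 is assumed, and the "500" is assumed to be bytes. Keys, redis per-entry overhead and the rkv index are not included.

```python
# Values taken from the test config in this issue.
RECORD_COUNT = 5_000_000
TARGET_FIELD_PAYLOAD = 500  # "intended 500"; assumed to be bytes per field

# Assumption: fieldcount is not set in the issue, so the YCSB default (10) is used.
FIELD_COUNT = 10


def value_bytes(field_count, field_payload):
    """Approximate payload of a single record's value."""
    return field_count * field_payload


def dataset_bytes(record_count, field_count, field_payload):
    """Approximate raw payload of the whole data set (no keys, no overhead)."""
    return record_count * value_bytes(field_count, field_payload)


per_record = value_bytes(FIELD_COUNT, TARGET_FIELD_PAYLOAD)
total = dataset_bytes(RECORD_COUNT, FIELD_COUNT, TARGET_FIELD_PAYLOAD)
print(f"per record: {per_record} bytes, total: {total / 1e9:.1f} GB")
```

Under these assumptions a record is about 5000 bytes, which matches the 5KB value stated above, and the full 5M-record load is about 25 GB of raw payload.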

test result

ycsb log:

INSERT - Takes(s): 10840.0, Count: 3188140, OPS: 294.1, Avg(us): 13548, Min(us): 4082, Max(us): 1046015, 99th(us): 22239, 99.9th(us): 29471, 99.99th(us): 223231
INSERT - Takes(s): 10850.0, Count: 3191224, OPS: 294.1, Avg(us): 13548, Min(us): 4082, Max(us): 1046015, 99th(us): 22239, 99.9th(us): 29471, 99.99th(us): 223231
INSERT - Takes(s): 10860.0, Count: 3194339, OPS: 294.1, Avg(us): 13547, Min(us): 4082, Max(us): 1046015, 99th(us): 22239, 99.9th(us): 29471, 99.99th(us): 223231
INSERT - Takes(s): 10870.0, Count: 3197327, OPS: 294.1, Avg(us): 13547, Min(us): 4082, Max(us): 1046015, 99th(us): 22223, 99.9th(us): 29471, 99.99th(us): 223231
INSERT - Takes(s): 10880.0, Count: 3200144, OPS: 294.1, Avg(us): 13547, Min(us): 4082, Max(us): 1046015, 99th(us): 22223, 99.9th(us): 29455, 99.99th(us): 223231
... // approaching 4M records (where redis has almost exhausted its memory); the OPS column is a cumulative average, and the actual rate at this point is very low (<1 ops)
INSERT - Takes(s): 13820.0, Count: 3931533, OPS: 284.5, Avg(us): 13969, Min(us): 4082, Max(us): 19005439, 99th(us): 22079, 99.9th(us): 29439, 99.99th(us): 1022463
INSERT - Takes(s): 13830.0, Count: 3931539, OPS: 284.3, Avg(us): 13986, Min(us): 4082, Max(us): 22298623, 99th(us): 22079, 99.9th(us): 29455, 99.99th(us): 1022463
INSERT - Takes(s): 13840.0, Count: 3931541, OPS: 284.1, Avg(us): 13993, Min(us): 4082, Max(us): 22298623, 99th(us): 22079, 99.9th(us): 29455, 99.99th(us): 1022975
INSERT - Takes(s): 13850.0, Count: 3931542, OPS: 283.9, Avg(us): 13999, Min(us): 4082, Max(us): 22790143, 99th(us): 22079, 99.9th(us): 29455, 99.99th(us): 1022975
... // at times no ops at all
INSERT - Takes(s): 14550.0, Count: 3932125, OPS: 270.2, Avg(us): 14673, Min(us): 4082, Max(us): 33439743, 99th(us): 22111, 99.9th(us): 29855, 99.99th(us): 5308415
INSERT - Takes(s): 14560.0, Count: 3932125, OPS: 270.1, Avg(us): 14673, Min(us): 4082, Max(us): 33439743, 99th(us): 22111, 99.9th(us): 29855, 99.99th(us): 5308415
INSERT - Takes(s): 14570.0, Count: 3932125, OPS: 269.9, Avg(us): 14673, Min(us): 4082, Max(us): 33439743, 99th(us): 22111, 99.9th(us): 29855, 99.99th(us): 5308415
INSERT - Takes(s): 14580.0, Count: 3932125, OPS: 269.7, Avg(us): 14673, Min(us): 4082, Max(us): 33439743, 99th(us): 22111, 99.9th(us): 29855, 99.99th(us): 5308415
...
INSERT - Takes(s): 14640.0, Count: 3932125, OPS: 268.6, Avg(us): 14673, Min(us): 4082, Max(us): 33439743, 99th(us): 22111, 99.9th(us): 29855, 99.99th(us): 5308415
INSERT - Takes(s): 14650.0, Count: 3932126, OPS: 268.4, Avg(us): 14682, Min(us): 4082, Max(us): 33980415, 99th(us): 22111, 99.9th(us): 29855, 99.99th(us): 5320703
INSERT - Takes(s): 14660.0, Count: 3932129, OPS: 268.2, Avg(us): 14708, Min(us): 4082, Max(us): 34570239, 99th(us): 22111, 99.9th(us): 29871, 99.99th(us): 5345279
INSERT - Takes(s): 14670.0, Count: 3932133, OPS: 268.0, Avg(us): 14722, Min(us): 4082, Max(us): 34570239, 99th(us): 22111, 99.9th(us): 29871, 99.99th(us): 5402623
INSERT - Takes(s): 14680.0, Count: 3932136, OPS: 267.9, Avg(us): 14729, Min(us): 4082, Max(us): 34570239, 99th(us): 22111, 99.9th(us): 29871, 99.99th(us): 5423103
...
INSERT - Takes(s): 17050.0, Count: 3933443, OPS: 230.7, Avg(us): 17128, Min(us): 4082, Max(us): 34570239, 99th(us): 22191, 99.9th(us): 31215, 99.99th(us): 12156927
INSERT - Takes(s): 17060.0, Count: 3933451, OPS: 230.6, Avg(us): 17144, Min(us): 4082, Max(us): 34570239, 99th(us): 22191, 99.9th(us): 31231, 99.99th(us): 12230655
INSERT - Takes(s): 17070.0, Count: 3933454, OPS: 230.4, Avg(us): 17155, Min(us): 4082, Max(us): 34570239, 99th(us): 22191, 99.9th(us): 31231, 99.99th(us): 12230655
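
The OPS column in the log is the cumulative average (Count / Takes), so it hides the stall. A minimal sketch for recovering the per-interval rate from consecutive status lines follows; the function names are made up for illustration, and the sample lines are copied from the log above with the latency fields trimmed.

```python
import re

# Matches the cumulative fields of one status line, e.g.
# "INSERT - Takes(s): 13820.0, Count: 3931533, OPS: 284.5, ..."
LINE_RE = re.compile(r"^(\w+)\s+-\s+Takes\(s\):\s*([\d.]+),\s*Count:\s*(\d+)")


def parse_status_lines(text):
    """Return (elapsed_seconds, cumulative_count) for every status line found."""
    samples = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            samples.append((float(match.group(2)), int(match.group(3))))
    return samples


def interval_throughput(samples):
    """Ops/s between consecutive samples, derived from the cumulative counters."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if t1 > t0:  # skip duplicated or out-of-order timestamps
            rates.append((t1, (c1 - c0) / (t1 - t0)))
    return rates


# Lines taken from the ycsb log in this issue (latency fields trimmed).
SAMPLE_LOG = """\
INSERT - Takes(s): 10840.0, Count: 3188140, OPS: 294.1
INSERT - Takes(s): 10850.0, Count: 3191224, OPS: 294.1
INSERT - Takes(s): 13820.0, Count: 3931533, OPS: 284.5
INSERT - Takes(s): 13830.0, Count: 3931539, OPS: 284.3
INSERT - Takes(s): 14550.0, Count: 3932125, OPS: 270.2
INSERT - Takes(s): 14560.0, Count: 3932125, OPS: 270.1
"""

for elapsed, rate in interval_throughput(parse_status_lines(SAMPLE_LOG)):
    print(f"up to {elapsed:.0f}s: {rate:.1f} ops/s")
```

For the 13820s to 13830s interval this gives 6 inserts in 10 s, i.e. 0.6 ops/s, and for 14550s to 14560s it gives 0 ops/s, while the cumulative OPS column still reports about 270 to 284.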

rkv memory usage when the system is at about 4M records

              total        used        free      shared  buff/cache   available
Mem:            31G        1.7G         28G        844K        1.5G         29G
Swap:            0B          0B          0B

jaeger memory and disk usage when the system is at about 4M records

               total        used        free      shared  buff/cache   available
Mem:            31Gi       7.8Gi        22Gi       1.0Mi       926Mi        23Gi
Swap:             0B          0B          0B

Filesystem      Size  Used Avail Use% Mounted on
/dev/root       194G  2.4G  192G   2% /

ycsb client memory and disk usage when the system is at about 4M records

               total        used        free      shared  buff/cache   available
Mem:            31Gi       334Mi        30Gi       0.0Ki       788Mi        30Gi
Swap:             0B          0B          0B

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        39G  4.3G   35G  11% /

observations

h-w-chen commented 2 years ago

When the system is quite slow (approaching 4M records), CPU utilization on one of the redis SIs is high, as illustrated. The other 5 SIs have low CPU usage meanwhile (<2%). [Screenshot from 2022-07-29 15-57-59]

This machine does not respond to ssh login; redis-cli still responds but is quite slow (the dbsize command takes ~2.4s).

pdgetrf commented 2 years ago

possibly started memory thrashing

pdgetrf commented 2 years ago

no issue before memory started to run out, right?

h-w-chen commented 2 years ago

nope; system is still running - though slowly

pdgetrf commented 2 years ago

at the end of this run, please record the memory usage of rkv server too for reference

pdgetrf commented 2 years ago

also disk usage of jaeger and ycsb

pdgetrf commented 2 years ago

thanks for gathering those metrics.

image

this much used for index surprises me a bit.

h-w-chen commented 2 years ago

closed; milestone has passed.