SJTU-IPADS / xstore

Fast RDMA-based Ordered Key-Value Store using Remote Learned Cache

About the machine setup #7

Closed baotonglu closed 2 years ago

baotonglu commented 2 years ago

Hi Xingda,

Thanks for your quick response. I really appreciate it. I am a newbie in distributed computing, so I am stuck at the initial machine setup for this project.

Specifically, I have three machines with different IP addresses and they are connected via IB switch. I want to use one machine as the server and the other two as clients. But now I don't know how to modify the scripts and configuration files (e.g., /ae_scripts/ycsba.toml, cs.toml) to make this project work on my cluster.

BTW, I am using the 'legacy' branch of this project since the 'master' branch seems to have less information about the benchmark.

Could you give me some specific guidance about the machine setup? Thank you very much.

Best, Baotong

wxdwfc commented 2 years ago

Hi Baotong,

Thanks for your interest in XStore. For how we use the configuration files (in tool) to run XStore, you can first check the README.md in our legacy branch (https://github.com/SJTU-IPADS/xstore/blob/legacy/README.md), which has detailed instructions. If you have any further questions, feel free to contact me.

Best, Xingda

baotonglu commented 2 years ago

Hi Xingda,

I have actually read that README.md, and I successfully built the project by executing "make fserver; make ycsb; make micro; make master".

But I am stuck at the section "Run experiments with bootstrap.py". For example, for the following setup, what values of 'host' and 'path' should I fill in on my cluster? I only know the IP addresses of the machines.

[[pass]]
host = "val10"
path = "/cock/fstore"
cmd = "./fserver --help"

[[pass]]
host = "val11"
path = "/cock/fstore"
cmd = "./ycsb --help"

Best, Baotong

wxdwfc commented 2 years ago

Basically, bootstrap.py translates each [[pass]] entry into the following command and then executes it:

ssh host "cd $path$; $cmd$"

So, make sure you can use ssh to execute commands on your machines.
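As an illustration only, the translation described above can be sketched like this (the real bootstrap.py uses the paramiko library to execute the command over SSH; this just shows the shape of the command that gets run):

```python
# Minimal sketch of the [[pass]] -> ssh translation described above.
# The actual bootstrap.py executes this via paramiko rather than the
# ssh binary; field names match the toml keys shown in this thread.
def pass_to_ssh(entry):
    # entry is one [[pass]] table parsed from the toml file
    return 'ssh {host} "cd {path}; {cmd}"'.format(**entry)

entry = {"host": "val10", "path": "/cock/fstore", "cmd": "./fserver --help"}
print(pass_to_ssh(entry))
# ssh val10 "cd /cock/fstore; ./fserver --help"
```

This makes it clear why 'host' must be an SSH-reachable host name or IP, and 'path' must be the directory containing the built binaries on that host.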

baotonglu commented 2 years ago

Oh, I got it. Thanks, that's really helpful. So I need to first compile this project on each machine, right? Currently, I have only compiled it on one machine.

wxdwfc commented 2 years ago

Yes, you should compile it on all the machines (or sync the binaries to them, e.g., with rsync), or put it on an NFS.
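A hedged sketch of the rsync option (host addresses and the path are placeholders taken from this thread; the leading echo makes it a dry run, so remove it to actually copy):

```shell
# Placeholder hosts/path for illustration; 'echo' makes this a dry run.
HOSTS="10.150.240.28 10.150.240.33"
for h in $HOSTS; do
  echo rsync -az ./fserver ./ycsb ./micro ./master "$h:/home/v-baotonglu/xstore/"
done
```

This assumes the binaries were built in the current directory and that passwordless SSH to each host is set up.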

baotonglu commented 2 years ago

Hi Xingda,

The content of my configuration file 'sample.toml' is as follows, where "10.150.240.28" is the IP address of my remote server:

[[pass]]
host = "10.150.240.28"
path = "/home/v-baotonglu/xstore"
cmd = "./fserver --help"

But I get the error below when running python3 ./bootstrap.py -f sample.toml.

wxd 123 None
None
(execute cmd @10.150.240.28:22345 ./fserver --help
connect <paramiko.config.SSHConfig object at 0x7f3201519b00>
[pre execute] connect to 10.150.240.28:22345 error:  'proxycommand'
Traceback (most recent call last):
  File "./bootstrap.py", line 415, in <module>
    main()
  File "./bootstrap.py", line 410, in main
    if p.print_one():
  File "./bootstrap.py", line 25, in print_one
    if self.c.recv_ready():
AttributeError: 'tuple' object has no attribute 'recv_ready'

Even if I change the value of host in 'sample.toml' to v-baotonglu@10.150.240.28 -p 22345, it still does not work.

Could you give some suggestions? Thanks.

wxdwfc commented 2 years ago

Have you added

user=xxx
pwd=xxx

to the toml?

The user and pwd are the username and the corresponding password of an account that can execute ssh commands on the server.
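Based on this exchange, the resulting toml would presumably look like the following (the placement of user/pwd at the top level is an assumption from this thread, and the password is a placeholder):

```toml
user = "v-baotonglu"    # account that can ssh into the remote machine
pwd  = "your-password"  # its password (placeholder)

[[pass]]
host = "10.150.240.28"
path = "/home/v-baotonglu/xstore"
cmd = "./fserver --help"
```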

baotonglu commented 2 years ago

It still does not work. Should I specify the port for ssh? For normal login, I use port '22345'. If I need to specify the port, how do I do that in the toml?

wxdwfc commented 2 years ago

Currently, our script doesn't support specifying the port. For your case, I suggest manually running the commands (e.g., fserver) on the machines.

baotonglu commented 2 years ago

OK, thanks.

Another question: for the 'user' and 'pwd' in the toml, do you mean the 'user' & 'pwd' of the remote server? Or should it be the 'user' & 'pwd' of the machine that runs bootstrap.py?

BTW, if I change the port number to the default 22, then your script will work, right?

wxdwfc commented 2 years ago

The user and pwd are used for the remote server. The toml works fine on our servers, so we hope it will also work fine on yours.

baotonglu commented 2 years ago

Hi Xingda,

After slight modifications to your script, it now works with the port specified. I have successfully run "sample.toml".

I am trying to run "sample-kv.toml". Although the server side runs well, it seems that the client threads do not (or fail to) send requests to the server side. The following is the initial output I got when running "sample-kv.toml". Note that "10.150.240.30" is the server side and "10.150.240.28" is the client side.

@10.150.240.30 [main.cc:257] use configuration: Server config:
@10.150.240.30 using memory for leaf nodes: 20.0000 GB;
@10.150.240.30 allocated RDMA heap size: 8.0000 GB;
@10.150.240.30 server communication type: ud.
@10.150.240.30 of config file: server/config.toml
@10.150.240.30 [memory_util.hpp:37] huge page alloc failed!
@10.150.240.30 [memory_util.hpp:45] use default malloc for allocating page
@10.150.240.30 [main.cc:271] use page mem: 21474836480
@10.150.240.30 [allocator_master.hpp:50] allocator master register memory: 8587836416
@10.150.240.30 [main.cc:277] Memory layout of the server: | meta 0x0:0x0+2MB | page 0x200000:0x200000+20GB | Heap 0x500200400:0x500200400+7.99805GB |
@10.150.240.30 [main.cc:305] server wait for threads to join ...
@10.150.240.30 [main.cc:307] Start populating DB: ycsb with num: 100000000
@10.150.240.30 [worker.hpp:98] Server: 0 use nic id: 1
@10.150.240.30 
@10.150.240.28 [mem_region.hpp:68] region manager alloc memory: 0x7fc87ffff010
@10.150.240.28 [allocator_master.hpp:50] allocator master register memory: 2147483648
@10.150.240.28 [allocator_master.hpp:43] AllocatorMaster<0> inited multiple times
@10.150.240.28 [main.cc:131] starting threads, using workload: ycsbc
@10.150.240.28 [main.cc:137] wait for thread to be ready
@10.150.240.28 [rdma_ctrl.hpp:110] rdma ctrl started!
@10.150.240.30 [worker.hpp:98] Server: 1 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 2 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 3 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 4 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 5 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 6 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 7 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 8 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 9 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 10 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 11 use nic id: 1
@10.150.240.30 
@10.150.240.30 [main.cc:167] B+tree load done, leaf sz: 384 rdma base: 140502338891792
@10.150.240.30 [main.cc:183] start training!
@10.150.240.30 
@10.150.240.30 [model_config.hpp:29] load second stage num: 2
@10.150.240.30 
@10.150.240.30 [mod.hh:45] init XCache using 100000 sub models
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 12 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 13 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 14 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 15 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 16 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 17 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 18 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 19 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 20 use nic id: 1
@10.150.240.30 
@10.150.240.30 [mod.hh:236] train dispatcher in : 3.92129e+06 msec using 100000000 keys
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 21 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 22 use nic id: 1
@10.150.240.30 
@10.150.240.30 [worker.hpp:98] Server: 23 use nic id: 1
@10.150.240.30 
@10.150.240.30 [mod.hh:292] average model responsible num: 1000; min: 999max: 1001

After the client side printed the message "10.150.240.28 [rdma_ctrl.hpp:110] rdma ctrl started!", it did not print any further messages (and the CPU utilization of the client thread dropped to zero, as observed from "htop"). However, the server side kept running and continuously printing messages.

Do you have any suggestions on this case?

Best, Baotong

wxdwfc commented 2 years ago

Can you post the sample-kv.toml to let me check?

The most common reason for the client to get stuck is that it fails to connect to the UD for the bootstrap. You can check line 165 in benchs/ycsb/clients.hpp if you are on the legacy branch.

baotonglu commented 2 years ago

Hi Xingda,

The following is the content of sample-kv.toml.

[[pass]]
host = "10.150.240.33"
path = "/home/v-baotonglu/xstore"
cmd = "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/v-baotonglu/xstore/deps/boost/lib:/opt/intel/oneapi/compiler/2022.0.2/linux/compiler/lib/ia32_lin:/opt/intel/oneapi/compiler/2022.0.2/linux/compiler/lib/intel64_lin; sleep 1; ./fserver -db_type ycsb --threads 24 --id 0 -ycsb_num=100000000 --step=2 --model_config=ae_scripts/ycsb-model.toml"

[[pass]]
host = "10.150.240.28"
path = "/home/v-baotonglu/xstore"
cmd = "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/v-baotonglu/xstore/deps/boost/lib:/opt/intel/oneapi/compiler/2022.0.2/linux/compiler/lib/ia32_lin:/opt/intel/oneapi/compiler/2022.0.2/linux/compiler/lib/intel64_lin; ./ycsb --threads 1 -concurrency 8 -workloads ycsbc -server_host 10.150.240.33 -server_port 22345 --id 1 -total_accts=100000000  -eval_type=sc --use_master=false"

You are correct. The client thread is stuck in the while loop at line 165 of benchs/ycsb/clients.hpp. How can I solve that?

Thanks, Baotong

baotonglu commented 2 years ago

After removing -server_port 22345 from the above sample-kv.toml, the client side can connect to the server for the bootstrap, but it hits an assertion error and aborts. The following is the output after the client connects to the server, where "10.150.240.28" is the client and "10.150.240.30" is the server.

@10.150.240.28 [clients.hpp:169] [YCSB] client #0 bootstrap connect done to server
@10.150.240.28 
@10.150.240.28 [single_op.hpp:129] poll till completion error: 10 remote access error
@10.150.240.28 [seralize.hh:211] page entries: 0
@10.150.240.28 [seralize.hh:212] Assertion! seq: 0 0
@10.150.240.28 
@10.150.240.30 [main.cc:472] 0 models trained in the past 2 seconds in model #1
@10.150.240.30 
@10.150.240.30 [main.cc:417] 0 models trained in the past 2 seconds
@10.150.240.30 
@10.150.240.28 bash: line 1: 19730 Aborted                 (core dumped) ./ycsb --threads 1 -concurrency 8 -workloads ycsbc -server_host 10.150.240.30 --id 1 -total_accts=100000000 -eval_type=sc --use_master=false
exit  10.150.240.28
@10.150.240.30 [main.cc:472] 0 models trained in the past 2 seconds in model #1
@10.150.240.30 
@10.150.240.30 [main.cc:350] server has [0] pages/sec; total alloced: 12500000
@10.150.240.30 
@10.150.240.30 [main.cc:357] B+Tree index sz: 653.943 MB; Tree depth: 8 and KV sz:5.10896 GB; estimated KV pairs: 112 M
@10.150.240.30 
@10.150.240.30 [main.cc:417] 0 models trained in the past 2 seconds
@10.150.240.30 
@10.150.240.30 [main.cc:472] 0 models trained in the past 2 seconds in model #1
@10.150.240.30 
@10.150.240.30 [main.cc:417] 0 models trained in the past 2 seconds
@10.150.240.30 
@10.150.240.30 [main.cc:472] 0 models trained in the past 2 seconds in model #1
@10.150.240.30 
@10.150.240.30 [main.cc:350] server has [0] pages/sec; total alloced: 12500000
@10.150.240.30 
@10.150.240.30 [main.cc:357] B+Tree index sz: 653.943 MB; Tree depth: 8 and KV sz:5.10896 GB; estimated KV pairs: 112 M
@10.150.240.30 
@10.150.240.30 [main.cc:417] 0 models trained in the past 2 seconds
@10.150.240.30 
@10.150.240.30 [main.cc:472] 0 models trained in the past 2 seconds in model #1
@10.150.240.30 
@10.150.240.30 [main.cc:417] 0 models trained in the past 2 seconds

Any suggestions?

wxdwfc commented 2 years ago

It's weird to see this problem: it means the client failed to fetch the learned cache from the server (which should never happen).

Later I see @10.150.240.28 [single_op.hpp:129] poll till completion error: 10 remote access error, which means the RDMA bootstrap is not OK: the client failed to use one-sided RDMA to read the server's model (stored in a memory area). This should not happen with the code in the branch. The error indicates that the client's read address or remote key is wrong. Can you check that further?

baotonglu commented 2 years ago

I am trying to debug the code to see why the RDMA read is wrong.

First, the error @10.150.240.28 [single_op.hpp:129] poll till completion error: 10 remote access error happens at line 61 in bootstrap.hh. Some values of the parameters are as follows.

based_addr = 140344498843664
submodels_addr = 140328407407936
idx = 0
submodel_sz = 144

The base_addr is equal to the value of remote_mr.buf in clients.hpp, but I am not sure how the value of submodels_addr is obtained.

Also, the above function is invoked when the client tries to get the first submodel from the server, i.e., line 286 in clients.hpp.

BTW, we tested the RDMA connection using ib_write_bw -aF, which shows that RDMA itself should have no problem. So I am confused about why the RDMA read could fail.
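One way to sanity-check the values above: a one-sided RDMA read fails with "remote access error" when the target range does not lie inside the remote registered memory region (or the rkey is wrong). Plugging in the numbers from this thread (with the region length assumed from the server's "allocator master register memory: 8587836416" log line, roughly 8 GB):

```python
# An RDMA read of [addr, addr+length) must fall entirely inside the
# remote registered region [mr_base, mr_base + mr_len), else the NIC
# returns "remote access error". Numbers below are from this thread;
# mr_len is an assumption based on the server's registration log.
def in_mr(addr, length, mr_base, mr_len):
    return mr_base <= addr and addr + length <= mr_base + mr_len

base_addr = 140344498843664        # remote_mr.buf reported by clients.hpp
submodels_addr = 140328407407936   # read target reported by bootstrap.hh
print(in_mr(submodels_addr, 144, base_addr, 8 << 30))
# False: submodels_addr is ~16 GB below base_addr, so the read must fail
```

If this check is false on your setup, the bug is in how submodels_addr is derived (e.g., a local pointer being used instead of a server-side offset), not in the RDMA fabric itself.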

baotonglu commented 2 years ago

The following are the prints from the server side.

@10.150.240.28 [main.cc:257] use configuration: Server config:
@10.150.240.28 using memory for leaf nodes: 20.0000 GB;
@10.150.240.28 allocated RDMA heap size: 8.0000 GB;
@10.150.240.28 server communication type: ud.
@10.150.240.28 of config file: server/config.toml
@10.150.240.28 [memory_util.hpp:37] huge page alloc failed!
@10.150.240.28 [memory_util.hpp:45] use default malloc for allocating page
@10.150.240.28 [main.cc:271] use page mem: 21474836480
@10.150.240.28 [allocator_master.hpp:50] allocator master register memory: 8587836416
@10.150.240.28 [main.cc:277] Memory layout of the server: | meta 0x0:0x0+2MB | page 0x200000:0x200000+20GB | Heap 0x500200400:0x500200400+7.99805GB |
@10.150.240.28 [main.cc:305] server wait for threads to join ...
@10.150.240.28 [main.cc:307] Start populating DB: ycsb with num: 100000000
@10.150.240.28 [worker.hpp:98] Server: 0 use nic id: 1
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 1 use nic id: 1
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 2 use nic id: 1
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 3 use nic id: 1
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 4 use nic id: 1
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 5 use nic id: 1
@10.150.240.28 
@10.150.240.28 [main.cc:167] B+tree load done, leaf sz: 384 rdma base: 140344498843664
@10.150.240.28 [main.cc:183] start training!
@10.150.240.28 
@10.150.240.28 [model_config.hpp:29] load second stage num: 2
@10.150.240.28 
@10.150.240.28 [mod.hh:45] init XCache using 100000 sub models
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 6 use nic id: 1
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 7 use nic id: 1
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 8 use nic id: 1
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 9 use nic id: 1
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 10 use nic id: 1
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 11 use nic id: 1
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 12 use nic id: 1
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 13 use nic id: 1
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 14 use nic id: 1
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 15 use nic id: 1
@10.150.240.28 
@10.150.240.28 [mod.hh:236] train dispatcher in : 3.9378e+06 msec using 100000000 keys
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 16 use nic id: 1
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 17 use nic id: 1
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 18 use nic id: 1
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 19 use nic id: 1
@10.150.240.28 
@10.150.240.28 [mod.hh:292] average model responsible num: 1000; min: 999max: 1001
@10.150.240.28 [mod.hh:294] model responsible um cdf: X = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,]
@10.150.240.28 Y = [999,999,999,999,999,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1001,1001,1001,1001,1001,]
@10.150.240.28 title = ""
@10.150.240.28 ylabel = "Y"
@10.150.240.28 xlabel = "X"
@10.150.240.28 
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 20 use nic id: 1
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 21 use nic id: 1
@10.150.240.28 
@10.150.240.28 [mod.hh:174] average error: 0.41597; min: 0max: 15
@10.150.240.28 [mod.hh:177] cdf: X = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,]
@10.150.240.28 Y = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,]
@10.150.240.28 title = ""
@10.150.240.28 ylabel = "Y"
@10.150.240.28 xlabel = "X"
@10.150.240.28 
@10.150.240.28 
@10.150.240.28 [mod.hh:179] average page entry: 126; min: 124max: 126
@10.150.240.28 [mod.hh:182] cdf: X = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,]
@10.150.240.28 Y = [126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,126,]
@10.150.240.28 title = ""
@10.150.240.28 ylabel = "Y"
@10.150.240.28 xlabel = "X"
@10.150.240.28 
@10.150.240.28 
@10.150.240.28 [mod.hh:183] average sz: 1022
@10.150.240.28 [mod.hh:185] average num: 1008; min: 1000max: 1008
@10.150.240.28 [mod.hh:188] cdf: X = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,]
@10.150.240.28 Y = [1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,]
@10.150.240.28 title = ""
@10.150.240.28 ylabel = "Y"
@10.150.240.28 xlabel = "X"
@10.150.240.28 
@10.150.240.28 
@10.150.240.28 [mod.hh:190] average time: 19.2947; min: 14max: 1705
@10.150.240.28 [mod.hh:193] cdf: X = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,]
@10.150.240.28 Y = [15,15,15,16,16,16,16,16,16,16,17,17,17,17,17,17,17,17,17,17,17,17,17,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,20,20,20,20,20,20,20,20,20,20,20,20,20,20,21,21,21,21,21,21,21,21,21,22,22,22,22,22,22,23,23,23,24,24,25,27,]
@10.150.240.28 title = ""
@10.150.240.28 ylabel = "Y"
@10.150.240.28 xlabel = "X"
@10.150.240.28 
@10.150.240.28 
@10.150.240.28 [xcache_learner.hh:32] trained model ratio: 1
@10.150.240.28 [xcache_learner.hh:35] Trained 100000 models using: 2.00267e+06msec; training thpt: 49933.3
@10.150.240.28 [xcache_learner.hh:40] santiy check key:  use model: 0; page table entries: 126
@10.150.240.28 [augmentor.hh:72] check model :1 whether need train: 126
@10.150.240.28 [xcache_learner.hh:48] 0 models augmented
@10.150.240.28 
@10.150.240.28 [main.cc:185] train done use: 1.43849e+07 msec
@10.150.240.28 [xcache_learner.hh:82] allocate serialize buf num : 100000
@10.150.240.28 
@10.150.240.28 [main.cc:195] xcache model num: 1.33514; page_table num: 96.1304
@10.150.240.28 [model_config.hpp:29] load second stage num: 2
@10.150.240.28 [main.cc:221] XCache ml sz: 1.71661MB, page table sz: 95.3674 MBtotal KV sz: 4.47035 GB
@10.150.240.28 [main.cc:227] index sz level #1: 0.000366211 MB
@10.150.240.28 [main.cc:227] index sz level #2: 0.00219727 MB
@10.150.240.28 [main.cc:227] index sz level #3: 0.0194092 MB
@10.150.240.28 
@10.150.240.28 [main.cc:227] index sz level #4: 0.158936 MB
@10.150.240.28 
@10.150.240.28 [main.cc:227] index sz level #5: 1.27625 MB
@10.150.240.28 
@10.150.240.28 [main.cc:227] index sz level #6: 10.2166 MB
@10.150.240.28 
@10.150.240.28 [main.cc:227] index sz level #7: 81.7419 MB
@10.150.240.28 
@10.150.240.28 [main.cc:227] index sz level #8: 653.946 MB
@10.150.240.28 [main.cc:309] Load DB done.
@10.150.240.28 [main.cc:311] bar wait
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 22 use nic id: 1
@10.150.240.28 
@10.150.240.28 [worker.hpp:98] Server: 23 use nic id: 1
@10.150.240.28 
@10.150.240.28 [rdma_ctrl.hpp:110] rdma ctrl started!

I observed the "huge page alloc failed!" message at the start. I am not sure whether this affects the client connection.

baotonglu commented 2 years ago

Update:

If I set the number of threads (i.e., --threads) on the server side to 1 or 2, the YCSB-C workload runs successfully. But when #threads is higher than 2, the above bug occurs.

Currently, I don't know why the number of server threads influences the behavior of fetching the model.

wxdwfc commented 2 years ago

Oops, sorry for the confusion. This is just a convenience of the implementation in the legacy code.

baotonglu commented 2 years ago

Sorry, I don't get it. What do you mean by "convenience"? So --threads on the server side must be set to 1 or 2?

BTW, when I set --threads on the client side higher than 1, the code also fails to run, even if #threads on the server side is set to 1.

wxdwfc commented 2 years ago

By convenience, I mean I dedicated one thread at the server for QP connection/model handling (it was a long time ago and I cannot remember all the details). I recommend using the configurations stored in the ae_scripts folder, or just using the new code.

baotonglu commented 2 years ago

I tried to run ycsbc.toml, but the client always gets stuck at line 178 of benchs/ycsb/clients.hpp.

The following is my modified ycsbc.toml.

## global configs will overwrite the per-process configurations
global_configs = "--need_hash=false --cache_sz_m=327680 --server_host=10.150.240.28 --total_accts=100000000 --eval_type=sc --workloads=ycsbc --concurrency=8 --undefok=concurrency,workloads,eval_type,total_accts,server_host,cache_sz_m,need_hash"

## server process
[[pass]]
host = "10.150.240.28"
path = "/home/v-baotonglu/xstore"
cmd = "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/v-baotonglu/xstore/deps/boost/lib:/opt/intel/oneapi/compiler/2022.0.2/linux/compiler/lib/ia32_lin:/opt/intel/oneapi/compiler/2022.0.2/linux/compiler/lib/intel64_lin; sleep 1; ./fserver -db_type ycsb --threads 1 --id 0 -ycsb_num=100000000 --no_train=false --step=2 --model_config=ae_scripts/ycsb-model.toml"

## master process to collect results
[[pass]]
host = "10.150.240.30"
path = "/home/v-baotonglu/xstore"
cmd = "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/v-baotonglu/xstore/deps/boost/lib:/opt/intel/oneapi/compiler/2022.0.2/linux/compiler/lib/ia32_lin:/opt/intel/oneapi/compiler/2022.0.2/linux/compiler/lib/intel64_lin; sleep 5; ./master -client_config cs.toml -epoch 400 -nclients 1"

## below are clients
[[pass]]
host="10.150.240.33"
path="/home/v-baotonglu/xstore"
cmd = "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/v-baotonglu/xstore/deps/boost/lib:/opt/intel/oneapi/compiler/2022.0.2/linux/compiler/lib/ia32_lin:/opt/intel/oneapi/compiler/2022.0.2/linux/compiler/lib/intel64_lin; ./ycsb --threads 1 -concurrency 1 -workloads ycsbb -server_host 10.150.240.28 --id 1 -total_accts=20000000 -need_hash -eval_type=rpc"

The following is the modified cs.toml.

[general_config]
port   = 8888

[[client]]
id   = 1
host = "10.150.240.33"

[[client]]
id   = 6
host = "10.150.240.28" # we use val06 as the server

[master]
host = "10.150.240.30"

## server config

[[server]]
host    = "10.150.240.28"

[server_config]
db_type = "ycsbh"

Actually, in "cs.toml", I don't know which item specifies the server: client id 6 or [[server]]?

Could you take a look at whether my above scripts are wrong?

wxdwfc commented 2 years ago

Can you just run the scripts in the AE with only the host names changed, and restrict the number of clients to 1?

baotonglu commented 2 years ago

May I know how to restrict the number of clients to 1? Should I delete lines 22 to 58 in ycsbc.toml?

wxdwfc commented 2 years ago

You can do this by ensuring there is only one cmd = "./ycsb --threads 24 -concurrency 1 -workloads ycsbb -server_host val00 --id 1 -total_accts=20000000 -need_hash -eval_type=rpc" entry in the toml (don't forget to adjust the rest of the toml accordingly).

I admit that detailed documentation is missing. However, I think the layout of the toml is quite straightforward.

ps: The thread counts of the client and server must match and be >= 2, as you have discovered.

baotonglu commented 2 years ago

As you suggested, I used the original script. Note that I have two NIC devices: one is RoCE and the other is IB. I changed choose_nic to make sure threads always choose NIC 1 (i.e., IB).

When #threads is 24, the client side in ycsbc.toml incurs the following error and aborts:

@10.150.240.28 [ud_msg.cc:155] Assertion! error wc status 4

When #threads is < 24, the client still gets stuck at line 178 of benchs/ycsb/clients.hpp.

wxdwfc commented 2 years ago

Can you try running without the bootstrap script?

Specifically, at the server, run:

 ./fserver -db_type ycsb --threads 24 --id 0 -ycsb_num=100000000 --no_train=false --step=2 --model_config=ae_scripts/ycsb-model.toml

Then at the client machine, run:

./ycsb --threads 24 -concurrency 1 -workloads ycsbc -server_host val02 --id 1 -total_accts=100000000 -need_hash -eval_type=sc --use_master=false -running_time=150 --need_hash=false

Note that at the client, val02 should be replaced with the IP address of the server machine.

baotonglu commented 2 years ago

It only works when #threads on the server and client are both set to 1.

I am wondering whether the main branch is more suitable for benchmarking, but I don't see any instructions on how to run the benchmark in the main branch.

wxdwfc commented 2 years ago

That is weird. I've checked that running ./fserver -db_type ycsb --threads 24 --id 0 -ycsb_num=100000000 --no_train=false --step=2 --model_config=ae_scripts/ycsb-model.toml and ./ycsb --threads 24 -concurrency 1 -workloads ycsbc -server_host val02 --id 1 -total_accts=100000000 -need_hash -eval_type=sc --use_master=false -running_time=150 --need_hash=false works fine in our cluster (though that does not guarantee it works on others). Have you waited for a while? The server needs at least 5 seconds to train the model, so the client's initial results will be 0.

baotonglu commented 2 years ago

I always wait for enough time.

But some client threads always get stuck at ret = s.pause_and_yield(h); inside the function RPC::start_handshake in ./deps/r2/src/rpc/rpc.cc when they conduct the second handshake in client.hpp.

And what does the parameter "concurrency" mean?

wxdwfc commented 2 years ago

We have not encountered such a scenario before. If the first handshake passes, then the client can correctly communicate with the server.

The concurrency means how many coroutines are spawned per thread to issue KV requests. Typically, a value between 4 and 12 achieves optimal performance.

baotonglu commented 2 years ago

Thanks. Now I am trying to understand the code.

What is the purpose of lines 173 and 175 of client.hpp?

And why do you rely on RScheduler to run the body function starting at line 175? Why not run it directly?

Thanks.

wxdwfc commented 2 years ago

In short, the code spawns coroutines to execute requests, which better leverages RDMA because it hides the latency of the different round trips. Coroutines are a standard technique for improving the performance of I/O-intensive applications; you can find more about them online.
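The latency-hiding idea can be illustrated with a small sketch (this is a generic asyncio analogy, not XStore's actual r2 coroutine scheduler; the simulated round-trip time is a placeholder):

```python
import asyncio

# While one coroutine waits on a (simulated) network round trip, the
# scheduler runs another, so N concurrent requests take ~1 round trip
# of wall-clock time instead of N sequential round trips.
async def kv_get(key, rtt=0.01):
    await asyncio.sleep(rtt)  # stand-in for one RDMA/RPC round trip
    return f"value-of-{key}"

async def run_clients(concurrency=8):
    # mirrors the "concurrency" knob: coroutines per client thread
    return await asyncio.gather(*(kv_get(k) for k in range(concurrency)))

results = asyncio.run(run_clients())
print(results[0])  # value-of-0
```

This is also why a moderate concurrency (4-12, as suggested above) helps: too few coroutines leave the NIC idle between round trips, while too many add scheduling overhead.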

wxdwfc commented 2 years ago

Hi, let me check one more time: can you run the following setup?

  1. First, create an empty toml file.
  2. Copy the following into it:

global_configs="--server_host=val02 --undefok=server_host"

## server
[[pass]]
host = "val02"
path = "/cock/fstore"
cmd = "./fserver -db_type ycsb --threads 2 --id 0 -ycsb_num=1000 --no_train=false --step=2 --model_config=ae_scripts/ycsb-model.toml"

## a single client
[[pass]]
host = "val01"
path = "/cock/fstore"
cmd = "sleep 1;./ycsb --threads 2 -concurrency 1 -workloads ycsbc -server_host val02 --id 1 -total_accts=1000 -need_hash -eval_type=sc --use_master=false -running_time=150 --need_hash=false"

  3. Change the server_host in global_configs and each pass's host to your machines' hosts.

Now, can this toml work using the bootstrap script? Note that nothing else needs to change.

Important

If the setup still doesn't work, please check line 128: auto nic_id = VALNic::choose_nic(thread_id); in client.hpp, and make sure each thread chooses the IB NIC on your machine by printing the device names.

ps: I checked the setup and found that the legacy code can run using one thread. Sorry for the confusion.
wxdwfc commented 2 years ago

Hi, one thing I forgot to mention: besides changing choose_nic on the client side, it is also important to adjust choose_nic on the server side accordingly, e.g., line 93 in ./server/worker.hpp. Hope this information helps.

baotonglu commented 2 years ago

I tried your suggestions above but failed to make it work. Anyway, thanks a lot for your help. I am still trying to understand the code.

For the "connect" and "start_handshake" functions in "clients.hpp": currently I think "connect" is used to get the QP address of the remote server and build the address handle. However, I cannot understand "start_handshake" because it is too complex.

So I have two main questions: (1) What is the effect of "start_handshake" and "connect", respectively? (2) Inside "start_handshake", what is the effect of s.pause_and_yield(h)?

yzim commented 2 years ago

I also encountered the same handshake issue. Have you solved it?

baotonglu commented 2 years ago

I have not solved it.