Open richamishra006 opened 1 year ago
The develop-3.0 branch of basho_bench is the main one used for pre-release testing of Riak - https://github.com/basho/basho_bench/tree/develop-3.0 with this configuration.
There is an alternative branch which is used for testing the same thing via the HTTP API.
Thanks for your reply. I installed rebar3, but running the make all command gives this error:
root@68d36c17d1d9:/basho_bench# make all
/basho_bench/rebar get-deps
make: /basho_bench/rebar: Command not found
make: *** [Makefile:22: deps] Error 127
Please let me know if I am missing anything
OK, I resolved that error by updating the Makefile and replacing rebar with rebar3.
I have installed Erlang 25 and rebar3, but I am getting this error when running make all:
root@68d36c17d1d9:/basho_bench# make all
/basho_bench/rebar3 get-deps
=ERROR REPORT==== 21-Jun-2023::15:30:10.069489 ===
beam/beam_load.c(551): Error loading function rebar3:run_aux/2: op put_tuple u x:
please re-compile this module with an Erlang/OTP 25 compiler
escript: exception error: undefined function rebar3:main/1
in function escript:run/2 (escript.erl, line 750)
in call from escript:start/1 (escript.erl, line 277)
in call from init:start_em/1
in call from init:do_boot/3
make: *** [Makefile:22: deps] Error 127
Try with OTP 22.
Run ./rebar3 escriptize from the basho_bench folder.
That should generate a basho_bench executable in _build/default/bin which can then be used like this: nohup _build/default/bin/basho_bench examples/riakc_nhs_general.config & (or whatever config file), which will start generating output in tests/current/. You will need to update the config file to include your IP addresses (note the use of commas, not periods, in the addresses).
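To illustrate the commas-not-periods point, here is a small Python sketch that turns dotted IPv4 addresses into the tuple form used in the config. The helper names are made up for this example and are not part of basho_bench; the addresses are the ones used as an example elsewhere in this thread.

```python
import ipaddress


def erlang_ip_tuple(addr: str) -> str:
    """Format a dotted IPv4 address as an Erlang tuple, e.g. {127,0,0,1}."""
    ip = ipaddress.IPv4Address(addr)  # raises ValueError if addr is not valid IPv4
    return "{" + ",".join(str(octet) for octet in ip.packed) + "}"


def erlang_ip_setting(key: str, addrs: list) -> str:
    """Build a config line such as {pb_ips, [{192,168,3,1}, {192,168,3,2}]}."""
    return "{%s, [%s]}." % (key, ", ".join(erlang_ip_tuple(a) for a in addrs))


# Example addresses only; substitute the addresses of your own Riak nodes.
nodes = ["192.168.3.1", "192.168.3.2"]
print(erlang_ip_setting("pb_ips", nodes))
print(erlang_ip_setting("http_ips", nodes))
```

The printed lines can be pasted over the pb_ips and http_ips entries in the config file you are using.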
Sorry, the readme instructions are lagging behind the changes made. There's a bit of fiddling normally required to get this working.
The generation of charts using R is probably broken; you may need to do your own work on the CSV outputs to chart any results.
Thank you for the response; it is working with OTP 22. Now probably one last question: I have three Riak nodes, riak0.local, riak1.local and riak2.local. In which files do I need to update these names, or can I use IP addresses instead? There are lots of files in the examples directory. Basically the end goal is to generate load. And if I want to increase the load on the cluster, is there a parameter I need to update?
hey @martinsumner could you please help me out, I am just about to finish my load test
For basho_bench you have a configuration file which you can set up to control your test. You can pick one of the examples as a starting point; a good one might be riakc_nhs_general.config, which is what is used for Riak release testing.
Here is an annotated version of that config file to explain what it means:
{mode, max}.
{duration, 1920}.
{report_interval, 10}.
{node_name, testnode1}.
{concurrent, 100}.
The first few elements define the throughput for the test.
{mode, max} - hit the cluster as hard as you can; each worker will try a new piece of work once the last has been completed.
{concurrent, 100} - means have 100 workers each generating and sending requests concurrently.
{duration, 1920} - just sets the test to run for 1920 minutes.
So in this case you would increase the throughput by increasing the {concurrent, 100} value.
{driver, basho_bench_driver_nhs}.
{record_bucket, "recordBucket"}.
{document_bucket, "documentBucket"}.
{record_sync, "one"}.
{document_sync, "backend"}.
{node_confirms, 2}.
{postcode_indexcount, 6}.
%% Ignored by alwaysget and unique operations
{key_generator, {eightytwenty_int, 100000000}}.
{value_generator, {semi_compressible, 10000, 2000, 10, 0.1}}.
%% For alwaysget operations what is:
%% - the maximum number of keys per worker (max number of keys = this * concurrent)
%% - whether the inserts should be in key_order
{alwaysget, {2000000, 700000, skew_order}}.
{unique, {6000, key_order}}.
The settings above are for the basho_bench_driver_nhs driver.
{pb_ips, [{127,0,0,1}]}.
{http_ips, [{127,0,0,1}]}.
Replace these with the IP addresses of your Riak nodes, e.g. [{192, 168, 3, 1}, {192, 168, 3, 2}]. Normally the pb IPs and http IPs will be the same list.
{operations, [{alwaysget_pb, 620}, {alwaysget_updatewith2i, 130},
{put_unique, 90}, {get_unique, 130}, {delete_unique, 25},
{postcodequery_http, 2}, {dobquery_http, 3}]}.
{alwaysget_pb, 620} - means that 62% of test requests will be alwaysget_pb (this is an operation that fetches an object that has been added as part of the test - so never gets a not found - using the PB API).
{postcodequery_http, 2} - means that 0.2% of test requests will be for a HTTP 2i query of the postcode index.
This isn't easy. There's a lot going on in this particular config file to generate various test scenarios. This particular test scenario runs in an upload mode until a certain threshold is reached, and then switches to a load which has more GETs than PUTs once the database is of sufficient size to be worth testing.
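As a quick check on how the operation weights translate into percentages, here is a short Python sketch. It assumes each weight is a share of the sum of all weights, which is consistent with the 0.2% figure quoted for postcodequery_http; the numbers are copied from the operations list in the config.

```python
# Weights copied from the {operations, [...]} entry in riakc_nhs_general.config.
operations = {
    "alwaysget_pb": 620,
    "alwaysget_updatewith2i": 130,
    "put_unique": 90,
    "get_unique": 130,
    "delete_unique": 25,
    "postcodequery_http": 2,
    "dobquery_http": 3,
}

total_weight = sum(operations.values())

# Assumption: an operation's share of requests is weight / total weight.
shares = {op: 100 * weight / total_weight for op, weight in operations.items()}

for op, pct in shares.items():
    print(f"{op:24s} {pct:5.1f}%")
```

Changing one weight changes every operation's share, so it is worth re-running the sum after editing the list.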
There are much simpler test configs available - https://github.com/basho/basho_bench/blob/mas-nhs-httponly/examples/riakc_pb.config is a good example. The simpler test scenarios tend to give unrealistic tests - e.g. most of the test runs against a small database, with lots of not_found responses, and there's no testing of 2i etc.
econnrefused normally means either there is no listener on the TCP port, or some sort of firewall is blocking it. On the Riak node 172.22.0.212, if you do netstat -an | grep LISTEN | grep 8087 is it listening on that TCP port? Can you telnet to that port/IP from the basho_bench server?
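If telnet is not installed on the basho_bench server, the same reachability check can be sketched in Python using only the standard library. The function name can_connect is made up for this example; it only tests whether a TCP connection can be opened and says nothing about whether Riak is healthy behind the port.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, unreachable hosts and DNS failures.
        return False
```

Running can_connect("172.22.0.212", 8087) from the basho_bench server and getting False would point at a missing listener or a firewall, in line with the econnrefused explanation above.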
I'm not sure why riakc_pb.config would work though. This has an info message just before it connects - it may be worth confirming the details being reported in the console log when it hits this log:
https://github.com/basho/basho_bench/blob/develop-3.0/src/basho_bench_driver_riakc_pb.erl#L130
Yes, I found that the Riak node keeps going down. When I checked the logs, I found this error:
Supervisor riak_core_sup had child riak_core_vnode_manager started with riak_core_vnode_manager:start_link() at <0.298.0> exit with reason {{function_clause,[{riak_kv_vnode,terminate,[{bad_return_value,{stop,{{badmatch,{error,{{badmatch,{error,{{badmatch,{error,emfile}},[{leveled_pmanifest,open_manifest,1,[{file,"/root/riak/rel/pkg/out/riak-3.0.10-OTP22.3/_build/default/lib/leveled/src/leveled_pmanifest.erl"},{line,128}]},{leveled_penciller,start_from_file,1,[{file,"/root/riak/rel/pkg/out/riak-3.0.10-OTP22.3/_build/default/lib/leveled/src/leveled_penciller.erl"},{line,1231}]},{gen_server,init_it,2,[{file,"gen_server.erl"},...]},...]}}},...}}},...}}},...],...},...]},...} in context child_terminated
I increased the limit and fs.file-max, but I am still getting this error.
This looks like a standard ulimit issue. Guidance for setting ulimit here
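The emfile in the log above is the "too many open files" error, and a raised limit only applies to processes started after the change. A rough Python sketch for checking what limit a running process actually has, assuming a Linux host with /proc mounted; the helper name is made up, and /proc/self/limits below should be swapped for /proc/<pid>/limits using the pid of the running Riak process.

```python
import resource
from pathlib import Path


def parse_max_open_files(limits_text: str):
    """Return (soft, hard) from the 'Max open files' line of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return soft, hard
    return None


# Limit inherited by this Python process (i.e. by the current shell).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"current shell: soft={soft} hard={hard}")

# Placeholder path: replace 'self' with the Riak pid to inspect that process.
limits_file = Path("/proc/self/limits")
if limits_file.exists():
    print("from /proc:", parse_max_open_files(limits_file.read_text()))
```

If the Riak process still shows the old value, the new limit has not reached it, for example because the service was not restarted or is started in a way that bypasses the edited limits file.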
Thank you @martinsumner, I updated the limits by following the doc you shared, but the service still keeps dropping. So I removed the ring and decided to test with a single node (where I am not facing issues). I used the riakc_nhs_general.config file and ran the test.
Also, I installed R using the command sudo apt-get install r-base as mentioned in this doc https://docs.riak.com/riak/kv/latest/using/performance/benchmarking/index.html but when I run priv/summary.r -i tests/current the summary.png is not created, and I get this output:
root@application-node01:~/basho_bench# priv/summary.r -i tests/current
[1] "plyr"
Loading required package: plyr
[1] "grid"
Loading required package: grid
[1] "getopt"
Loading required package: getopt
[1] "proto"
Loading required package: proto
[1] "ggplot2"
Loading required package: ggplot2
[1] 0
[1] -Inf
Warning message:
In max(summary$elapsed) : no non-missing arguments to max; returning -Inf
Error: No latency information available to analyze in tests/current
Execution halted
Can you please help me generate the graph for this test?
I can't help, I'm afraid. Personally, I load results into a spreadsheet and then manipulate and chart them there. This makes it easier (for me) to chart comparisons between different runs of basho_bench etc. and tidy up the presentation.
I'm not sufficiently familiar with R/ggplot to troubleshoot this code.
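For anyone who prefers a script to a spreadsheet, here is a minimal Python sketch that turns a summary file into throughput per reporting window. It assumes tests/current/summary.csv has a header row with the columns elapsed, window, total, successful and failed; check the header of your own file first. The sample rows are invented purely so the sketch runs on its own.

```python
import csv
import io


def throughput_rows(csv_file):
    """Yield (elapsed_seconds, successful_ops_per_second) for each window."""
    # skipinitialspace handles headers written as "elapsed, window, ...".
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    rows = []
    for row in reader:
        window = float(row["window"])
        if window <= 0:
            continue  # skip degenerate windows rather than divide by zero
        rows.append((float(row["elapsed"]), float(row["successful"]) / window))
    return rows


# Invented sample data in the assumed layout, not real test results.
SAMPLE = (
    "elapsed, window, total, successful, failed\n"
    "10, 10, 5000, 4990, 10\n"
    "20, 10, 5200, 5200, 0\n"
)

for elapsed, ops_per_sec in throughput_rows(io.StringIO(SAMPLE)):
    print(f"{elapsed:8.0f}s  {ops_per_sec:10.1f} ops/s")
```

To use it on a real run, pass open("tests/current/summary.csv", newline="") instead of the sample. An empty result means the CSV has no data rows, which matches the "No latency information available" error from summary.r.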
Can you please explain how you do it? I mean, which file do you load into the spreadsheet? It would be really helpful for me.
Hi @martinsumner, I am new to basho_bench so my questions might sound silly to you. I am thankful for your help so far. I want to test the performance of Riak running on a node with 256GB memory and a 72-core CPU; I want to generate load and see if it breaks.
I am using the riakc_nhs_general.config file. Apart from increasing the concurrent value, is there anything I can fine-tune to increase the load? I tried running the tests with 300 as the concurrent value, but it didn't make any difference to the utilisation.
At some stage you may hit limits on the basho_bench node itself, or with the network connection. The riakc_nhs_general.config uses fairly large objects so you can hit bandwidth limits quite easily. On the Riak side, the first limit tends to be write throughput to disk (as the test initially sends a write-heavy load to build up the database).
When testing Riak, the crucial questions are:
There are then tuning options in basho_bench to reflect this. The answers to these questions will have a huge impact on the throughput (in terms of transactions per second) that can be achieved.
Then you need to set up Riak to accept the load. How many nodes do you expect to have in your Riak cluster? Normally Riak in production will run on at least six nodes, so it doesn't make sense to test with fewer than that. Then there are things to consider such as what anti-entropy configuration to use, and which Riak backend you intend to use (this makes a big difference for performance). The ring_size also needs to be set correctly to reflect the size of the cluster (you generally need to make sure that ring_size > total count of vcpu in the cluster).
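The ring_size rule of thumb can be worked through with a small Python sketch. It rests on two assumptions that are not stated in this thread: that ring_size must be a power of two, and that 64 is the usual starting value. The function name is made up for illustration; the 72-core and six-node figures come from the discussion above.

```python
def min_ring_size(nodes: int, vcpus_per_node: int, floor: int = 64) -> int:
    """Smallest power-of-two ring size strictly greater than total vCPUs.

    Assumes ring_size must be a power of two and starts from `floor`
    (64 is assumed here to be the usual default).
    """
    total_vcpus = nodes * vcpus_per_node
    size = floor
    while size <= total_vcpus:
        size *= 2
    return size


# Six nodes of 72 cores each gives 432 vCPUs in total.
print(min_ring_size(nodes=6, vcpus_per_node=72))
# A single 72-core node.
print(min_ring_size(nodes=1, vcpus_per_node=72))
```

This only gives a lower bound from the vCPU rule; the right value also depends on the other considerations listed above, such as the backend and expected cluster growth.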
There's not much point running the test without a certain investment in observability. So generally you need to make sure all the riak logs, the riak metrics and your general OS metrics are being indexed in something splunk-like so that you can then determine where limits are and tune accordingly.
Doing worthwhile database testing needs quite a bit of preparation.
Thanks for the detailed explanation. How can I install a Riak exporter to collect some metrics? I have installed Grafana and Prometheus. Also, I observed that the files in the tests/current dir don't contain any data (attaching the screenshots); when I cat these files, I get no data.
Then the test must not be starting. Is there anything in the crash.log?
This is the crash.log output. Also, the test fails when using the riakc_nhs_general.config file, but when using riakc_pb.config I am able to see data in the latencies.csv file.
2023-07-03 05:50:29 =ERROR REPORT====
** Generic server <0.737.0> terminating
** Last message in was {tcp_closed,#Port<0.299>}
** When Server state == {state,{172,22,0,214},8087,false,false,undefined,false,gen_tcp,undefined,{[],[]},1,[],infinity,undefined,undefined,undefined,undefined,[],100,false,{false,0}}
** Reason for termination ==
** disconnected
2023-07-03 05:50:29 =CRASH REPORT====
crasher:
initial call: riakc_pb_socket:init/1
pid: <0.737.0>
registered_name: []
exception exit: {disconnected,[{gen_server,handle_common_reply,8,[{file,"gen_server.erl"},{line,751}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
ancestors: [<0.736.0>]
message_queue_len: 0
messages: []
links: [<0.736.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 610
stack_size: 27
reductions: 1130
neighbours:
Assuming 172.22.0.214:8087 is the correct IP/port for a Riak instance, there are no real clues here as to why it is being disconnected, especially given things work fine with the riakc_pb.config, which also uses the PB Riak Erlang client in the same way.
You may be able to run the Erlang client from rebar3 shell on your basho_bench box, and see if this does or doesn't work:
{ok, Pid} = riakc_pb_socket:start("172.22.0.214", 8087).
MyBucket = <<"test">>.
Val1 = 1.
Obj1 = riakc_obj:new(MyBucket, <<"one">>, Val1).
riakc_pb_socket:put(Pid, Obj1).
{ok, Fetched1} = riakc_pb_socket:get(Pid, MyBucket, <<"one">>).
I got this as output; it seems something is wrong with the drivers.
Also, I am getting this error when I cat nohup.out:
12:57:21.584 [debug] Driver basho_bench_driver_riakc_pb crashed: {undef,[{base64,encode,[<<1,63,146,202>>],[]},{basho_bench_keygen,'-new/2-fun-11-',2,[{file,"/root/basho_bench/src/basho_bench_keygen.erl"},{line,82}]},{basho_bench_driver_riakc_pb,run,4,[{file,"/root/basho_bench/src/basho_bench_driver_riakc_pb.erl"},{line,319}]},{basho_bench_worker,worker_next_op2,2,[{file,"/root/basho_bench/src/basho_bench_worker.erl"},{line,252}]},{basho_bench_worker,worker_next_op,1,[{file,"/root/basho_bench/src/basho_bench_worker.erl"},{line,258}]},{basho_bench_worker,max_worker_run_loop,1,[{file,"/root/basho_bench/src/basho_bench_worker.erl"},{line,338}]}]}
I googled and then ran this command, attaching the screenshot.
Hi, can someone please suggest me some benchmarking tool for Riak to generate and perform load test on the machine.