GoogleCloudPlatform / PerfKitBenchmarker

PerfKit Benchmarker (PKB) contains a set of benchmarks to measure and compare cloud offerings. The benchmarks use default settings to reflect what most users will see. PerfKit Benchmarker is licensed under the Apache 2 license terms. Please make sure to read, understand and agree to the terms of the LICENSE and CONTRIBUTING files before proceeding.
https://googlecloudplatform.github.io/PerfKitBenchmarker/

Cassandra_YCSB 0.9.0 is failing #1825

Open adnavare opened 5 years ago

adnavare commented 5 years ago

I am running Cassandra against YCSB on two different machines: the server runs Ubuntu 18.10, and the client, where YCSB will be installed, runs Ubuntu 16.04.

I run the default YCSB version for cassandra_ycsb, i.e. 0.9.0, with a static config file. Here is the command:
$ ./pkb.py --benchmarks=cassandra_ycsb --benchmark_config_file=/home/anupn/baremetal-static.yaml --num_vms=1 --cassandra_replication_factor=1 --ycsb_client_vms=1

It fails while downloading the YCSB 0.9.0 tarball, but if I download it outside the process, it downloads quickly. I am attaching my .yaml file as well as the log file:

pkb.log baremetal-static.txt

I waited for more than 15 minutes and then killed it. With the same setup, MongoDB with YCSB runs the test properly. I looked at the code, and it looks like the server and clients are prepared for MongoDB in exactly the same way as for Cassandra. Pointers would be really appreciated.

s-deitz commented 5 years ago

I haven't used static machines recently so I don't have much advice there, but I took a look at the log and think it may be blocking on a different line.

I see the log line of: 2018-12-05 14:44:54,182 d472e2a2 Thread-30 cassandra_ycsb(1/1) vm_util.py:348 DEBUG Ran: {ssh -A -p 22 ... ... mkdir -p /opt/pkb/ycsb && curl -L https://github.com/brianfrankcooper/YCSB/releases/download/0.9.0/ycsb-0.9.0.tar.gz | tar -C /opt/pkb/ycsb --strip-components=1 -xzf -} ReturnCode:0

This log line means that this command 'ran', so I suspect the download completed. Then, about 14 minutes later, there is a SIGINT. There is no intermediate logging.

Earlier, on the other machine, there is a log line for an SSH command that looks like it never completes:

2018-12-05 14:43:03,525 d472e2a2 Thread-31 cassandra_ycsb(1/1) vm_util.py:297 INFO Running: ssh -A -p 22 ... ... mkdir -p /opt/pkb && cd /opt/pkb && wget archive.apache.org/dist/ant/binaries/apache-ant-1.9.6-bin.tar.gz && tar -zxf apache-ant-1.9.6-bin.tar.gz && ln -s /opt/pkb/apache-ant-1.9.6/ /opt/pkb/ant

What happens if you run that locally on the machine or via SSH as is done here? Does that complete?

Note that the benchmark uses multiple threads so it can set up both machines in parallel.
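The check described above, looking for a "Running:" line that has no matching "Ran:" line, can be sketched in Python. This is only an illustration: the two regular expressions are assumptions based on the two log lines quoted in this comment, the function name `unfinished_commands` is hypothetical, and real PKB logs may span several lines per entry.

```python
import re

# Assumed formats, inferred from the two log lines quoted in this thread:
#   "... INFO Running: <command>"
#   "... DEBUG Ran: {<command>} ReturnCode:<n>"
RUNNING_RE = re.compile(r"\bINFO\s+Running: (?P<cmd>.*)$")
RAN_RE = re.compile(r"\bDEBUG\s+Ran: \{(?P<cmd>.*)\}\s+ReturnCode:")


def unfinished_commands(log_lines):
    """Return commands that were started but never reported as finished."""
    pending = []
    for line in log_lines:
        line = line.rstrip("\n")
        started = RUNNING_RE.search(line)
        if started:
            pending.append(started.group("cmd").strip())
            continue
        finished = RAN_RE.search(line)
        if finished:
            cmd = finished.group("cmd").strip()
            if cmd in pending:
                pending.remove(cmd)
    return pending
```

Feeding it the lines of pkb.log (for example `unfinished_commands(open("pkb.log"))`) would list the commands that were still blocked when the run was interrupted.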

adnavare commented 5 years ago

@s-deitz : Thanks for taking a look at it. Yes, I ran the same commands locally from my client machine and they succeed; I am not sure why they do not complete within the process. Also, both of these packages, YCSB 0.9.0 and apache-ant, install successfully on the same client-server setup if I run MongoDB with YCSB.

s-deitz commented 5 years ago

Are you able to run the whole ssh command from the machine you launched PKB on as well? I think there are three things to try:

  1. Run pkb and see if the same command fails. (This you mentioned fails.)
  2. Run the mkdir && cd && wget && tar && ln -s command on the client machine. (This you mentioned works.)
  3. Run the whole ssh command on the machine you ran pkb on to see if there is a problem within the context of the ssh. (Did you try this?)

If 2 and 3 succeed, but 1 fails, then there must be some difference between how the command runs on the client when we ssh via pkb versus when you ssh outside pkb.

It might be useful to break up the command into separate RemoteCommand invocations to see if the wget or the tar are failing. Then you might also print out any environment variables in the client machine to see if there is a difference between your ssh and pkb ssh.
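For the environment comparison, a minimal sketch: capture the output of `env` once through PKB's ssh and once through your own ssh, then diff the two. The helper names `parse_env` and `diff_env` are hypothetical, and the parser assumes one NAME=value pair per line, so multi-line values are not handled.

```python
def parse_env(text):
    """Parse the output of `env` into a dict (one NAME=value per line)."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines that contain no "="
            env[name] = value
    return env


def diff_env(first, second):
    """Map each differing variable to its (first, second) values.

    A value of None means the variable is missing on that side.
    """
    diffs = {}
    for name in sorted(set(first) | set(second)):
        if first.get(name) != second.get(name):
            diffs[name] = (first.get(name), second.get(name))
    return diffs
```

Variables such as PATH or proxy settings that show up on only one side would be the first things to look at.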

adnavare commented 5 years ago

@s-deitz: So I tried these things

  1. I divided the install_cmd here https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/1f8676868931f71ce0c8d45868f0dd54587b3df4/perfkitbenchmarker/linux_packages/ycsb.py#L213 into two parts: one is just 'mkdir -p {0}', the other is 'curl -vL {1} | tar -C {0} --strip-components=1 -xvzf -'.
    1. The directory gets created. The second part still does not complete, and it prints nothing to the terminal.
    2. Run individually, the mkdir, curl, and tar commands all succeed. Over SSH they also execute properly, both as a whole and separately (mkdir, curl, tar).
    3. I tried redirecting the output of the curl command to /dev/stdout, i.e. curl -vL {https://github.com/brianfrankcooper/YCSB/releases/........} &> /dev/stdout, and got the error below: -vL https://github.com/brianfrankcooper/YCSB/releases/download/0.9.0/ycsb-0.9.0.tar.gz &> /dev/stdout | tar -C /opt/pkb/ycsb -xvzf - &> /dev/stdout STDOUT: gzip: stdin: not in gzip format
      I am not sure whether this comes from the way I am redirecting the tar.gz output to stdout or from a problem with the gz format.
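One possible explanation for "not in gzip format", offered as an assumption rather than a confirmed diagnosis: in bash, `&>` redirects stderr as well as stdout, so the diagnostic text that `curl -v` writes to stderr would enter the pipe together with the archive bytes. The Python sketch below only illustrates why that breaks gzip detection; the sample "verbose" text is a stand-in, not real curl output from this run.

```python
import gzip

# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(data: bytes) -> bool:
    """Return True if the byte string starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# A clean gzip stream, as tar would expect on stdin.
archive = gzip.compress(b"example payload")

# Stand-in for diagnostic text written to stderr (hypothetical content).
verbose_text = b"*   Trying 192.0.2.1...\n"

# What the consumer sees if stderr is merged into the same pipe.
mixed = verbose_text + archive

print(looks_like_gzip(archive))  # clean stream
print(looks_like_gzip(mixed))    # stream with diagnostic text in front
```

If this is what happened, the redirect itself caused the error, and it says nothing about whether the original download was corrupt.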

adnavare commented 5 years ago

Any clue what might be going wrong?

flint-dominic commented 5 years ago

Is Cassandra supported in 0.9.0? https://github.com/brianfrankcooper/YCSB/issues/766

s-deitz commented 5 years ago

I ran PKB on an Ubuntu 16.04 and an Ubuntu 18.04 instance launched by PKB and they both worked.

The flags I used were:

--benchmarks=cassandra_ycsb --num_vms=1 --cassandra_replication_factor=1 --ycsb_client_vms=1 --os_type=ubuntu1604 --machine_type=n1-standard-8 --data_disk_type=pd-ssd --gce_num_local_ssds=0

Changing ubuntu1604 to ubuntu1804 still worked.

The log line of the successful remote command that was reported as failing is:

2018-12-11 13:33:39,984 7aef774d Thread-53 cassandra_ycsb(1/1) vm_util.py:348 DEBUG Ran: {ssh -A -p 22 perfkit@107.178.211.84 -2 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o IdentitiesOnly=yes -o PreferredAuthentications=publickey -o PasswordAuthentication=no -o ConnectTimeout=5 -o GSSAPIAuthentication=no -o ServerAliveInterval=30 -o ServerAliveCountMax=10 -i /tmp/perfkitbenchmarker/runs/7aef774d/perfkitbenchmarker_keyfile mkdir -p /opt/pkb/ycsb && curl -L https://github.com/brianfrankcooper/YCSB/releases/download/0.9.0/ycsb-0.9.0.tar.gz | tar -C /opt/pkb/ycsb --strip-components=1 -xzf -} ReturnCode:0, WallTime:0:12.56s, CPU:0.02s, MaxMemory:5436kb STDOUT: STDERR: Warning: Permanently added '107.178.211.84' (ECDSA) to the list of known hosts. % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed ^M 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0^M100 605 0 605 0 0 3517 0 --:--:-- --:--:-- --:--:-- 3517 ^M 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0^M 6 326M 6 20.7M 0 0 15.2M 0 0:00:21 0:00:01 0:00:20 21.1M^M 15 326M 15 49.3M 0 0 20.8M 0 0:00:15 0:00:02 0:00:13 24.7M^M 24 326M 24 79.6M 0 0 23.7M 0 0:00:13 0:00:03 0:00:10 26.7M^M 34 326M 34 112M 0 0 25.7M 0 0:00:12 0:00:04 0:00:08 28.1M^M 44 326M 44 145M 0 0 27.2M 0 0:00:11 0:00:05 0:00:06 29.2M^M 55 326M 55 179M 0 0 28.3M 0 0:00:11 0:00:06 0:00:05 31.8M^M 65 326M 65 214M 0 0 29.2M 0 0:00:11 0:00:07 0:00:04 33.2M^M 76 326M 76 249M 0 0 29.8M 0 0:00:10 0:00:08 0:00:02 33.8M^M 86 326M 86 283M 0 0 30.3M 0 0:00:10 0:00:09 0:00:01 34.3M^M 97 326M 97 318M 0 0 30.7M 0 0:00:10 0:00:10 --:--:-- 34.5M^M100 326M 100 326M 0 0 30.8M 0 0:00:10 0:00:10 --:--:-- 34.5M

It looks like you are running on VMs in GCP. Does it work if you have PKB provision the VMs instead? If so, this may be a good way to debug the issue.

One other question: Are you running from master or at the last release? I tried from master.

adnavare commented 5 years ago

@flint-dominic : I tried with 0.11.0, but it fails with a different error while downloading Apache Ant (http://archive.apache.org/dist/ant/binaries/apache-ant-1.9.6-bin.tar.gz), throwing "Connection Timeout". I tried running the command for that download manually:
ssh -A -p 22 user@x.x.x.x -2 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o IdentitiesOnly=yes -o PreferredAuthentications=publickey -o PasswordAuthentication=no -o ConnectTimeout=5 -o GSSAPIAuthentication=no -o ServerAliveInterval=30 -o ServerAliveCountMax=10 -i /home/anupn/.ssh/id_rsa mkdir -p /opt/pkb && cd /opt/pkb && wget archive.apache.org/dist/ant/binaries/apache-ant-1.9.6-bin.tar.gz && tar -zxf apache-ant-1.9.6-bin.tar.gz && ln -s /opt/pkb/apache-ant-1.9.6/ /opt/pkb/ant
And it works properly. Somehow, when run from PKB, it gives me a connection timeout.

adnavare commented 5 years ago

@s-deitz : Why do I have to give machine_type, data_disk_type, and gce_num_local_ssds when I am using a static config file and static machines, not VMs? I am not running on VMs in GCP.

I tried from master.

cwilkes commented 5 years ago

@adnavare Try going to your instance and downloading the YCSB tarball: $ curl -L -O https://github.com/brianfrankcooper/YCSB/releases/download/0.9.0/ycsb-0.9.0.tar.gz and then checking whether the tar file looks okay: $ tar -tvzf ycsb-0.9.0.tar.gz

If you don't pass "-L" to curl, it won't follow redirects, and the downloaded file will be HTML saying "you are being redirected..."
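The same check as `tar -tvzf` can be sketched in Python with the standard `tarfile` module. The function name `list_tarball` is hypothetical; it returns the member names for a readable gzip tarball and None when the file is something else, such as an HTML redirect page or a missing file.

```python
import tarfile


def list_tarball(path):
    """Return the member names of a .tar.gz file, or None if unreadable."""
    try:
        with tarfile.open(path, "r:gz") as archive:
            return archive.getnames()
    except (tarfile.ReadError, OSError):
        # Not a gzip-compressed tar archive (or the file could not be opened).
        return None
```

For example, `list_tarball("ycsb-0.9.0.tar.gz")` returning None would suggest the download did not produce a real archive.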

Do you know if you have to go through an http proxy server to get out to the internet? That could also stop you from getting the file.
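A quick way to see whether proxy settings are in play is to print the usual proxy environment variables in both contexts (your own ssh session and the command PKB runs). This is a sketch under assumptions: the variable names below are the conventional ones that tools such as curl and wget typically read, and `proxy_settings` is a hypothetical helper. If the variables are set only in interactive shell startup files, a non-interactive ssh command may not see them, which would be one possible reason for a timeout that appears only under PKB.

```python
import os

# Conventional proxy variable names; both lower and upper case are checked.
PROXY_VARS = ("http_proxy", "https_proxy", "no_proxy")


def proxy_settings(environ):
    """Return the proxy-related variables that are set and non-empty."""
    found = {}
    for name in PROXY_VARS:
        for key in (name, name.upper()):
            if environ.get(key):
                found[key] = environ[key]
    return found


if __name__ == "__main__":
    # Print whatever proxy configuration the current process can see.
    print(proxy_settings(os.environ))
```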

cwilkes commented 5 years ago

Let me know if you need any additional help with this.