allanbank / mongodb-async-driver

The MongoDB Asynchronous Java Driver.
Apache License 2.0

2014 YCSB Tests Don't Include Testing Script #17

Open ankushg opened 9 years ago

ankushg commented 9 years ago

The YCSB tests from 2014 don't include the run-tests script (and the script isn't linked to from the webpage either). Is the script the same as the 2009 script?

allanbank commented 9 years ago

The 2009 script should still work but we have made some changes since then. I posted the current version in this gist.

ankushg commented 9 years ago

Awesome, thanks! Is there a script you used to generate the graphs from the benchmark outputs too, or was that all manually done?

allanbank commented 9 years ago

There is a parser for the results here. It generates a csv file that you can then paste into the right page of the spreadsheet on the website.

allanbank commented 9 years ago

@ankushg - Do you have any other issues/questions? If not can I close this ticket?

ankushg commented 9 years ago

Some background on why I'm asking all these questions: I'm trying to run YCSB workloads against multiple MongoDB replica set and standalone server configurations, and present the results similarly to your comparison of mongo-legacy and mongo-async. The problem I'm running into is that if I vary both the read preferences and the write concerns across all their possible values, the number of configurations runs into the twenties, as opposed to the two that you have (mongo-legacy vs mongo-async).
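To make the size of that grid concrete, here is a minimal sketch that enumerates every read preference and write concern pair. The property names (`mongodb.readPreference`, `mongodb.writeConcern`) and the value lists are assumptions based on the YCSB MongoDB binding and may differ between versions; the helper names `configurations` and `as_flags` are made up for illustration.

```python
from itertools import product

# Assumed option values; check them against the YCSB MongoDB binding in use.
READ_PREFERENCES = [
    "primary", "primary_preferred", "secondary", "secondary_preferred", "nearest",
]
WRITE_CONCERNS = [
    "errors_ignored", "unacknowledged", "acknowledged", "journaled",
    "replica_acknowledged",
]


def configurations():
    """Return one property dict per (read preference, write concern) pair."""
    return [
        {"mongodb.readPreference": rp, "mongodb.writeConcern": wc}
        for rp, wc in product(READ_PREFERENCES, WRITE_CONCERNS)
    ]


def as_flags(config):
    """Render a configuration as a flat list of '-p key=value' arguments."""
    flags = []
    for key, value in sorted(config.items()):
        flags += ["-p", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    configs = configurations()
    print(len(configs), "configurations")
    for config in configs:
        print(" ".join(as_flags(config)))
```

With five assumed values on each axis this yields 25 combinations per server topology, which matches the "in the twenties" estimate.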

I guess the last questions that I had were:

  1. Can you think of a better way of structuring graphs for this many groups?
  2. Did you manually generate the graphs in the ODS file or did you script that too? I'd ideally like to avoid manually setting up all the graphs since there are so many of them...

I realize these fall waaaaay out of what most Github issue questions entail, so feel free to just close the ticket :)

allanbank commented 9 years ago

I created the graphs by hand, sorry. There are plotting tools (like gnuplot) that can be used to plot data once you get it into a reasonable format (which the output from YCSB is not). To be honest I had not thought of using those until now.

What I ended up doing was sort of working both ends toward the center. I created some fake data and generated the plots that I wanted from it. I then wrote the parser to extract the real data and got its output as close as I could to the fake data's structure. Finally, I refined the plots' structure based on the actual values.
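The fake-data step could look something like the sketch below. The column layout (`config`, `operation`, `avg_latency_us`), the value range, and the function names are all hypothetical; this is not the format the spreadsheet on the website uses, just one shape that most plotting tools can read.

```python
import csv
import io
import random


def fake_rows(configs, operations, seed=0):
    """Build placeholder rows: one made-up latency per (config, operation)."""
    rng = random.Random(seed)  # seeded so the placeholder data is repeatable
    rows = []
    for config in configs:
        for operation in operations:
            rows.append({
                "config": config,
                "operation": operation,
                "avg_latency_us": round(rng.uniform(200, 2000), 1),
            })
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["config", "operation", "avg_latency_us"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(fake_rows(["mongo-legacy", "mongo-async"], ["READ", "UPDATE"])))
```

Once the plots look right with placeholder values, the real parser only has to emit the same columns.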

I honestly played with the charts for a couple of weeks and finally decided they were good enough to present what was going on. I suspect a hard core data scientist might be able to do better but life is too short.

If I were you, I would get the tests running first and then work on the parser program so that it generates the data sets in the format your favorite plotting program expects. Running all of those tests will take some time, and you can work on the parser while they are running.
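As a starting point for such a parser, here is a minimal sketch, not the parser linked above. It assumes the YCSB summary lines have the shape `[SECTION], Metric(unit), value` (for example `[OVERALL], Throughput(ops/sec), 98.91`) and silently skips everything else. The function name `parse_ycsb` and the output columns are illustrative choices.

```python
import csv
import re
import sys

# Assumed shape of a YCSB summary line: "[SECTION], Metric(unit), value".
LINE = re.compile(
    r"^\[(?P<section>[^\]]+)\],\s*(?P<metric>[^,]+),\s*(?P<value>[-+0-9.eE]+)\s*$"
)


def parse_ycsb(lines):
    """Return (section, metric, value) tuples for lines matching the pattern."""
    rows = []
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # log noise, stack traces, blank lines, etc.
        try:
            value = float(match.group("value"))
        except ValueError:
            continue  # looked numeric but was not, e.g. a lone "-"
        rows.append((match.group("section"), match.group("metric").strip(), value))
    return rows


if __name__ == "__main__":
    # Usage: python parse_ycsb.py run1.out run2.out > results.csv
    writer = csv.writer(sys.stdout, lineterminator="\n")
    writer.writerow(["config", "section", "metric", "value"])
    for path in sys.argv[1:]:
        with open(path) as handle:
            for section, metric, value in parse_ycsb(handle):
                # The file path stands in for the configuration label.
                writer.writerow([path, section, metric, value])
```

Emitting one row per measurement (a "long" table) keeps the parser independent of how many configurations there are; pivoting into per-chart columns can then happen in the spreadsheet or plotting tool.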

Be sure to let me know what you come up with. I might borrow it back.

Rob.