Closed jlouis closed 12 years ago
I used LevelDB for collecting the event data. I've written a script that converts the binary Erlang terms into CSV rows. The resulting data is around 3 gigs. Plotting in R was the sole reason why I wrote it. I'll get you a link to the raw data today or tomorrow. It takes a minute or two to generate and there are 10 result sets.
I'm generating it now. I'm going to lunch with the family. I'll put it up on Dropbox sometime today.
Awesome! I like the work done here.
I'm uploading it now to dropbox. The CSV file compressed really well. It's only a ~500meg bz2.
I am rerunning the node ws test with the new clustering. I'm going to add that to the bz2 with the old one. Please hold.
I'm not including that new benchmark; the new instance's performance characteristics are different. This is the description of the fields:
timestamp, "erlang-cowboy", client_id, event_id, event
Facepalm, the second column should be "type". You'll have to fix that. You probably don't want to wait for me to regen the data, and it's pretty trivial to rename a column in R.
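If you're working outside R, the header fix is just as quick with a small script. Here's a sketch in Python using the stdlib `csv` module; the file names are placeholders, and it assumes the dump has a single header row laid out as described above:

```python
import csv

def fix_header(src, dst):
    """Rewrite the CSV so the second column is named "type".

    The second column holds the server type; the original header
    mislabeled it.
    """
    with open(src, newline="") as f:
        rows = list(csv.reader(f))
    rows[0][1] = "type"  # rename only the header cell; data rows are untouched
    with open(dst, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

The equivalent in R is a one-liner along the lines of `colnames(df)[2] <- "type"` after reading the table in.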
Anyhow, these are the definitions:
For the "'EXIT'" event, I used the crash reason as the event_id. It is still unique for each crash event; it simply serves a dual purpose for crash events. That will help you differentiate what kind of crash it is. I suppose I could have made the event type "{'EXIT', connection_timeout}", but what's done is done.
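Since the crash reason is carried in event_id for 'EXIT' rows, a breakdown of crash kinds can be pulled straight out of the dump. A sketch in Python, assuming the corrected header ("type" as the second column) and that the event column stores the literal 'EXIT'; the file name is a placeholder:

```python
import csv
from collections import Counter

def crash_breakdown(path):
    """Tally 'EXIT' events by their event_id, which doubles as the
    crash reason for crash rows."""
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["event"] == "'EXIT'":
                counts[row["event_id"]] += 1
    return counts
```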
I started writing an R script using precompiled CSV tables based on the event data. You can find that script in priv/. It's probably pretty terrible R code.
The compiled data, which is only 124M, is here:
https://www.dropbox.com/s/e9ygu4z8oyr9hla/compiled-data-20120616.tar.bz2
I've precomputed the connection times and message latencies, as well as summed up all the counts. That may save you some time. It is also what I was using for the R script (which is probably useless to you).
Here's a link to the event dump; it's 515M:
https://www.dropbox.com/s/sk9ilysejmnk3dn/results-20120616.tar.bz2
It would be awesome if you sent a pull request with your R scripts when you're done.
Also, throw out the ruby data; the server crashed quickly with a seg fault, but I didn't want to debug the cause while I was in the middle of running the benchmark. I wouldn't feel right if you included it with the rest, as it was probably a bad library rather than a performance issue.
Have fun.
Eric.
Also, each table is broken up by server type, but because the type is in each row, you can easily cat the files together if you pop off the header. I figured it was easier to cat a file rather than split up one giant file.
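That "cat the files, popping off the header" step looks like this as a small Python sketch; the file names below are hypothetical, and it assumes every per-server-type table shares the same header:

```python
import csv

def cat_tables(paths, out_path):
    """Concatenate per-server-type CSVs into one table, keeping the
    header from the first file and popping it off the rest."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for i, path in enumerate(paths):
            with open(path, newline="") as f:
                reader = csv.reader(f)
                header = next(reader)        # pop the header row
                if i == 0:
                    writer.writerow(header)  # keep it only once
                writer.writerows(reader)
```

Since the server type lives in every row, nothing is lost by merging; you can always split the table back apart by filtering on that column.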
Awesome work! I'll grab it during the day and then have a look at it later when I have some time. I'll probably fork the repo here for the R graphs and stuff.
Thanks, I appreciate it.
Eric, I see you have a branch which is all about producing raw data metrics from the system. I would really like to get hold of that data, but I'll wait until you have settled on a format. The reason is that I want to plot stuff in R first rather than doing statistics on it. But I see you are changing the format, so I can't really get to work until you have decided upon the format in question :)
And out of curiosity, what are you using LevelDB for here?