GoogleCloudPlatform / pontem

Open source tools for Google Cloud Storage and Databases.
Apache License 2.0

Implement feedback on BQWT #231

Open nagarkar opened 5 years ago

nagarkar commented 5 years ago

First of all, thanks for writing this tool. I tried it today as part of writing the performance section of the BigQuery book we are writing. Here's how I used it: https://github.com/GoogleCloudPlatform/bigquery-oreilly-book/blob/master/07_perf/time_bqwt.sh

A few suggestions:

(1) Please provide a way for the user to specify how many times to run each test. Typically, we want to average the measurements and report the 10th and 90th percentiles. (This is different from concurrency.)
(2) The samples on GitHub show non-standard SQL. The code, however, sets legacySQL to false. Standard SQL is the right choice, but please update the genomics query in the sample config. This took a while to track down, because it turned out that I also had to escape the backslash in the project name.
(3) The output directory setting in the sample config was tricky to get right. It appears that different jobs get started in different directories, so specifying a relative directory (as in the sample) didn't work. I had to specify an absolute directory. (Again, an absolute directory is the right choice, but please update the sample.)
(4) The results JSON would be more helpful if you aggregated the JSON results and reported arrays of wallTime, runTime, etc. for each concurrency level. This would make it easier to import the data into plotting libraries.
(5) The query file had to be a single line of text. This is quite unfriendly, since real queries tend to be quite long. I used tr to retain readability, but it would be good if you treated each queryFile as a whole file instead of reading it line by line.
(6) There is a typo in the second word of this log message: "Finished bechmarking phase" (it should be "benchmarking").
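To make suggestions (1) and (4) concrete, here is a minimal Python sketch of the kind of post-processing being asked for. It is not BQWT code: it assumes a hypothetical flat list of per-run records with `concurrencyLevel`, `wallTime`, and `runTime` fields (these names are illustrative, not the tool's actual output schema). It groups the values into arrays per concurrency level and reports the mean plus the 10th and 90th percentiles, using linear interpolation between closest ranks.

```python
import json
import math
from collections import defaultdict
from statistics import mean


def percentile(values, p):
    """Return the p-th percentile (0-100) using linear interpolation between closest ranks."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100.0
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def summarize(values):
    """Mean plus 10th and 90th percentiles of repeated measurements."""
    return {"mean": mean(values), "p10": percentile(values, 10), "p90": percentile(values, 90)}


def build_report(records, metrics=("wallTime", "runTime")):
    """Group per-run records into arrays per concurrency level.

    `records` uses a HYPOTHETICAL format: a list of dicts with a
    "concurrencyLevel" key plus one numeric key per metric.
    """
    arrays = defaultdict(lambda: {m: [] for m in metrics})
    for record in records:
        level = arrays[record["concurrencyLevel"]]
        for m in metrics:
            level[m].append(record[m])

    report = {}
    for level in sorted(arrays):
        entry = dict(arrays[level])  # raw arrays, convenient for plotting libraries
        entry["summary"] = {m: summarize(arrays[level][m]) for m in metrics}
        report[level] = entry
    return report


if __name__ == "__main__":
    # Made-up input for illustration only (not real BQWT measurements).
    runs = [
        {"concurrencyLevel": 1, "wallTime": 1200, "runTime": 1100},
        {"concurrencyLevel": 1, "wallTime": 1300, "runTime": 1150},
        {"concurrencyLevel": 1, "wallTime": 1250, "runTime": 1120},
        {"concurrencyLevel": 2, "wallTime": 1500, "runTime": 1400},
        {"concurrencyLevel": 2, "wallTime": 1700, "runTime": 1550},
    ]
    print(json.dumps(build_report(runs), indent=2))
```

Keeping the raw arrays next to the summary lets a plotting library draw the full distribution per concurrency level, while the summary carries the averaged and percentile figures that suggestion (1) asks for.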