Open vicenteg opened 4 years ago
Merging #246 into dev will not change coverage. The diff coverage is
0.00%
.
@@ Coverage Diff @@
## dev #246 +/- ##
=========================================
Coverage 83.01% 83.01%
Complexity 203 203
=========================================
Files 34 34
Lines 1101 1101
Branches 66 66
=========================================
Hits 914 914
Misses 178 178
Partials 9 9
Impacted Files | Coverage Δ | Complexity Δ | |
---|---|---|---|
...m/google/cloud/pontem/config/WorkloadSettings.java | 44.23% <0.00%> (ø) |
10.00 <0.00> (ø) |
Query files by design expect to contain 1 query per line as multiple queries (forming a workload) will usually be included in one file, the proposed change defeats the purpose of the query file.
If you'd like to implement something like this you'd need to have a parsing scheme that identifies whether a line represents a complete query or an unfinished query and adjust parsing accordingly.
Thanks for looking at this.
Query files by design expect to contain 1 query per line as multiple queries (forming a workload) will usually be included in one file, the proposed change defeats the purpose of the query file.
Here's an example of how I've been configuring workloads:
concurrencyLevel: 5
isRatioBasedBenchmark: true
benchmarkRatios: [0.2, 0.5, 1.0, 2.0]
outputFileFolder: ./
workloads:
- name: my_workload
projectId: my_project
queryFiles:
- queries/query1.sql
- queries/query2.sql
- queries/query3.sql
Is this not equivalent to having a single query file with multiple queries?
One query per line makes complex queries and scripts much harder to read, and it requires pre-processing of the query to remove newlines. While this is not necessarily hard to do, it does represent friction.
If you'd like to implement something like this you'd need to have a parsing scheme that identifies whether a line represents a complete query or an unfinished query and adjust parsing accordingly.
Parsing SQL is non-trivial and BigQuery already parses SQL. I'd suggest that a query file's contents should be passed to BigQuery unmodified, so that BigQuery can do the parsing. This is especially important since people may want to test scripts, which are very naturally multiple line files and should be interpreted by BigQuery and not the tool submitting the script.
Diverging from the precedent set by bq
is a bit unexpected as well. The bq
tool does not parse the query but instead lets BigQuery handle that. For example, a query file like the one below works with bq
but would fail with the current implementation of the workload tester:
$ cat <<EOF > /tmp/simplequeries.sql
> SELECT *
> FROM (SELECT 1);
> SELECT 2;
> EOF
$ bq query < /tmp/simplequeries.sql
Waiting on bqjob_r7d03abef82f68ffd_0000017233128fb5_1 ... (1s) Current status: DONE
SELECT *
FROM (SELECT 1); -- at [1:1]
+-----+
| f0_ |
+-----+
| 1 |
+-----+
SELECT 2; -- at [3:1]
+-----+
| f0_ |
+-----+
| 2 |
+-----+
Thanks for the consideration!
--vince
Fixes #244
Read all lines in the query file, and join them to create a single query or script.