GoogleCloudPlatform / pontem

Open source tools for Google Cloud Storage and Databases.
Apache License 2.0

WorkloadSettings.readQueries() assumes one query per line #244

Open: vicenteg opened this issue 4 years ago

vicenteg commented 4 years ago

WorkloadSettings.readQueries() assumes one query per line in each of the queryFiles, so it breaks when a query file contains a multi-line query. For example, the following query is valid in BigQuery but causes the workload tester to fail:

SELECT 
  1, 2, 3

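The resulting errors. Each line of the file is sent to BigQuery as a separate query, so "SELECT" fails with "Unexpected end of script" and "1, 2, 3" fails with "Unexpected integer literal":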

$ gradle clean run

> Task :BigQueryWorkloadTester:compileJava
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :BigQueryWorkloadTester:run
May 16, 2020 10:21:22 PM com.google.cloud.pontem.BigQueryWorkloadTester main
INFO: Welcome to BigQuery Workload Tester!
May 16, 2020 10:21:22 PM com.google.cloud.pontem.BigQueryWorkloadTester main
INFO: Loading config
May 16, 2020 10:21:22 PM com.google.cloud.pontem.BigQueryWorkloadTester main
INFO: Starting execution
May 16, 2020 10:21:22 PM com.google.auth.oauth2.DefaultCredentialsProvider warnAboutProblematicCredentials
WARNING: Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a "quota exceeded" or "API not enabled" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/.
May 16, 2020 10:21:22 PM com.google.cloud.pontem.benchmark.RatioBasedWorkloadBenchmark run
INFO: Executing Ratio Based Benchmark for Workload: Simple and Text Queries
May 16, 2020 10:21:22 PM com.google.cloud.pontem.benchmark.runners.ConcurrentWorkloadRunner run
INFO: Executing Workload with Concurrency Level: 1
May 16, 2020 10:21:23 PM com.google.cloud.pontem.benchmark.runners.SerialQueryRunner executeQuery
SEVERE: Caught exception while executing query: 
com.google.cloud.pontem.benchmark.backends.BackendException: com.google.cloud.bigquery.BigQueryException: Syntax error: Unexpected end of script at [1:7]
        at com.google.cloud.pontem.benchmark.backends.BigQueryBackend.executeQuery(BigQueryBackend.java:68)
        at com.google.cloud.pontem.benchmark.runners.SerialQueryRunner.executeQuery(SerialQueryRunner.java:65)
        at com.google.cloud.pontem.benchmark.runners.SerialQueryRunner.run(SerialQueryRunner.java:55)
        at com.google.cloud.pontem.benchmark.runners.callables.WorkloadRunnerCallable.call(WorkloadRunnerCallable.java:58)
        at com.google.cloud.pontem.benchmark.runners.callables.WorkloadRunnerCallable.call(WorkloadRunnerCallable.java:30)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.google.cloud.bigquery.BigQueryException: Syntax error: Unexpected end of script at [1:7]
        at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:100)
        at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.getQueryResults(HttpBigQueryRpc.java:424)
        at com.google.cloud.bigquery.BigQueryImpl$23.call(BigQueryImpl.java:792)
        at com.google.cloud.bigquery.BigQueryImpl$23.call(BigQueryImpl.java:787)
        at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
        at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
        at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
        at com.google.cloud.bigquery.BigQueryImpl.getQueryResults(BigQueryImpl.java:786)
        at com.google.cloud.bigquery.BigQueryImpl.getQueryResults(BigQueryImpl.java:776)
        at com.google.cloud.bigquery.Job$1.call(Job.java:329)
        at com.google.cloud.bigquery.Job$1.call(Job.java:326)
        at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
        at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
        at com.google.cloud.RetryHelper.poll(RetryHelper.java:64)
        at com.google.cloud.bigquery.Job.waitForQueryResults(Job.java:325)
        at com.google.cloud.bigquery.Job.waitFor(Job.java:240)
        at com.google.cloud.pontem.benchmark.backends.BigQueryBackend.executeQuery(BigQueryBackend.java:66)
        ... 8 more
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "location" : "q",
    "locationType" : "parameter",
    "message" : "Syntax error: Unexpected end of script at [1:7]",
    "reason" : "invalidQuery"
  } ],
  "message" : "Syntax error: Unexpected end of script at [1:7]",
  "status" : "INVALID_ARGUMENT"
}
        at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:150)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:401)
        at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1097)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:499)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
        at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.getQueryResults(HttpBigQueryRpc.java:422)
        ... 23 more

May 16, 2020 10:21:24 PM com.google.cloud.pontem.benchmark.runners.SerialQueryRunner executeQuery
SEVERE: Caught exception while executing query: 
com.google.cloud.pontem.benchmark.backends.BackendException: com.google.cloud.bigquery.BigQueryException: Syntax error: Unexpected integer literal "1" at [1:3]
        at com.google.cloud.pontem.benchmark.backends.BigQueryBackend.executeQuery(BigQueryBackend.java:68)
        at com.google.cloud.pontem.benchmark.runners.SerialQueryRunner.executeQuery(SerialQueryRunner.java:65)
        at com.google.cloud.pontem.benchmark.runners.SerialQueryRunner.run(SerialQueryRunner.java:55)
        at com.google.cloud.pontem.benchmark.runners.callables.WorkloadRunnerCallable.call(WorkloadRunnerCallable.java:58)
        at com.google.cloud.pontem.benchmark.runners.callables.WorkloadRunnerCallable.call(WorkloadRunnerCallable.java:30)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.google.cloud.bigquery.BigQueryException: Syntax error: Unexpected integer literal "1" at [1:3]
        at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:100)
        at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.getQueryResults(HttpBigQueryRpc.java:424)
        at com.google.cloud.bigquery.BigQueryImpl$23.call(BigQueryImpl.java:792)
        at com.google.cloud.bigquery.BigQueryImpl$23.call(BigQueryImpl.java:787)
        at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
        at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
        at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
        at com.google.cloud.bigquery.BigQueryImpl.getQueryResults(BigQueryImpl.java:786)
        at com.google.cloud.bigquery.BigQueryImpl.getQueryResults(BigQueryImpl.java:776)
        at com.google.cloud.bigquery.Job$1.call(Job.java:329)
        at com.google.cloud.bigquery.Job$1.call(Job.java:326)
        at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
        at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
        at com.google.cloud.RetryHelper.poll(RetryHelper.java:64)
        at com.google.cloud.bigquery.Job.waitForQueryResults(Job.java:325)
        at com.google.cloud.bigquery.Job.waitFor(Job.java:240)
        at com.google.cloud.pontem.benchmark.backends.BigQueryBackend.executeQuery(BigQueryBackend.java:66)
        ... 8 more
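Here is a minimal Java sketch of line-per-query reading that makes the failure mode concrete. It is a hypothetical reimplementation, not the actual WorkloadSettings.readQueries() source. The class and method names (LinePerQuerySketch, splitOneQueryPerLine) are invented for illustration, and the sketch assumes lines are kept untrimmed and blank lines are skipped. Splitting the example file on line breaks yields two queries, "SELECT " and "  1, 2, 3". That result is consistent with the two error positions in the log above ([1:7] and [1:3]).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of one-query-per-line reading.
// NOT the actual WorkloadSettings.readQueries() implementation.
public class LinePerQuerySketch {

    // Treats every non-blank line of a query file as an independent query.
    // Lines are kept as-is (assumption: no trimming), matching the column
    // positions reported in the BigQuery errors.
    static List<String> splitOneQueryPerLine(String fileContents) {
        List<String> queries = new ArrayList<>();
        for (String line : fileContents.split("\\R")) { // \R matches any line break
            if (!line.isBlank()) {
                queries.add(line);
            }
        }
        return queries;
    }

    public static void main(String[] args) {
        String file = "SELECT \n  1, 2, 3\n";
        // The single multi-line query becomes two separate, invalid queries.
        for (String q : splitOneQueryPerLine(file)) {
            System.out.println("[" + q + "]");
        }
    }
}
```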

Each query file in the configuration should be treated as a single query or script and sent to BigQuery in its entirety, since it's common to have large, multi-line queries to test.
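Here is a rough sketch of the proposed behavior, where each file is read in full and submitted as one query or script. The names WholeFileQuerySketch and readOneQueryPerFile are hypothetical. The sketch assumes the files are UTF-8 and uses Files.readString, which requires Java 11 or later. That requirement matches the java.base frames in the log.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each query file holds exactly one query or script.
public class WholeFileQuerySketch {

    static List<String> readOneQueryPerFile(List<Path> queryFiles) throws IOException {
        List<String> queries = new ArrayList<>();
        for (Path file : queryFiles) {
            // Read the whole file (UTF-8) so line breaks inside the query survive;
            // only leading/trailing whitespace is removed.
            String query = Files.readString(file).strip();
            if (!query.isEmpty()) {
                queries.add(query);
            }
        }
        return queries;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("query", ".sql");
        try {
            Files.writeString(tmp, "SELECT \n  1, 2, 3\n");
            List<String> queries = readOneQueryPerFile(List.of(tmp));
            System.out.println(queries.size() + " query: " + queries.get(0));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

The trade-off, as the reply below points out, is that a workload would then need one file per query instead of one file per workload.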

ldanielmadariaga commented 4 years ago

By design, query files are expected to contain one query per line, because a single file will usually hold multiple queries that together form a workload. The proposed change defeats the purpose of the query file.

If you'd like to implement something like this, you'd need a parsing scheme that identifies whether a line is a complete query or part of an unfinished one, and adjusts the parsing accordingly.
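Here is one possible parsing scheme of that kind, sketched under a stated assumption. A line whose trimmed text ends with ';' completes a query, and any line without a trailing ';' is treated as unfinished and joined with the lines that follow it. This terminator convention is a hypothetical choice, not something the project defines, and the class and method names are made up. The sketch does not tokenize SQL, so a ';' at the end of a line inside a string literal or comment would be misread as a terminator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser: a query is complete once a line ends with ';'.
// Semicolons inside string literals or comments are NOT handled.
public class TerminatedQueryParserSketch {

    static List<String> parseQueries(String fileContents) {
        List<String> queries = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : fileContents.split("\\R")) {
            if (line.isBlank() && current.length() == 0) {
                continue; // skip blank lines between queries
            }
            current.append(line).append('\n');
            if (line.strip().endsWith(";")) {
                // Complete query: drop the trailing ';' before storing it.
                String query = current.toString().strip();
                query = query.substring(0, query.length() - 1).strip();
                if (!query.isEmpty()) {
                    queries.add(query);
                }
                current.setLength(0);
            }
        }
        // An unterminated query at end of file is still emitted.
        String rest = current.toString().strip();
        if (!rest.isEmpty()) {
            queries.add(rest);
        }
        return queries;
    }

    public static void main(String[] args) {
        String file = "SELECT\n  1, 2, 3;\nSELECT 4;\n";
        for (String q : parseQueries(file)) {
            System.out.println("---\n" + q);
        }
    }
}
```

Under this scheme, existing one-query-per-line files without trailing semicolons would be merged into a single query. Those files would need a ';' on every line, or the scheme would have to be opt-in. It also cannot carry multi-statement BigQuery scripts, because every ';' at the end of a line ends a query.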