moandcompany opened this issue 7 years ago
We've had the same problem. Except for us, it was with the BigQuery API that we were bringing into our project. Removing it fixed it (Beam has a dependency on it anyway).
We're also experiencing issues during file staging. Before the attempt to upload files is made, we receive this error:
WARNING: Request failed with code 409, performed 0 retries due to IOExceptions, performed 0 retries due to unsuccessful status codes, HTTP framework says request can be retried, (caller responsible for retrying): https://www.googleapis.com/storage/v1/b?predefinedAcl=projectPrivate&predefinedDefaultObjectAcl=projectPrivate&project=<project name omitted>
Accessing the HTTP resource above returns JSON containing an error with the message:
Anonymous users does not have storage.buckets.list access to project <project number omitted>.
We had the same issue, and we can confirm that, as @moandcompany suggests, this fixes it:
compile('com.google.api-client:google-api-client:1.22.0') {
    force = true
}
For the record, our stack trace is pretty similar. We are running a 2.2.0 snapshot of Apache Beam:
java.io.IOException: Error executing batch GCS request
at org.apache.beam.sdk.util.GcsUtil.executeBatches(GcsUtil.java:603)
at org.apache.beam.sdk.util.GcsUtil.getObjects(GcsUtil.java:342)
at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.matchNonGlobs(GcsFileSystem.java:217)
at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.match(GcsFileSystem.java:86)
at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:125)
at org.apache.beam.sdk.io.FileSystems.matchSingleFileSpec(FileSystems.java:190)
at org.apache.beam.runners.dataflow.util.PackageUtil.alreadyStaged(PackageUtil.java:159)
at org.apache.beam.runners.dataflow.util.PackageUtil.stagePackageSynchronously(PackageUtil.java:188)
at org.apache.beam.runners.dataflow.util.PackageUtil.access$000(PackageUtil.java:69)
at org.apache.beam.runners.dataflow.util.PackageUtil$2.call(PackageUtil.java:176)
at org.apache.beam.runners.dataflow.util.PackageUtil$2.call(PackageUtil.java:173)
at org.apache.beam.runners.dataflow.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
at org.apache.beam.runners.dataflow.repackaged.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
at org.apache.beam.runners.dataflow.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: com.google.api.client.http.HttpResponseException: 404 Not Found
Not Found
at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:500)
at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:459)
at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:76)
at org.apache.beam.sdk.util.GcsUtil.executeBatches(GcsUtil.java:595)
... 16 more
I got a similar problem. Here's the API response:
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "(249a6f2653c550b0): The workflow was automatically rejected by the service because it may trigger an identified bug in the SDK.\nBug details: com.google.api-client:google-api-client library version 1.23.0 is not supported..\nContact dataflow-feedback@google.com for further help. Please use this identifier in your communication: 67379331.",
    "reason" : "badRequest"
  } ],
  "message" : "(249a6f2653c550b0): The workflow was automatically rejected by the service because it may trigger an identified bug in the SDK.\nBug details: com.google.api-client:google-api-client library version 1.23.0 is not supported..\nContact dataflow-feedback@google.com for further help. Please use this identifier in your communication: 67379331.",
  "status" : "INVALID_ARGUMENT"
}
Google added server-side rejection of jobs affected by this issue, to prevent users from starting malformed jobs.
The root cause for the 404s is outlined at https://github.com/google/google-api-java-client/issues/1073. Hilariously, you can't get to the error rejecting the job for bad dependencies until you've cleared up the staging problem (in our case by upgrading to com.google.apis:google-api-services-storage:v1-rev115-1.23.0). Is there another problem that's causing the job rejection? We're being forced to 1.23.0 by a bug in another Google API, so this puts us between a rock and a hard place because lol @ Java versioning on Maven.
+1 happening to us too. Is there any suggested remedy?
The Cloud Dataflow team has added a page on Dataflow SDK and Worker Dependencies that identifies the google-api-client 1.22.0 version requirement (Java)
That is a useful link, but not really a solution for those of us like @frew who need to use google-api-client 1.23.0
due to a bug in another library
I also have this issue
Any updates? I'm running into this issue.
Same here: Apache Beam 2.3.0 with DataflowRunner hitting the same 404 error. A permanent fix would be ideal.
Thanks.
We encountered this as well. We're on Scio 0.5.5-beta1, and attempting to force the version to 1.22.0
using dependencyOverrides never worked. However, explicitly adding the library with a force()
did work, i.e.,
"com.google.api-client" % "google-api-client" % "1.22.0" force()
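For context, in a build.sbt the two approaches look roughly like this (a sketch; only the force() form worked in this case):

```scala
// build.sbt (sketch): the override alone did not take effect here
dependencyOverrides += "com.google.api-client" % "google-api-client" % "1.22.0"

// Declaring it as a direct dependency with force() did work
libraryDependencies += "com.google.api-client" % "google-api-client" % "1.22.0" force()
```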
I have the same problem; Google is forcing a move off storage@v1. The runtime error becomes:
Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.NoClassDefFoundError: com/google/api/gax/rpc/HeaderProvider
It looks like Google's infrastructure libraries conflict with each other. Horrible.
@dsquier omg thank you. I was battling dependencyOverrides for a while and didn't think about force.
I was redirected here from Google because I was using the bigquery-client
library and the same error appeared. Has anybody found a workaround for this issue?
I've tried (without success):
<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-bigquery</artifactId>
  <version>0.21.0-beta</version>
</dependency>
After analyzing my dependencies and checking the error, I was able to fix this by forcing the version of google-api-services-dataflow to v1b3-rev221-1.22.0 (and of course setting google-api-client to version 1.22.0)
Only setting google-api-client to the old version wasn't enough for me, since the following error was thrown:
java.io.IOException: Error executing batch GCS request
at org.apache.beam.sdk.util.GcsUtil.executeBatches(GcsUt
when trying to compile my dataflow template
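In Maven terms, pinning both artifacts might look like this (a sketch; the surrounding POM layout is an assumption):

```xml
<!-- pom.xml (sketch): pin both the Dataflow API client and google-api-client -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.apis</groupId>
      <artifactId>google-api-services-dataflow</artifactId>
      <version>v1b3-rev221-1.22.0</version>
    </dependency>
    <dependency>
      <groupId>com.google.api-client</groupId>
      <artifactId>google-api-client</artifactId>
      <version>1.22.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```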
For anyone else still seeing issues like this, check out the version numbers here and make sure you aren't importing a conflicting dependency.
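To see which version actually wins on the classpath, the standard dependency-inspection commands help (configuration names vary by build-tool version):

```shell
# Maven: show every dependency path that pulls in google-api-client
mvn dependency:tree -Dincludes=com.google.api-client:google-api-client

# Gradle: inspect the resolved runtime configuration
./gradlew dependencies --configuration runtimeClasspath
```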
Beam 2.5.0 now depends on google-api-client:1.23.0; see https://cloud.google.com/dataflow/docs/concepts/sdk-worker-dependencies. Is this still an issue?
The new Google API Client Library, version 1.23.0, appears to cause problems with the Dataflow Java SDK when submitting and/or running jobs.
This appears to affect Dataflow Java SDKs in both major version families (e.g. 1.9.1, 2.0.0, and 2.1.0)
In some cases, these problems manifest as 404 HTTP errors when attempting to upload staging files.
Workaround: Pinning Google API Client Library dependencies to version 1.22.0 appears to avoid this issue:
com.google.api-client:google-api-client:1.22.0
Gradle Example:
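A minimal sketch for build.gradle, matching the force snippet shared earlier in the thread (the resolutionStrategy variant is an alternative that pins the version across all configurations):

```groovy
// build.gradle (sketch): force google-api-client to 1.22.0
compile('com.google.api-client:google-api-client:1.22.0') {
    force = true
}

// Alternative: pin it for every configuration
configurations.all {
    resolutionStrategy.force 'com.google.api-client:google-api-client:1.22.0'
}
```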
Maven Example:
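A minimal sketch for pom.xml; in a multi-module build the pin may need to live in dependencyManagement instead:

```xml
<!-- pom.xml (sketch): pin google-api-client to 1.22.0 -->
<dependency>
  <groupId>com.google.api-client</groupId>
  <artifactId>google-api-client</artifactId>
  <version>1.22.0</version>
</dependency>
```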