Open github-actions[bot] opened 4 months ago
Related to #30447
Still failing:
Container image gcr.io/cloud-dataflow/v1beta3/beam_java8_sdk:beam-master-20240306 not downloaded yet.
It is strange that the container gets resolved to "beam_java8_sdk:beam-master-20240306". What happens is that it picks the label for the legacy runner but actually tries to pull the runner v2 image. This is likely because Dataflow switched to runner v2 by default in Beam 2.55.0+.
entered #30634
https://github.com/apache/beam/actions/runs/8619063045
java.lang.RuntimeException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
POST https://bigquery.googleapis.com/bigquery/v2/projects/apache-beam-testing/datasets/beam_postrelease_mobile_gaming/tables/leaderboard_DataflowRunner_team/insertAll?prettyPrint=false
{
"code" : 404,
"errors" : [ {
"domain" : "global",
"message" : "Not found: Table apache-beam-testing:beam_postrelease_mobile_gaming.leaderboard_DataflowRunner_team",
"reason" : "notFound"
} ],
"message" : "Not found: Table apache-beam-testing:beam_postrelease_mobile_gaming.leaderboard_DataflowRunner_team",
"status" : "NOT_FOUND"
}
Looks much better. Close this now.
Currently there is flakiness because artifact downloads from the Maven snapshot repository are not retried. This is a Maven tooling limitation, but we could probably build first (with retry) so that the artifacts get cached in the local Maven repository.
@shunping please check this when you have time.
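The build-first-with-retry idea above could look roughly like the following. This is only a minimal sketch, not what the workflow actually does: `run_with_retry`, its parameters, and the backoff schedule are hypothetical names and choices made for illustration.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, base_delay=5.0, sleep=time.sleep):
    """Run cmd, retrying on a non-zero exit code with exponential backoff.

    Returns the number of attempts used. Raises CalledProcessError
    if every attempt fails. (Hypothetical helper, not part of Beam.)
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt == attempts:
            # Out of retries: surface the last exit code to the caller.
            raise subprocess.CalledProcessError(result.returncode, cmd)
        # Wait base_delay, then 2 * base_delay, ... before the next try.
        sleep(base_delay * 2 ** (attempt - 1))
```

One possible use, assuming a dependency warm-up step is acceptable here, would be calling `run_with_retry(["mvn", "--batch-mode", "dependency:go-offline"])` before the test task, so that the snapshot artifacts are already in the local Maven repository when the tests run.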
Related to the Maven snapshot issue: I wonder if we could use Artifact Registry's ability to store Java packages (https://cloud.google.com/artifact-registry/docs/java/store-java) instead of relying on Maven Central.
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.6.0:java (default-cli) on project word-count-beam: An exception occured while executing the Java class. java.lang.RuntimeException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
[ERROR] POST https://bigquery.googleapis.com/bigquery/v2/projects/apache-beam-testing/datasets/beam_postrelease_mobile_gaming/tables/leaderboard_DirectRunner_team/insertAll?prettyPrint=false
[ERROR] {
[ERROR] "code" : 404,
[ERROR] "errors" : [ {
[ERROR] "domain" : "global",
[ERROR] "message" : "Not found: table Table is deleted: 844138762903:beam_postrelease_mobile_gaming.leaderboard_DirectRunner_team",
[ERROR] "reason" : "notFound"
[ERROR] } ],
[ERROR] "message" : "Not found: table Table is deleted: 844138762903:beam_postrelease_mobile_gaming.leaderboard_DirectRunner_team",
[ERROR] "status" : "NOT_FOUND"
[ERROR] }
[ERROR] -> [Help 1]
[ERROR]
Can we just add the retry to this task?
Looking at some of the recent failures, it seems like the Java command was just crashing?
https://github.com/apache/beam/actions/runs/9537373049/job/26285395593 https://ge.apache.org/s/pmba6vnub3yz4
"Process 'command '/opt/hostedtoolcache/Java_Temurin-Hotspot_jdk/8.0.412-8/x64/bin/java'' finished with non-zero exit value 1"
I also see the 404 error from BQ mentioned above in other failed runs, so it seems there are at least two failure modes.
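To tell the two modes apart when going through failed runs, something like the sketch below might help. It is only an illustration: the function name, the mode labels, and the regular expressions are assumptions derived from the log excerpts quoted in this thread, not an existing tool.

```python
import re

# Signatures taken from the log excerpts quoted in this thread.
BQ_404 = re.compile(r"404 Not Found[\s\S]*?/tables/[^/\s]+/insertAll")
JAVA_EXIT = re.compile(r"finished with non-zero exit value \d+")


def classify_failure(log_text):
    """Return the set of known failure modes found in a job log.

    A set is returned because one log may contain both signatures.
    """
    modes = set()
    if BQ_404.search(log_text):
        modes.add("bigquery_table_not_found")
    if JAVA_EXIT.search(log_text):
        modes.add("java_nonzero_exit")
    return modes or {"unknown"}
```

Logs that match neither pattern come back as `{"unknown"}`, which would point to a further failure mode not yet covered here.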
I wonder if the Java failure was due to an OOM. Can we increase the memory available to the VMs running these tests?
Trying this with #31749
The PostRelease Nightly Snapshot is failing over 50% of the time. Please visit https://github.com/apache/beam/actions/workflows/beam_PostRelease_NightlySnapshot.yml?query=is%3Afailure+branch%3Amaster to see the logs.