apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

The PostCommit XVR Direct job is flaky #30517

Status: Open. Opened by github-actions[bot] 9 months ago.

github-actions[bot] commented 9 months ago

The PostCommit XVR Direct job is failing over 50% of the time. Please visit https://github.com/apache/beam/actions/workflows/beam_PostCommit_XVR_Direct.yml?query=is%3Afailure+branch%3Amaster to see the logs.
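To make the bot's "failing over 50% of the time" criterion concrete, here is a minimal Python sketch of such a check. It assumes the input is a list of run conclusion strings like "success" and "failure" (values GitHub Actions reports for finished runs). The function names, the threshold parameter, and the choice to ignore every other conclusion are hypothetical and are not taken from Beam's actual flaky-workflow automation.

```python
from typing import Iterable


def failure_rate(conclusions: Iterable[str]) -> float:
    """Fraction of finished runs that failed.

    Hypothetical rule: only "success" and "failure" count; any other
    conclusion (for example "skipped") is ignored in this sketch.
    """
    finished = [c for c in conclusions if c in ("success", "failure")]
    if not finished:
        return 0.0
    return sum(c == "failure" for c in finished) / len(finished)


def is_flaky(conclusions: Iterable[str], threshold: float = 0.5) -> bool:
    """Flag a workflow that fails more than `threshold` of the time (assumed rule)."""
    return failure_rate(conclusions) > threshold
```

The strict `>` comparison means a workflow failing exactly half of its runs is not flagged, which matches the "over 50%" wording.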

github-actions[bot] commented 3 months ago

Reopening since the workflow is still flaky

tvalentyn commented 3 months ago

Seems to have been permared (permanently red) for a while.

2024-08-29T17:46:16.8649631Z System Go installation: /usr/local/go/bin/go is go version go1.21.0 linux/amd64; Preparing to use /home/runner/go/bin/go1.22.5
2024-08-29T17:46:17.0648275Z go1.22.5: already downloaded in /home/runner/sdk/go1.22.5
2024-08-29T17:46:17.0665947Z /home/runner/go/bin/go1.22.5 test -v ./test/integration/xlang ./test/integration/io/xlang/... -p 3 -v -timeout 3h --runner=portable --project=apache-beam-testing --region=us-central1 --environment_type=DOCKER --environment_config=apache/beam_go_sdk:dev --staging_location=gs://temp-storage-for-end-to-end-tests/staging-validatesrunner-test/test10288 --temp_location=gs://temp-storage-for-end-to-end-tests/temp-validatesrunner-test/test10288 --endpoint=localhost:34069 --kafka_jar=/runner/_work/beam/beam/sdks/java/testing/kafka-service/build/libs/beam-sdks-java-testing-kafka-service-testKafkaService-2.60.0-SNAPSHOT.jar --expansion_jar=io:/runner/_work/beam/beam/sdks/java/io/expansion-service/build/libs/beam-sdks-java-io-expansion-service-2.60.0-SNAPSHOT.jar --expansion_jar=schemaio:/runner/_work/beam/beam/sdks/java/extensions/schemaio-expansion-service/build/libs/beam-sdks-java-extensions-schemaio-expansion-service-2.60.0-SNAPSHOT.jar --expansion_jar=debeziumio:/runner/_work/beam/beam/sdks/java/io/debezium/expansion-service/build/libs/beam-sdks-java-io-debezium-expansion-service-2.60.0-SNAPSHOT.jar --expansion_jar=gcpio:/runner/_work/beam/beam/sdks/java/io/google-cloud-platform/expansion-service/build/libs/beam-sdks-java-io-google-cloud-platform-expansion-service-2.60.0-SNAPSHOT.jar --bq_dataset=apache-beam-testing.beam_bigquery_io_test_temp --bt_instance=projects/apache-beam-testing/instances/beam-test --expansion_addr=test:localhost:39707
2024-08-29T17:46:17.0689704Z go: downloading cloud.google.com/go/bigtable v1.29.0
2024-08-29T17:46:17.0691189Z go: downloading github.com/lib/pq v1.10.9
2024-08-29T17:46:17.0693048Z go: downloading github.com/go-sql-driver/mysql v1.8.1
2024-08-29T17:46:17.1648532Z go: downloading filippo.io/edwards25519 v1.1.0
2024-08-29T17:46:17.2648855Z go: downloading go.opentelemetry.io/otel/sdk/metric v1.24.0
2024-08-29T17:46:17.2650985Z go: downloading cloud.google.com/go/monitoring v1.20.3
2024-08-29T17:46:17.2652702Z go: downloading go.opentelemetry.io/otel/sdk v1.24.0
2024-08-29T19:32:10.7892423Z ##[error]The operation was canceled.
2024-08-29T19:32:10.8228117Z ##[group]Run actions/upload-artifact@v4
2024-08-29T19:32:10.8229144Z with:
2024-08-29T19:32:10.8230291Z   name: JUnit Test Results

tvalentyn commented 3 months ago

Looks like we have an xlang test that runs with a 3-hour time limit. It passes on Python 3.12 but fails on Python 3.8, timing out after 2.5 hours.

tvalentyn commented 2 months ago

The failing test is the GoUsingJava xlang suite, which does not use Python. The test passes on Python 3.12 only because the 3.12 suite excludes the GoUsingJava xlang variant, since we only need to run it for one Python version. It appears that the GoUsingJava xlang scenario not working on some runners is a known issue. cc: @Abacn @lostluck, who can correct me if they disagree with this assessment.

lostluck commented 2 months ago

It's a known issue, and it's also not a release blocker. The fact is that we have spent very little time making xlang for Go robust, and the people tasked with that have moved on. This is also not something that would be common for users, since they would need to manually spin up the Python portable runner.

Abacn commented 2 months ago

Last time I checked, only a few xlang tests were failing; now the job is timing out. New issues have likely accumulated, which is unfortunately common for tests that stay permared for a long time.

For the same reason, I agree with disabling the GoUsingJava part of the test so that the other tasks can still be monitored.
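The scheduling discussed in this thread, where the Python-independent GoUsingJava suite runs under only one Python version and could be disabled outright, can be sketched in Python as follows. This is an illustrative assumption, not Beam's Gradle or workflow configuration: `plan_suites`, its parameters, and every suite name other than GoUsingJava are hypothetical.

```python
def plan_suites(python_versions, suites, python_independent=frozenset(), disabled=frozenset()):
    """Return the (python_version, suite) pairs to run.

    Hypothetical sketch: suites in `python_independent` run only under the
    first listed Python version, and suites in `disabled` are skipped entirely.
    """
    plan = []
    for index, version in enumerate(python_versions):
        for suite in suites:
            if suite in disabled:
                continue  # e.g. disabling GoUsingJava so other suites stay visible
            if suite in python_independent and index > 0:
                continue  # already covered by the first Python version
            plan.append((version, suite))
    return plan


# Hypothetical suite names; only GoUsingJava comes from the discussion.
versions = ["3.8", "3.12"]
suites = ["PythonUsingJava", "JavaUsingPython", "GoUsingJava"]

print(plan_suites(versions, suites, python_independent={"GoUsingJava"}))
print(plan_suites(versions, suites, python_independent={"GoUsingJava"},
                  disabled={"GoUsingJava"}))
```

With these inputs, GoUsingJava appears only in the 3.8 plan, so only that run can hit its timeout; adding it to `disabled` removes it from both plans while the other suites keep running.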

github-actions[bot] commented 2 months ago

Reopening since the workflow is still flaky

Abacn commented 2 months ago

This was pullLicense flakiness, fixed by #32626. Moving it to the next milestone for monitoring.

damccorm commented 1 month ago

Seems like it is resolved - https://github.com/apache/beam/actions/workflows/beam_PostCommit_XVR_Direct.yml?query=branch%3Amaster

github-actions[bot] commented 1 month ago

Reopening since the workflow is still flaky