apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Failing Test]: beam_PreCommit_Java_GCP_IO_Direct high possibility timeout #25207

Open Abacn opened 1 year ago

Abacn commented 1 year ago

What happened?

When it does not time out, the test takes ~30-40 min; otherwise it times out after 2 hours: https://ci-beam.apache.org/job/beam_PreCommit_Java_GCP_IO_Direct_Cron/

From the log, in the timeout cases the build gets stuck early:

17:53:23 > Task :sdks:java:io:google-cloud-platform:expansion-service:distZip
17:53:48 > Task :sdks:java:io:google-cloud-platform:test
17:53:53 > Task :sdks:java:io:google-cloud-platform:expansion-service:shadowJar
17:54:03 > Task :sdks:java:io:google-cloud-platform:expansion-service:startShadowScripts
17:54:03 > Task :sdks:java:io:google-cloud-platform:expansion-service:shadowDistTar
17:54:11 > Task :sdks:java:io:google-cloud-platform:expansion-service:shadowDistZip
17:54:11 > Task :sdks:java:io:google-cloud-platform:expansion-service:assemble
17:54:11 > Task :sdks:java:io:google-cloud-platform:expansion-service:analyzeClassesDependencies
17:54:11 > Task :sdks:java:io:google-cloud-platform:expansion-service:analyzeDependencies
17:54:11 > Task :sdks:java:io:google-cloud-platform:expansion-service:test NO-SOURCE
17:54:11 > Task :sdks:java:io:google-cloud-platform:expansion-service:check
17:54:11 > Task :sdks:java:io:google-cloud-platform:expansion-service:build
19:52:33 Build timed out (after 120 minutes). Marking the build as aborted.

In particular, :sdks:java:io:google-cloud-platform:test is executing but :sdks:java:io:google-cloud-platform:integrationTest has not yet started. Likely some unit test has a race condition.
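To locate the point where a build stalls without reading the whole console log, the timestamps can be scanned for the longest silence. The following is a minimal sketch, not part of Beam or Jenkins: `LogGapFinder` and `longestGap` are hypothetical names, and it assumes every relevant line starts with an `HH:mm:ss` timestamp and that the log does not cross midnight.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: finds the longest silence between timestamped log lines. */
final class LogGapFinder {

  /** The line printed just before the longest gap, and the length of that gap. */
  static final class Gap {
    final String lineBefore;
    final Duration length;

    Gap(String lineBefore, Duration length) {
      this.lineBefore = lineBefore;
      this.length = length;
    }
  }

  /** Assumes lines start with HH:mm:ss; lines without a timestamp are skipped. */
  static Gap longestGap(List<String> lines) {
    Gap longest = new Gap("", Duration.ZERO);
    LocalTime previousTime = null;
    String previousLine = null;
    for (String line : lines) {
      if (line.length() < 8) {
        continue;
      }
      LocalTime time;
      try {
        time = LocalTime.parse(line.substring(0, 8));
      } catch (DateTimeParseException e) {
        continue; // not a timestamped line
      }
      if (previousTime != null) {
        Duration gap = Duration.between(previousTime, time);
        if (gap.compareTo(longest.length) > 0) {
          longest = new Gap(previousLine, gap);
        }
      }
      previousTime = time;
      previousLine = line;
    }
    return longest;
  }

  public static void main(String[] args) {
    // Last lines of the log excerpt quoted in this issue.
    List<String> lines = Arrays.asList(
        "17:54:11 > Task :sdks:java:io:google-cloud-platform:expansion-service:check",
        "17:54:11 > Task :sdks:java:io:google-cloud-platform:expansion-service:build",
        "19:52:33 Build timed out (after 120 minutes). Marking the build as aborted.");
    Gap gap = longestGap(lines);
    System.out.println(gap.length.getSeconds() + " s of silence after: " + gap.lineBefore);
  }
}
```

On the excerpt above, the longest gap falls between the last expansion-service task at 17:54:11 and the abort at 19:52:33, which matches the observation that the unit test task was still running when the build was killed.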

Issue Failure

Failure: Test is flaky

Issue Priority

Priority: 2 (backlog / disabled test but we think the product is healthy)

Issue Components

Abacn commented 1 year ago

Found one timeout flaky test: org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOWriteTest.testTriggeredFileLoadsWithTempTablesToExistingNullSchemaTable

Update: the flakiness may be due to #25211

Abacn commented 1 year ago

Still occurring: https://ci-beam.apache.org/view/PostCommit/job/beam_PreCommit_Java_GCP_IO_Direct_Cron/864/ org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOWriteTest.testTriggeredFileLoadsWithTempTablesToExistingNullSchemaTable[1] timed out after 10 min.

Though less frequent.

Abacn commented 1 year ago

I see this log message:

INFO: Ignoring failed deletion of file /tmp/junit1629866381724064800/files00010 which already does not exist.
java.nio.file.NoSuchFileException: /tmp/junit1629866381724064800/files00010

This may be a conflict with another test that deleted the same file, which could be causing the pipeline to get stuck.
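For reference, `NoSuchFileException` is what `java.nio.file.Files.delete` throws when the target has already been removed, for example by a second cleanup pass. The sketch below only illustrates that behavior and one common way to avoid cross-test interference (a private temporary directory per test). It is not Beam's cleanup code, and `TolerantDeleteSketch` and `deleteQuietly` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

/** Hypothetical illustration of deleting a file that another cleanup already removed. */
final class TolerantDeleteSketch {

  /** Returns true if this call removed the file, false if it was already gone. */
  static boolean deleteQuietly(Path file) throws IOException {
    return Files.deleteIfExists(file);
  }

  public static void main(String[] args) throws IOException {
    // A private directory per test keeps one test's cleanup away from another's files.
    Path dir = Files.createTempDirectory("sketch");
    Path file = Files.createFile(dir.resolve("files00010"));

    System.out.println(deleteQuietly(file)); // this call removes the file
    System.out.println(deleteQuietly(file)); // file is already gone, no exception

    try {
      Files.delete(file); // the strict variant throws once the file is gone
    } catch (NoSuchFileException e) {
      System.out.println("strict delete failed: " + e.getMessage());
    }
    Files.delete(dir);
  }
}
```

The log line quoted above shows the deletion failure itself being ignored, so if the hypothesis holds, the hang would come from whatever else depended on that file rather than from the exception.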

Abacn commented 1 year ago

Reopening this to track the (now flaky) timeout test testTriggeredFileLoadsWithTempTablesToExistingNullSchemaTable[1].