Open Abacn opened 10 months ago
.take-issue
@Abacn As far as I understand, we need to add a timeout to TestPipeline. I plan to pass it in when the TestPipeline is created, storing it as `self.duration = timeout` in `TestPipeline.__init__(duration=None)`, and then pass that stored value on in `state = result.wait_until_finish(duration=duration)`. However, I don't understand how `wait_until_finish()` gets called; an explanation would be helpful.
Secondly, is my understanding correct?
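The plan above could look roughly like the following. This is only a sketch under stated assumptions: `FakeResult` and `SketchTestPipeline` are hypothetical stand-ins, not Beam classes, and the error raised on timeout is an illustrative choice rather than what Beam does. The one detail taken from Beam is that `wait_until_finish(duration=...)` takes its duration in milliseconds, with `None` meaning wait indefinitely.

```python
import time


class FakeResult:
    """Hypothetical stand-in for a runner's pipeline result object."""

    def __init__(self, job_id, seconds_to_finish):
        self.job_id = job_id
        self.state = "RUNNING"
        self._done_at = time.monotonic() + seconds_to_finish

    def wait_until_finish(self, duration=None):
        # duration is in milliseconds; None means wait indefinitely.
        deadline = None if duration is None else time.monotonic() + duration / 1000.0
        while time.monotonic() < self._done_at:
            if deadline is not None and time.monotonic() >= deadline:
                return self.state  # timed out, job is still RUNNING
            time.sleep(0.01)
        self.state = "DONE"
        return self.state


class SketchTestPipeline:
    """Hypothetical test pipeline that stores a timeout and forwards it."""

    def __init__(self, result, duration=None):
        self._result = result
        self.duration = duration  # stored at construction time, in milliseconds

    def run(self):
        # The stored timeout is passed to wait_until_finish here.
        state = self._result.wait_until_finish(duration=self.duration)
        if state != "DONE":
            # Include the job id so the failing job can be located afterwards.
            raise TimeoutError(
                f"Job {self._result.job_id} did not finish within "
                f"{self.duration} ms (state: {state})")
        return self._result
```

With this shape, a test that hangs fails with a message naming the job instead of only a generic test-runner timeout.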
What needs to happen?
Currently, TestPipeline waits indefinitely for the pipeline to finish: https://github.com/apache/beam/blob/aef21959bf6f41b4fb646ef06da97c8b8adbcb8d/sdks/python/apache_beam/testing/test_pipeline.py#L116
When a test times out, it does not print useful information, only a pytest timeout message and the stack trace at the point where it was interrupted (e.g. https://github.com/apache/beam/runs/19275621816)
However, DataflowRunner.wait_until_finish() indeed supports duration: https://github.com/apache/beam/blob/aef21959bf6f41b4fb646ef06da97c8b8adbcb8d/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L746
and when it times out, it prints the job ID so that one can find the Dataflow job and investigate: https://github.com/apache/beam/blob/aef21959bf6f41b4fb646ef06da97c8b8adbcb8d/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L769-L771
We should be able to use this functionality for TestPipeline, for example,
Issue Priority
Priority: 3 (nice-to-have improvement)
Issue Components