Open cozos opened 1 year ago
cc: @tvalentyn
A similar issue was reported before (https://github.com/apache/beam/issues/20775) and was resolved in https://github.com/apache/beam/pull/13526/
Note that https://github.com/apache/beam/pull/13526 is included in v2.35.0 - see: https://github.com/apache/beam/blame/v2.35.0/sdks/python/apache_beam/metrics/execution.py#L241
I agree this should remain P1.
@cozos Thanks for reporting! Do you by chance have a repro that you could share? It would allow people looking into this issue to debug independently. Thank you.
I will try to create a minimal reproducible example.
Thank you. Much appreciated.
Any luck creating a reproduction? I am not sure whether this kind of race condition could cause data loss.
What happened?
Hello, I am on Apache Beam v2.35.0 running on GCP Dataflow, and I've encountered what I believe are race conditions in the progress reporting machinery (i.e. `process_bundle_progress` or `ProcessBundleProgressRequest`):
I am running long-running C++ code through pybind11, which I think might be a contributing factor. However, my C++ code does not access any Python objects without holding the GIL, and it definitely doesn't change anything related to progress reporting.
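For anyone trying to build a reproduction, here is a minimal, hypothetical sketch of the general class of race described above: one thread updates per-bundle progress state while another thread periodically reads it for a progress report. `ProgressTracker`, `process_bundle`, `report_progress` and `run` are illustrative names, not Beam's progress-reporting API, and this does not reproduce the reported issue. It only shows why state that is read from a second thread needs synchronization. `time.sleep` releases the GIL, so it stands in (as an assumption) for a long-running native call that releases the GIL, e.g. via pybind11's `gil_scoped_release`.

```python
import threading
import time


class ProgressTracker:
    """Hypothetical stand-in for per-bundle progress state (not Beam's API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._elements_processed = 0
        self._bytes_processed = 0

    def record(self, num_bytes):
        # Update both fields under the lock so a reader never observes
        # one field updated without the other.
        with self._lock:
            self._elements_processed += 1
            self._bytes_processed += num_bytes

    def snapshot(self):
        # Read a consistent (elements, bytes) pair for a progress report.
        with self._lock:
            return self._elements_processed, self._bytes_processed


def process_bundle(tracker, num_elements, element_size):
    for _ in range(num_elements):
        # time.sleep releases the GIL, standing in for a native call that
        # drops the GIL while it works (an assumption for illustration).
        time.sleep(0)
        tracker.record(element_size)


def report_progress(tracker, stop, reports):
    # Periodically snapshot progress from a separate thread until stopped.
    while not stop.is_set():
        reports.append(tracker.snapshot())
        time.sleep(0.001)


def run(num_elements=1000, element_size=8):
    tracker = ProgressTracker()
    stop = threading.Event()
    reports = []
    reporter = threading.Thread(
        target=report_progress, args=(tracker, stop, reports))
    reporter.start()
    process_bundle(tracker, num_elements, element_size)
    stop.set()
    reporter.join()
    reports.append(tracker.snapshot())  # final report once the bundle is done
    return reports
```

With the lock in place every snapshot satisfies `bytes == elements * element_size`; removing the `with self._lock:` lines is one way to experiment with how unsynchronized reads between the two threads can interleave.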
I am marking this as P1 because I assume race conditions can cause data loss, etc.; sorry if this is inappropriate, and feel free to change it.
Issue Priority
Priority: 1 (data loss / total loss of function)
Issue Components