apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Bug]: Python ReadFromMongoDB/RangeTracker raises OverflowError #26492

Open · akv-mshin opened this issue 1 year ago

akv-mshin commented 1 year ago

What happened?

Our Dataflow job started to fail a few days ago (Apache Beam 2.44) without any changes on our side. After investigating the logs and execution details, I came to the conclusion that the issue is in the part of the code related to ReadFromMongoDB:

Error message from worker: Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/apache_beam/runners/worker/sdk_worker.py", line 287, in _execute
    response = task()
  File "/usr/local/lib/python3.8/site-packages/apache_beam/runners/worker/sdk_worker.py", line 360, in <lambda>
    lambda: self.create_worker().do_instruction(request), request)
  File "/usr/local/lib/python3.8/site-packages/apache_beam/runners/worker/sdk_worker.py", line 596, in do_instruction
    return getattr(self, request_type)(
  File "/usr/local/lib/python3.8/site-packages/apache_beam/runners/worker/sdk_worker.py", line 634, in process_bundle
    bundle_processor.process_bundle(instruction_id))
  File "/usr/local/lib/python3.8/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1003, in process_bundle
    input_op_by_transform_id[element.transform_id].process_encoded(
  File "/usr/local/lib/python3.8/site-packages/apache_beam/runners/worker/bundle_processor.py", line 227, in process_encoded
    self.output(decoded_value)
  File "apache_beam/runners/worker/operations.py", line 526, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 528, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 237, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 1021, in apache_beam.runners.worker.operations.SdfProcessSizedElements.process
  File "apache_beam/runners/worker/operations.py", line 1030, in apache_beam.runners.worker.operations.SdfProcessSizedElements.process
  File "apache_beam/runners/common.py", line 1432, in apache_beam.runners.common.DoFnRunner.process_with_sized_restriction
  File "apache_beam/runners/common.py", line 817, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 988, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "/usr/local/lib/python3.8/site-packages/apache_beam/runners/sdf_utils.py", line 111, in check_done
    return self._restriction_tracker.check_done()
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/iobase.py", line 1590, in check_done
    return self.restriction.range_tracker().fraction_consumed() >= 1.0
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/range_trackers.py", line 285, in fraction_consumed
    return self.position_to_fraction(
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/range_trackers.py", line 432, in position_to_fraction
    return float(ikey - istart) / (iend - istart)
OverflowError: int too large to convert to float

Code:

pipeline | ReadFromMongoDB(uri=..., db=..., coll=..., bucket_auto=True)
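
For reference, a minimal self-contained version of this read looks like the following sketch (the URI, database, and collection names are hypothetical placeholders, not our real values):

import apache_beam as beam
from apache_beam.io.mongodbio import ReadFromMongoDB

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | ReadFromMongoDB(
            uri='mongodb://localhost:27017',  # hypothetical placeholder
            db='mydb',                        # hypothetical placeholder
            coll='mycoll',                    # hypothetical placeholder
            bucket_auto=True)
        | beam.Map(print))  # consume the documents somehow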

It looks like the RangeTracker used underneath makes the code fail (ikey is probably chosen as too large a number).
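
The failing expression converts a Python int to a float first, and float() raises OverflowError for any integer above sys.float_info.max (about 1.8e308), regardless of collection size. A minimal standalone reproduction, assuming (as the traceback's position_to_fraction suggests) that ikey is an int derived from the bytes of a position key:

key_bytes = b'\xff' * 200                # hypothetical 200-byte position key
ikey = int.from_bytes(key_bytes, 'big')  # about 10**481, far above float max (~1.8e308)
float(ikey)                              # OverflowError: int too large to convert to float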

The current collection size is 221,418 elements, which should not be a problem for float capacity.
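
If the root cause is only the explicit float() conversion of an oversized integer, one possible fix (a sketch against the line from the traceback, not a committed patch) is to rely on int / int true division, which CPython computes exactly and rounds once, so it does not overflow even when the operands themselves exceed the float range:

def position_to_fraction(ikey, istart, iend):
    # Sketch of range_trackers.py line 432: dropping the float() call on the
    # numerator keeps the division in exact integer arithmetic; only the
    # final quotient (which is at most ~1.0 here) is rounded to a float.
    return (ikey - istart) / (iend - istart)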

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

akv-mshin commented 1 year ago

Also, I just wanted to mention that the same code succeeds in a different environment (which has a different number of items in the Mongo collection).

Also, the same code, pipeline, and environment worked just a few days ago.

So it really looks like the issue is related to the number of items in the Mongo collection (and the location of the error suggests that too).

akv-mshin commented 1 year ago

Just verified that I see the same issue on 2.46.