Open elderpinzon opened 8 months ago
As the error message suggests, the issue is caused by a division by zero. I found a quick way to bypass it by modifying the following line:

https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/opcounters.py#L225

as follows:
mean_element_size = self.producer_batch_converter.estimate_byte_size(
    windowed_batch.values) / batch_length if batch_length != 0 else 0
I confirmed that my test code (the reproduction under "What happened?" below) runs after this change, and also confirmed that all the unit tests in that folder pass after the change (I tried to run all the tests in the runners folder, but a bunch failed due to gcloud authentication issues).
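For reference, the change boils down to the following guard; this is a standalone sketch with hypothetical names, not the actual opcounters.py code:

```python
def mean_element_size(total_byte_size, batch_length):
    # Hypothetical helper mirroring the proposed guard: when a batch is
    # empty (e.g. a DataFrame produced from a header-only CSV), report a
    # mean element size of 0 instead of dividing by zero.
    return total_byte_size / batch_length if batch_length != 0 else 0


print(mean_element_size(120, 4))  # 30.0
print(mean_element_size(0, 0))    # 0 -- previously a division by zero
```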
What happened?
After reading a CSV file containing only the header row using `apache_beam.dataframe.io.read_csv`, converting the result with `to_pcollection` fails with the following error: `OverflowError: cannot convert float infinity to integer [while running 'Unbatch 'placeholder_DataFrame_6102733264'']`.
Please use the Python code below to reproduce the issue. To reiterate, in this example the file `only_header.csv` contains only the header row.
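A minimal sketch of the pipeline described here, assuming a local header-only file named `only_header.csv`; the column names and the final `Map(print)` step are illustrative:

```python
import apache_beam as beam
from apache_beam.dataframe.convert import to_pcollection
from apache_beam.dataframe.io import read_csv

# Create a CSV file that contains only a header row, no data rows.
with open('only_header.csv', 'w') as f:
    f.write('col_a,col_b,col_c\n')

with beam.Pipeline() as p:
    # Read the header-only CSV into a deferred DataFrame.
    df = p | read_csv('only_header.csv')
    # Converting back to a PCollection runs the Unbatch step that raises
    # the OverflowError reported above.
    pc = to_pcollection(df)
    pc | beam.Map(print)
```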
This issue appeared while attempting to migrate from version 2.41.0 to 2.51.0, but I confirmed it also appears with 2.54.0.

Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components