[Bug]: apache_beam.dataframe.convert.to_pcollection() fails on deferred dataframe of csv with only header row

What happened?

After reading a csv file containing only the header row using apache_beam.dataframe.io.read_csv, the to_pcollection method fails with the following error: OverflowError: cannot convert float infinity to integer [while running 'Unbatch 'placeholder_DataFrame_6102733264''].

Please use the python code below to reproduce this issue

import logging
import argparse
import sys
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions
from apache_beam.dataframe.io import read_csv
from apache_beam.dataframe import convert

logging.getLogger().setLevel(logging.INFO)

parser = argparse.ArgumentParser()
known_args, pipeline_args = parser.parse_known_args(sys.argv)
pipeline_options = PipelineOptions(pipeline_args)
pipeline_options.view_as(SetupOptions).save_main_session = True

p = beam.Pipeline(options=pipeline_options)

beam_df = p | "Read csv" >> read_csv("only_header.csv")

pcol1 = (
    # Convert the Beam DataFrame to a PCollection.
    convert.to_pcollection(beam_df)
    | beam.Map(print)
)

p.run().wait_until_finish()

To reiterate, in this example the file only_header.csv only has the header rows.

This issue appeared while attempting to migrate from version 2.41.0 to 2.51.0, but confirmed it also appears with 2.54.0.

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

[X] Component: Python SDK
[ ] Component: Java SDK
[ ] Component: Go SDK
[ ] Component: Typescript SDK
[ ] Component: IO connector
[ ] Component: Beam YAML
[ ] Component: Beam examples
[ ] Component: Beam playground
[ ] Component: Beam katas
[ ] Component: Website
[ ] Component: Spark Runner
[ ] Component: Flink Runner
[ ] Component: Samza Runner
[ ] Component: Twister2 Runner
[ ] Component: Hazelcast Jet Runner
[ ] Component: Google Cloud Dataflow Runner

apache / beam