apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0
7.88k stars 4.27k forks source link

[Bug]: beam.io.WriteToBigQuery failed when given schema with space #25704

Open jwzh222 opened 1 year ago

jwzh222 commented 1 year ago

What happened?

there is any issue in python SDK beam.io.WriteToBigQuery() when you add a space in schema, like schema="name: STRING", it will fail.

error message: "message": "Invalid value for type: STRING is not a valid value"

example code:

`import apache_beam as beam from apache_beam.options.pipeline_options import PipelineOptions

def run(): pipeline_args = [] pipeline_options = PipelineOptions(pipeline_args)

table_ref = 'project_id:dataset_id.table_id'
schema_with_space = "name: STRING"
schema_without_space = "name:STRING"

with beam.Pipeline(options=pipeline_options) as p :
    records = p | 'load records' >> beam.Create([{"name":"bob"},{"name":"alice"}])
    records | 'write to bigquery' >> beam.io.WriteToBigQuery(
        table = table_ref,
        schema = schema_with_space,
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED
    )

if name == 'main': pipeline_args = [ '--runner', 'DirectRunner', '--project', 'YOUR_PROJECT_ID', ]

run()`

Issue Priority

Priority: 3 (minor)

Issue Components

jwzh222 commented 1 year ago

environment: windows 10 python 3.9.0 apache-beam 2.45.0

it will fail with both DirectRunner and DataflowRunner

.add-labels DirectRunner

github-actions[bot] commented 1 year ago

Label DirectRunner cannot be managed because it does not exist in the repo. Please check your spelling.

Abacn commented 1 year ago

Thanks for reporting this issue. It looks like indeed a bug that space is not considered here:

https://github.com/apache/beam/blob/6452dc7982240819a763aaf9ff3efc4a01fc1d2b/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1536

Use (s.strip() for s in field_and_type.split(':')) as L1534 should fix the problem

Would you interested in getting a fix?