Open ahmedabu98 opened 1 year ago
The JSON type is also missing in the Python SDK 😕
Hi @ahmedabu98, GEOGRAPHY is currently not supported as a data type, and it throws an error here: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L110-L123
Are there plans to add support for it?
I'd also like to bump this, as it's needed for using WriteToBigQuery in Python:
Google recommends using the STORAGE_WRITE_API method in their Dataflow Best Practices, which requires passing this transform the schema argument for a table. But since many of our BigQuery tables have a DATE or DATETIME column, which isn't supported yet for these schemas in Python, we aren't able to use this.
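To make the limitation concrete, here is a minimal sketch of a pre-flight check that scans a BigQuery-style schema dict for column types the Python SDK may not be able to map. The set of supported types below is an assumption based on the mapping linked earlier in this thread (bigquery_tools.py), and the helper name `find_unsupported_fields` is hypothetical, not part of Beam.

```python
# Assumption: this set approximates the BigQuery types the Python SDK could
# map at the time of this thread. Check bigquery_tools.py for the real list.
ASSUMED_SUPPORTED_TYPES = {
    "STRING", "BOOL", "BOOLEAN", "BYTES",
    "INT64", "INTEGER", "FLOAT64", "FLOAT",
    "NUMERIC", "TIMESTAMP",
}


def find_unsupported_fields(schema, supported=ASSUMED_SUPPORTED_TYPES, prefix=""):
    """Return (field_path, type) pairs whose type is not in `supported`.

    `schema` is a dict in the BigQuery JSON schema shape:
    {"fields": [{"name": ..., "type": ..., "mode": ...}, ...]}.
    This is a hypothetical helper, not a Beam API.
    """
    unsupported = []
    for field in schema.get("fields", []):
        path = prefix + field["name"]
        field_type = field["type"].upper()
        if field_type in ("RECORD", "STRUCT"):
            # Nested records carry their own "fields" list; recurse into it.
            unsupported.extend(
                find_unsupported_fields(field, supported, path + "."))
        elif field_type not in supported:
            unsupported.append((path, field_type))
    return unsupported


# Example table schema with the column types discussed in this comment.
table_schema = {
    "fields": [
        {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
        {"name": "created_on", "type": "DATE", "mode": "NULLABLE"},
        {"name": "updated_at", "type": "DATETIME", "mode": "NULLABLE"},
        {"name": "event_ts", "type": "TIMESTAMP", "mode": "NULLABLE"},
    ]
}

problems = find_unsupported_fields(table_schema)
for path, field_type in problems:
    print(f"{path}: {field_type} is not in the assumed supported set")
```

Running a check like this before building the pipeline only reports the problem early. It does not work around the missing type support.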
As of Beam 2.60.0, we haven't found a workaround. For example, specifying our DATE columns as TIMESTAMP in the Python schema seems to fail, either when Beam actually tries to write to BigQuery or at some point when the Java code is executing and doing its own conversion. If anyone knows a workaround for this, I'd appreciate it.
As a side note: why does STORAGE_WRITE_API require specifying a schema in advance, while STREAMING_INSERT does not?
What needs to happen?
Beam portable schemas include primitive and more complex types (represented as logical types). Some of these types are supported in the Python SDK: https://github.com/apache/beam/blob/99202b237e364bf77f40b6da0ec22cb7b17c37d0/sdks/python/apache_beam/typehints/schemas.py#L23-L41
When necessary, Python classes are created to represent a portable type. For example, see Timestamp below: https://github.com/apache/beam/blob/99202b237e364bf77f40b6da0ec22cb7b17c37d0/sdks/python/apache_beam/utils/timestamp.py#L45
There are some missing portable types in the Python SDK (e.g. Date, DateTime, Time) that we should add support for to make the cross-language experience smoother.
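As a rough illustration of what a Python class for one of these types might look like, here is a standalone sketch for Date. It assumes the portable Date type is represented as an integer count of days since the Unix epoch and uses the URN `beam:logical_type:date:v1`. Both are assumptions to verify against the Java SDK. The class name `DateSketch` is hypothetical, and the method names only mirror the general shape of a logical type. This is not Beam's implementation.

```python
import datetime

# Assumption: the portable representation counts days since 1970-01-01.
EPOCH = datetime.date(1970, 1, 1)


class DateSketch:
    """Hypothetical stand-in for a Python-side Date logical type."""

    @classmethod
    def urn(cls):
        # Assumed identifier; confirm against the Java SDK before relying on it.
        return "beam:logical_type:date:v1"

    @classmethod
    def language_type(cls):
        # The type Python users would work with in their pipelines.
        return datetime.date

    @classmethod
    def representation_type(cls):
        # The primitive type assumed to travel across the language boundary.
        return int

    def to_representation_type(self, value):
        """Convert a datetime.date to days since the epoch."""
        return (value - EPOCH).days

    def to_language_type(self, value):
        """Convert days since the epoch back to a datetime.date."""
        return EPOCH + datetime.timedelta(days=value)


date_type = DateSketch()
encoded = date_type.to_representation_type(datetime.date(2024, 1, 1))
decoded = date_type.to_language_type(encoded)
print(encoded, decoded)
```

DateTime and Time would need their own representations (for example, a struct or a count of micros), which this sketch does not attempt to cover.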
Issue Priority
Priority: 2 (default / most normal work should be filed as P2)
Issue Components