When ingesting MySQL tables that contain SET columns, dlt fails to convert the rows to Arrow because PyArrow does not support the resulting Python set values:
  File "/path/dlt/common/libs/pyarrow.py", line 685, in row_tuples_to_arrow
    return pa.Table.from_pydict(columnar_known_types, schema=arrow_schema)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/table.pxi", line 1920, in pyarrow.lib._Tabular.from_pydict
  File "pyarrow/table.pxi", line 6153, in pyarrow.lib._from_pydict
  File "pyarrow/array.pxi", line 398, in pyarrow.lib.asarray
  File "pyarrow/array.pxi", line 358, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'set' object
Lists and dicts have the same problem, but the corresponding code handles them by casting them to strings. It looks like set values were missed.
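To illustrate the idea, here is a minimal sketch of serializing set values to strings before the columns are handed to PyArrow. The helpers `set_to_str` and `stringify_sets` are hypothetical names invented for this example and are not part of dlt, and the comma-separated output is an assumed serialization, not necessarily the one the actual fix uses.

```python
from typing import Any, Dict, List


def set_to_str(value: Any) -> Any:
    """Turn a set into comma-separated text; return any other value unchanged."""
    if isinstance(value, (set, frozenset)):
        # Sorting is only for deterministic output. MySQL itself orders SET
        # members by their position in the column definition.
        return ",".join(sorted(str(item) for item in value))
    return value


def stringify_sets(columns: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """Apply set_to_str to every value of a columnar dict (hypothetical helper)."""
    return {name: [set_to_str(v) for v in values] for name, values in columns.items()}


# Columnar data shaped like the rows a MySQL driver may return for SET columns.
rows = {
    "order_id": [1, 2],
    "col1": [{"0"}, {"0", "1"}],
    "col2": [{"2"}, set()],
}
cleaned = stringify_sets(rows)

# Optional: build an Arrow table if PyArrow is installed.
try:
    import pyarrow as pa
except ImportError:
    pa = None

if pa is not None:
    table = pa.Table.from_pydict(cleaned)
```

With the sets replaced by plain strings, `pa.Table.from_pydict` no longer receives `set` objects, which is what triggers the `ArrowTypeError` in the traceback above.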
Expected behavior
MySQL set fields are correctly ingested.
Steps to reproduce
Try ingesting a table that contains a SET column using the pyarrow backend, for example:
create table test.some_table
(
    order_id int auto_increment primary key,
    col1 set ('0', '1') default '0' not null,
    col2 set ('1', '2') default '2' not null
);
Operating system
Linux, macOS, Windows
Runtime environment
Local
Python version
3.11
dlt data source
sql_table
dlt destination
Google BigQuery, DuckDB, Filesystem & buckets, Postgres, Amazon Redshift, Snowflake
Other deployment details
No response
Additional information
The only workaround at the moment is to not use the pyarrow backend. I am submitting a fix for this.
dlt version
1.4.0