dlt-hub / verified-sources


`sql_database` source with `pyarrow` backend does not support `json` typed columns from MySQL #543

Closed: acaruso7 closed this issue 3 months ago

acaruso7 commented 3 months ago

dlt version

0.5.1

Source name

sql_database

Describe the problem

When running a pipeline with a MySQL `sql_table` source and the `pyarrow` backend, extraction fails on source tables that contain JSON-typed columns. Stack trace:

Traceback (most recent call last):
  File "/path/to/project/sql_database/helpers.py", line 142, in _load_rows
    yield row_tuples_to_arrow(
  File "/path/to/project/sql_database/arrow_helpers.py", line 140, in row_tuples_to_arrow
    return pa.Table.from_pydict(columnar_known_types, schema=arrow_schema)
  File "pyarrow/table.pxi", line 1920, in pyarrow.lib._Tabular.from_pydict
  File "pyarrow/table.pxi", line 6153, in pyarrow.lib._from_pydict
  File "pyarrow/array.pxi", line 398, in pyarrow.lib.asarray
  File "pyarrow/array.pxi", line 358, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'dict' object
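
The error message itself points at the cause: SQLAlchemy returns MySQL JSON values as already-parsed Python dicts, while the arrow schema built for the column evidently expects a string/binary type (hence "Expected bytes"). A minimal sketch of the failing conversion, using a made-up column name `payload`:

    import pyarrow as pa

    # MySQL JSON values arrive from SQLAlchemy as Python dicts, but the
    # arrow schema maps the column to a string type, so conversion fails
    schema = pa.schema([("payload", pa.string())])
    pa.Table.from_pydict({"payload": [{"key": "value"}]}, schema=schema)
    # pyarrow.lib.ArrowTypeError: Expected bytes, got a 'dict' object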

Expected behavior

No response

Steps to reproduce

    import dlt
    from sql_database import sql_table

    # mysql_engine is a SQLAlchemy engine for the source MySQL database;
    # table_name_with_json_columns names a table with at least one JSON column
    table_source = sql_table(
        credentials=mysql_engine,
        table=table_name_with_json_columns,
        backend="pyarrow",
    )
    pipeline = dlt.pipeline(destination="snowflake")
    pipeline_run_result = pipeline.run(
        table_source, write_disposition="replace"
    )
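
Until a fix is released, a possible workaround (my assumption, not verified against this exact version) is to fall back to the default `sqlalchemy` backend, which yields rows as Python objects and lets dlt's normal type inference handle the dict values:

    # workaround sketch: avoid the arrow conversion entirely by using the
    # default sqlalchemy backend for tables with JSON columns
    table_source = sql_table(
        credentials=mysql_engine,
        table=table_name_with_json_columns,
        backend="sqlalchemy",
    )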

How are you using the source?

I'm considering using this source in my work, but this bug is currently preventing that.

Operating system

Linux

Runtime environment

Docker, Docker Compose

Python version

3.10.12

dlt destination

Snowflake

Additional information

No response

acaruso7 commented 3 months ago

This has been fixed by https://github.com/dlt-hub/verified-sources/pull/541
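
For anyone pinned to an older version: I have not checked the internals of that PR, so the following is only a sketch of the general approach such a fix takes, namely serializing dict values to JSON strings before the column is handed to pyarrow:

    import json
    import pyarrow as pa

    # sketch only; the actual change in the PR may differ. Pre-serializing
    # dicts makes the column compatible with a pa.string() schema.
    values = [{"key": "value"}, None]
    serialized = [None if v is None else json.dumps(v) for v in values]
    pa.Table.from_pydict(
        {"payload": serialized}, schema=pa.schema([("payload", pa.string())])
    )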