exasol / pyexasol

Exasol Python driver with low overhead, fast HTTP transport and compression
MIT License

export_to_pandas converts zero padded strings to decimals #78

Closed by ghost 3 years ago

ghost commented 3 years ago

I'm using export_to_pandas on a view that has strings like '0012', i.e. zero-padded integers. These are keys in data provided by someone else that I need to join against, so they need to remain zero-padded strings. However, export_to_pandas converts them to integers, i.e. the string '0012' becomes the integer 12. Is there any way to stop this from happening?

littleK0i commented 3 years ago

Hello.

This is standard pandas behaviour when reading data from CSV: pandas infers the data type of each column automatically, so a column containing only digit strings is read as integers.
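To illustrate the inference, here is a minimal sketch that reads a small in-memory CSV with pandas.read_csv() and default settings. The column names KEY_COL and AMOUNT and the sample rows are made up for the example; no Exasol connection is involved.

```python
import io

import pandas as pd

# Hypothetical sample data: KEY_COL holds zero-padded keys stored as text.
csv_text = "KEY_COL,AMOUNT\n0012,5\n0345,7\n"

# With default settings pandas infers an integer dtype for KEY_COL,
# so the leading zeroes are dropped.
inferred = pd.read_csv(io.StringIO(csv_text))

print(inferred["KEY_COL"].tolist())  # [12, 345]
```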

You can change this with the callback_params argument of export_to_pandas(): https://github.com/badoo/pyexasol/blob/master/pyexasol/connection.py#L272

It accepts any argument available for the pandas.read_csv() method: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

See the dtype argument. If you pass str for the affected column, it should prevent the conversion to integer and keep the leading zeroes.
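A minimal sketch of how that could look, assuming callback_params is forwarded to pandas.read_csv() as keyword arguments. The helper name export_keeping_leading_zeroes, the view name, and the column name KEY_COL are hypothetical and only serve the example.

```python
def export_keeping_leading_zeroes(connection, query, text_columns):
    """Export a query to a DataFrame, reading the named columns as strings.

    `connection` is expected to be an open pyexasol connection (assumption);
    `text_columns` lists the columns whose leading zeroes must survive.
    """
    # Build a per-column dtype mapping for pandas.read_csv().
    dtype = {column: str for column in text_columns}
    return connection.export_to_pandas(query, callback_params={"dtype": dtype})


# Usage, with placeholder connection details:
#
#   import pyexasol
#   C = pyexasol.connect(dsn="...", user="...", password="...")
#   df = export_keeping_leading_zeroes(
#       C, "SELECT * FROM my_schema.my_view", ["KEY_COL"]
#   )
```

The column names in the mapping have to match the names in the exported result exactly, including case. Passing a mapping rather than dtype=str for the whole frame leaves type inference on for the other columns.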

An example of usage of custom parameters is available here: https://github.com/badoo/pyexasol/blob/master/pyexasol/ext.py#L264

ghost commented 3 years ago

Hi wildraid,

thanks for the explanation and info about how to solve this!