databricks / koalas

Koalas: pandas API on Apache Spark
Apache License 2.0

DataFrame.append causes unexpected dtype change in output DataFrame #2193

Open thehomebrewnerd opened 3 years ago

thehomebrewnerd commented 3 years ago

When appending one DataFrame to another with df.append, the column dtypes sometimes change unexpectedly. In the example below, the val column starts as boolean (the nullable pandas dtype) in both inputs but comes out as bool. The same thing happens when the original DataFrames have Int64 columns: in the new DataFrame those columns are changed to int64.

I would expect the output dtype of a column to stay the same when both input DataFrames have the same dtype for that column.

import pandas as pd
import databricks.koalas as ks

df1 = pd.DataFrame({'id': [0], 'val': pd.Series([True], dtype='boolean')})
df2 = pd.DataFrame({'id': [1], 'val': pd.Series([False], dtype='boolean')})
ks1 = ks.from_pandas(df1)
ks2 = ks.from_pandas(df2)
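# Both inputs use the nullable boolean dtype for 'val'
print(ks1.dtypes)
# id       int64
# val    boolean
# dtype: object
print(ks2.dtypes)
# id       int64
# val    boolean
# dtype: object

# After append, 'val' has become bool
new_ks = ks1.append(ks2)
print(new_ks.dtypes)
# id     int64
# val     bool
# dtype: object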
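
To make the dtype change easier to spot on wider frames, here is a minimal sketch of a comparison helper. The name changed_dtypes is hypothetical and not part of pandas or Koalas; it only assumes that .dtypes is a pandas Series indexed by column name, as in the output above. The demo uses plain pandas so that it runs without a Spark session, and it simulates the reported result with astype rather than calling Koalas.

```python
import pandas as pd


def changed_dtypes(before, after):
    """Return {column: (old_dtype, new_dtype)} for columns whose dtype name differs.

    `before` and `after` are the `.dtypes` Series of two frames.
    Hypothetical helper for illustration only.
    """
    changes = {}
    for col in before.index:
        if col not in after.index:
            continue
        old, new = str(before[col]), str(after[col])
        if old != new:
            changes[col] = (old, new)
    return changes


df1 = pd.DataFrame({"id": [0], "val": pd.Series([True], dtype="boolean")})
df2 = pd.DataFrame({"id": [1], "val": pd.Series([False], dtype="boolean")})

# Plain pandas concatenation, used here as the reference behavior.
reference = pd.concat([df1, df2], ignore_index=True)

# Simulate the dtype reported in this issue by casting 'val' to bool.
simulated = reference.astype({"val": "bool"})

print(changed_dtypes(df1.dtypes, reference.dtypes))
print(changed_dtypes(df1.dtypes, simulated.dtypes))
```

With Koalas, the same helper could presumably be called as changed_dtypes(ks1.dtypes, new_ks.dtypes), assuming the dtypes shown in the session above.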