I am using pandas 2.0.0 (this also happens with higher versions of pandas) and running SQL queries with %%sparksql that return datetime64 data. I get the error below on a basic query such as
%%sparksql select * from db.table
The error:
Casting to unit-less dtype 'datetime64' is not supported. Pass e.g. 'datetime64[ns]' instead.
The workaround is to downgrade pandas. A possible fix would be to use a converter like this when parsing a datetime series:
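`from pyspark.sql.types import TimestampType
import pandas as pd

def correct_dtype(pser: pd.Series) -> pd.Series:
    # Give datetime columns an explicit unit; pandas 2.x rejects unit-less 'datetime64'.
    if pd.api.types.is_datetime64_any_dtype(pser):
        return pser.astype('datetime64[ns]', copy=False)
    # pandas_type is the target dtype taken from the enclosing scope.
    return pser.astype(pandas_type, copy=False)`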
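To show the idea outside of Spark, here is a minimal standalone sketch using plain pandas. The helper name `correct_dtype_explicit` and passing `pandas_type` as a parameter are my assumptions for illustration only, not the library's actual code. It also leaves out `copy=False` and does not handle timezone-aware columns.

```python
import numpy as np
import pandas as pd


def correct_dtype_explicit(pser: pd.Series, pandas_type) -> pd.Series:
    """Hypothetical helper: cast a series, giving datetimes an explicit unit."""
    if pd.api.types.is_datetime64_any_dtype(pser):
        # Unit-less 'datetime64' is rejected by pandas 2.x, so request nanoseconds.
        return pser.astype("datetime64[ns]")
    # Any other column is cast to the requested target dtype unchanged.
    return pser.astype(pandas_type)


# A naive (timezone-free) datetime column, similar to a Spark timestamp column.
timestamps = pd.Series(pd.to_datetime(["2023-01-01", "2023-06-15"]))
print(correct_dtype_explicit(timestamps, np.datetime64).dtype)

# A non-datetime column still goes through the regular cast.
counts = pd.Series([1, 2, 3])
print(correct_dtype_explicit(counts, np.float64).dtype)
```

The point of the branch is that the unit-less target type is never passed to `astype` for datetime columns, so the same code should work on both pandas 1.x and 2.x.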
If this is not the behavior you want, maybe consider adding an upper bound on the pandas version in pyproject.toml:
"pandas<2.0.0"
Complete stack trace: