Our CI jobs for pandas 2.0 are currently failing (see, e.g., here).
I see (at least) two issues with supporting pandas 2.0:
- https://github.com/pandas-dev/pandas/pull/52212 changed how pandas infers dtypes from scalars, which causes problems when we partition by a datetime: loading the data eagerly returns a `datetime64[ns]`, while loading the data as a dask data frame returns a `datetime64[s]`. I don't think we have tests for this, by the way. We could address this by explicitly setting the units to nanoseconds here.
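
To illustrate the idea of pinning the units to nanoseconds, here is a minimal sketch. It is not the project's actual code: `normalize_datetime_units` is a hypothetical helper, and the two frames only simulate the eager and dask load paths (the second-resolution cast is an assumption standing in for what the dask path returns under pandas 2.0).

```python
import numpy as np
import pandas as pd


def normalize_datetime_units(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast timezone-naive datetime64 columns to nanoseconds."""
    out = df.copy()
    for name in out.columns:
        dtype = out[name].dtype
        # kind "M" matches numpy datetime64 columns of any unit (s, ms, us, ns)
        if isinstance(dtype, np.dtype) and dtype.kind == "M":
            out[name] = out[name].astype("datetime64[ns]")
    return out


# Stand-in for the eagerly loaded frame.
eager = pd.DataFrame({"date": pd.to_datetime(["2023-01-01", "2023-01-02"])})

# Stand-in for the dask-loaded frame: same values, second resolution
# (pandas 2.0 keeps this unit instead of coercing back to nanoseconds).
lazy = eager.copy()
lazy["date"] = lazy["date"].astype("datetime64[s]")

eager_ns = normalize_datetime_units(eager)
lazy_ns = normalize_datetime_units(lazy)

print(eager_ns["date"].dtype, lazy_ns["date"].dtype)
```

After normalization both frames carry the same dtype, so comparing them no longer fails on the unit alone. A regression test for the datetime-partition case could compare the dtypes of the two load paths in the same way.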