Open jrbourbeau opened 1 year ago
Transferring to fastparquet, but will keep you in the loop @jrbourbeau
(actually, I can't transfer, will duplicate)
Just transferred over
Regression due to https://github.com/dask/fastparquet/pull/893 @jbrockmendel
Note that the same tests did pass in fastparquet's CI: e.g. https://github.com/dask/fastparquet/actions/runs/6615631492/job/17968182303#step:6:83 Maybe we have different versions of pandas?
This surfaces a bug upstream that I'll work on. Fortunately it's easy to work around here: in #893, instead of passing dt64 values, pass int64 values to _from_sequence. That will also be more performant.
values = type(bvalues)._from_sequence(values.view("int64"), copy=False, dtype=bvalues.dtype)
?
I am puzzled why only this invocation of the same method would need this, but if you say so...
You are not alone in this. The API design question from ages ago was: "when passing dt64 values and a pd.DatetimeTZDtype to DatetimeIndex (which has the same behavior as _from_sequence here), do we interpret them as wall-times or UTC times?" We eventually landed on wall-times, while i8 values get interpreted as UTC times. Wall times need to go through a Cython function that converts them to UTC times. It is that Cython function that is raising.
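For anyone following along, here is a minimal sketch of that wall-time vs. UTC distinction, using the public DatetimeIndex constructor rather than the private _from_sequence. The timezone and the sample timestamp are arbitrary choices for illustration, not values from the failing tests.

```python
import numpy as np
import pandas as pd

# Illustrative tz-aware dtype; "US/Eastern" is an arbitrary choice.
dtype = pd.DatetimeTZDtype(tz="US/Eastern")

# One naive datetime64[ns] value: midnight on 2023-01-01.
naive = np.array(["2023-01-01T00:00:00"], dtype="M8[ns]")

# dt64 input: interpreted as wall-times in the target zone, so it must be
# converted to UTC internally (the localization step).
wall = pd.DatetimeIndex(naive, dtype=dtype)

# int64 input: interpreted as nanoseconds since the epoch in UTC, so no
# localization step is needed.
utc = pd.DatetimeIndex(naive.view("int64"), dtype=dtype)

print(wall[0])  # midnight, Eastern wall-clock time
print(utc[0])   # midnight UTC, displayed in Eastern time
```

The same underlying bits end up as two different instants, which is why switching to the int64 view changes which code path runs.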
Dask CI continues to fail during this period. Should we xfail these tests in the meantime?
I believe a new fastparquet release is imminent after https://github.com/dask/fastparquet/pull/899 is merged (though I don't object to xfail either)
I've seen
with tracebacks like this
showing up this morning on multiple PRs. See this CI build for full details.
Note all the errors involve fastparquet, which had a release yesterday. @martindurant any idea what might be happening here?