dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

CI failing with pandas.errors.SettingWithCopyWarning #8591

Closed: milesgranger closed this issue 3 months ago

milesgranger commented 3 months ago

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

FAILED distributed/shuffle/tests/test_merge.py::test_merge[False-outer] - pandas.errors.SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
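For context, here is a generic sketch of the pattern this warning usually points at. It is not the code from test_merge.py or dask-expr. It assumes pandas without Copy-on-Write enabled, where assigning into the result of a boolean filter can emit SettingWithCopyWarning. The usual fixes are to take an explicit `.copy()` or to write through `.loc` on the original frame, as the warning suggests.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Pattern that can trigger SettingWithCopyWarning (left commented out):
#   subset = df[df["a"] > 1]
#   subset["b"] = 0   # assigns into a possibly-copied slice

# Fix 1: make the copy explicit, so modifying it is unambiguous.
subset = df[df["a"] > 1].copy()
subset["b"] = 0  # changes only `subset`; `df` is untouched here

# Fix 2: to change the original frame, index rows and column in one .loc call.
df.loc[df["a"] > 1, "b"] = 0

print(subset["b"].tolist())
print(df["b"].tolist())
```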

Example run: https://github.com/dask/distributed/actions/runs/8338285775/job/22818347973#step:20:4371
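A warning appearing as a FAILED test suggests that the suite escalates warnings to errors, for example through pytest's `filterwarnings = error` option. That configuration is an assumption here, not something stated in the thread. The stdlib-only sketch below shows the mechanism. `SettingWithCopyLikeWarning` and `library_call` are hypothetical stand-ins, not pandas or dask APIs.

```python
import warnings


class SettingWithCopyLikeWarning(UserWarning):
    """Hypothetical stand-in for pandas.errors.SettingWithCopyWarning."""


def library_call():
    # Hypothetical library code that emits a warning but otherwise succeeds.
    warnings.warn(
        "A value is trying to be set on a copy of a slice",
        SettingWithCopyLikeWarning,
    )
    return "done"


def run_with_warnings_as_errors():
    # Roughly what an "error" warnings filter does: the warning is raised.
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            library_call()
        except SettingWithCopyLikeWarning as exc:
            return f"FAILED: {type(exc).__name__}: {exc}"
    return "PASSED"


def run_with_default_filters():
    # With a non-error filter, the warning is only recorded and the call succeeds.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        library_call()
    return f"PASSED with {len(caught)} warning(s)"


print(run_with_warnings_as_errors())
print(run_with_default_filters())
```

This is also why the fix has to reach the exact package version that CI imports. If an older copy is installed, the warning still fires and the test still fails.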

jrbourbeau commented 3 months ago

It looks like we're still seeing this failure on main. @fjetter @milesgranger any idea why? Maybe we're not installing dask-expr from main? Maybe environment caching?

milesgranger commented 3 months ago

https://github.com/dask/distributed/actions/runs/8393801149/job/22989580530#step:20:34435

From the logs, the line raising the warning doesn't reflect the change made to fix the issue, so it must be a caching issue or something similar, as you suspected.