dask / dask

Parallel computing with task scheduling
https://dask.org
BSD 3-Clause "New" or "Revised" License

CI failing on Windows 3.11 #10672

Open mrocklin opened 11 months ago

mrocklin commented 11 months ago

https://github.com/dask/dask/actions/runs/7094083759/job/19308695802?pr=10669

____________________________ test_setitem_hardmask ____________________________
[gw0] win32 -- Python 3.11.6 C:\Miniconda3\envs\test-environment\python.exe

    @pytest.mark.xfail(
        sys.platform == "win32" and PY_VERSION >= Version("3.12.0"),
        reason="https://github.com/dask/dask/issues/10604",
    )
    def test_setitem_hardmask():
        x = np.ma.array([1, 2, 3, 4], dtype=int)
        x.harden_mask()

        y = x.copy()
        assert y.hardmask

        x[0] = np.ma.masked
        x[0:2] = np.ma.masked

        dx = da.from_array(y)
        dx[0] = np.ma.masked
        dx[0:2] = np.ma.masked

>       assert_eq(x, dx)

dask\array\tests\test_array_core.py:3934: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dask\array\utils.py:315: in assert_eq
    b, bdt, b_meta, b_computed = _get_dt_meta_computed(
dask\array\utils.py:262: in _get_dt_meta_computed
    _check_dsk(x.dask)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

dsk = HighLevelGraph with 3 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x25500b1fad0>
 0. array-77780c9fdd36e1630360e1a58a2f798c
 1. setitem-3b3740c2d6f44f9f55717df19bf08928
 2. setitem-1fd1e642f60c88ef4752333bec2c3e00

    def _check_dsk(dsk):
        """Check that graph is well named and non-overlapping"""
        if not isinstance(dsk, HighLevelGraph):
            return

        dsk.validate()
        assert all(isinstance(k, (tuple, str)) for k in dsk.layers)
        freqs = frequencies(concat(dsk.layers.values()))
        non_one = {k: v for k, v in freqs.items() if v != 1}
>       assert not non_one, non_one
E       AssertionError: {('array-ca2bfc119ac07031ee36eb791623951f',): 2}
E       assert not {('array-ca2bfc119ac07031ee36eb791623951f',): 2}

dask\array\utils.py:213: AssertionError
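
For context, the assertion above comes from the duplicate-key check that `assert_eq` runs over the `HighLevelGraph` via `_check_dsk`. Below is a minimal sketch of that same check applied to the setitem pattern from the test; since the failure has only shown up on the Windows CI workers, the duplicate key may not reproduce on other platforms.

```python
# Minimal sketch of the duplicate-key check from dask.array.utils._check_dsk,
# applied to the setitem pattern in test_setitem_hardmask. The duplicate key
# has only been observed on Windows CI, so this may print an empty dict elsewhere.
import numpy as np
from tlz import concat, frequencies

import dask.array as da

y = np.ma.array([1, 2, 3, 4], dtype=int)
y.harden_mask()

dx = da.from_array(y)
dx[0] = np.ma.masked
dx[0:2] = np.ma.masked

dsk = dx.dask  # HighLevelGraph: one array layer plus two setitem layers
freqs = frequencies(concat(dsk.layers.values()))
duplicates = {k: v for k, v in freqs.items() if v != 1}
print(duplicates)  # a non-empty dict corresponds to the AssertionError above
```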
_____________________________ test_describe_empty _____________________________
[gw2] win32 -- Python 3.11.6 C:\Miniconda3\envs\test-environment\python.exe

    def test_describe_empty():
        df_none = pd.DataFrame({"A": [None, None]})
        ddf_none = dd.from_pandas(df_none, 2)
        df_len0 = pd.DataFrame({"A": [], "B": []})
        ddf_len0 = dd.from_pandas(df_len0, 2)
        ddf_nocols = dd.from_pandas(pd.DataFrame({}), 2)

        # Pandas have different dtypes for resulting describe dataframe if there are only
        # None-values, pre-compute dask df to bypass _meta check
        assert_eq(
            df_none.describe(), ddf_none.describe(percentiles_method="dask").compute()
        )

>       with pytest.warns(RuntimeWarning):
E       Failed: DID NOT WARN. No warnings of type (<class 'RuntimeWarning'>,) were emitted.
E       The list of emitted warnings is: [].

dask\dataframe\tests\test_dataframe.py:607: Failed
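
For reference, the `DID NOT WARN` failure means the `pytest.warns(RuntimeWarning)` block completed without any `RuntimeWarning` being raised. As a hedged illustration only, the kind of warning a describe over empty or all-missing data can trigger is numpy's "Mean of empty slice"; whether that is the exact warning `test_describe_empty` relies on is an assumption.

```python
# Hedged illustration only: numpy emits "RuntimeWarning: Mean of empty slice"
# when reducing over no valid data, which is the kind of RuntimeWarning the
# failing pytest.warns block expects. Whether this is the exact warning
# test_describe_empty depends on is an assumption.
import warnings

import numpy as np

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    np.nanmean(np.array([], dtype=float))  # returns nan, warns "Mean of empty slice"

print([str(w.message) for w in caught])
```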
hendrikmakait commented 11 months ago

During yesterday's maintainers sync, @rjzamora and @charlesbluca volunteered to investigate this.

rjzamora commented 11 months ago

Just a note that this seems similar to https://github.com/dask/dask/issues/10604 (which is currently being ignored for Python-3.12).

charlesbluca commented 11 months ago

Looking at the past few workflow runs, test_describe_empty fails much less consistently than test_setitem_hardmask, which should be addressed (along with #10604) in #10701. Are we okay with marking test_describe_empty as flaky on Windows for now?
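
For concreteness, here is a rough sketch of what conditionally marking `test_describe_empty` on Windows could look like, mirroring the xfail marker already used for `test_setitem_hardmask` in the traceback above; using a non-strict xfail (rather than a rerun-based flaky marker) is an assumption about how the team would want to handle an intermittent failure.

```python
# Hypothetical sketch only: a conditional, non-strict xfail mirroring the
# marker already used for test_setitem_hardmask. strict=False means the test
# still passes on runs where the RuntimeWarning does appear.
import sys

import pytest


@pytest.mark.xfail(
    sys.platform == "win32",
    reason="flaky on Windows CI; RuntimeWarning not always emitted "
    "(https://github.com/dask/dask/issues/10672)",
    strict=False,
)
def test_describe_empty():
    ...
```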

mrocklin commented 11 months ago

Checking in. What's the status here? @charlesbluca do I take it from your comment above that you're not able to resolve this without someone else (James maybe?) getting involved?

rjzamora commented 11 months ago

Checking in. What's the status here? @charlesbluca do I take it from your comment above that you're not able to resolve this without someone else (James maybe?) getting involved?

This issue is reporting two distinct CI failures. The first failure was showing up consistently in CI and has now been addressed and closed by https://github.com/dask/dask/pull/10701. The second failure seems to be more of a flake (it does not show up often in CI, and Charles could not reproduce it locally). I'm sure Charles and I can resolve the second failure if it becomes a persistent problem, but I'd rather not prioritize it immediately.
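
As a side note, one low-tech way to chase a flake like this locally is to rerun the single test in a loop. The sketch below assumes a working dask dev checkout and simply invokes pytest repeatedly in-process; it is a convenience loop, not how dask's CI runs the suite.

```python
# Rough sketch for stress-testing the flaky test locally; assumes a dask dev
# checkout with the test dependencies installed.
import pytest

for i in range(50):
    rc = pytest.main(
        ["-x", "-q", "dask/dataframe/tests/test_dataframe.py::test_describe_empty"]
    )
    if rc != 0:
        print(f"failed on iteration {i}")
        break
```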

mrocklin commented 11 months ago

The second failure seems to be more of a flake

To be clear, I'd like for flaky CI to really scare us. In my experience, CI that is sometimes red teaches people to mostly disregard CI signals, which lets even more problems leak in. dask/distributed fell into this trap when I stopped actively maintaining it, and I think it has cost hundreds of hours of lost human time. I'd be sad if dask/dask fell into this trap as well. (Although the concurrent nature of dask/distributed admittedly makes this harder over there than here; I'm hopeful that we can do a better job in dask/dask, where it's easier.)

mrocklin commented 11 months ago

(not to say that this is strictly on you @rjzamora, it's not)

rjzamora commented 11 months ago

To be clear, I'd like for flaky CI to really scare us.

Yes, agreed. Sorry to make it sound like we shouldn't care about the second CI failure. Ignoring flaky CI is indeed dangerous!

What I'm trying to communicate is that we were able to prioritize and investigate the CI-blocking problem before leaving for the holidays, but we probably won't be available to address the flaky failure for the next few weeks. (Also, sorry to speak for Charles, but he will be away until January.)

mrocklin commented 11 months ago

Thanks for the communication. We'll take care of it.
