coiled / benchmarks

Release 0.2.0 #526

Closed jrbourbeau closed 1 year ago

jrbourbeau commented 1 year ago

https://github.com/coiled/coiled-runtime/pull/488 bumped package versions in preparation for a new runtime release. Given https://github.com/coiled/coiled-runtime/issues/520 we should try to release ASAP. @ncclementi are you aware of any outstanding issues?

ncclementi commented 1 year ago

Yes, we have a couple of problems:

  1. We are constantly running into https://github.com/coiled/coiled-runtime/issues/468
  2. There are real regressions and we do not know where they come from; see https://github.com/coiled/coiled-runtime/issues/500, and more specifically https://github.com/coiled/coiled-runtime/issues/500#issuecomment-1314505305
ncclementi commented 1 year ago

Based on the comment above, the plan of action is:

  1. Do a patch release (0.1.1) of the runtime, pinning bokeh to avoid the dashboard problems.
  2. Keep investigating the remaining problems before bumping all the packages for release 0.2.0; see https://github.com/coiled/coiled-runtime/issues/500 and https://github.com/coiled/coiled-runtime/issues/468
jrbourbeau commented 1 year ago

That sounds like a good plan to me. I'd suggest pinning bokeh=2.4.3
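
For reference, a minimal sketch of what that pin could look like in the runtime's conda environment spec. The file name and the other entries here are illustrative assumptions, not the actual 0.1.1 recipe:

    # environment.yml (hypothetical excerpt)
    dependencies:
      - dask==2022.6.0          # versions already pinned by the 0.1.x runtime
      - distributed==2022.6.0
      - bokeh==2.4.3            # pin to avoid the dashboard problems with newer bokeh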

ncclementi commented 1 year ago

After talking to @hayesgb we decided:

  1. Run A/B testing of 0.1.1 vs. the packages planned for 0.2.0. Currently running: https://github.com/coiled/coiled-runtime/actions/runs/3585834430
  2. If things look OK, proceed with the release; if not, we need to investigate further.
ncclementi commented 1 year ago

I ran the A/B testing between 0.1.1 and what would become 0.2.0 (excluding the xgboost upgrade, as the conda vs. pip versions cause a conflict). I ran 5 retries (one of them the baseline). Here is the link to download the plots: https://github.com/coiled/coiled-runtime/actions/runs/3585834430

The updated package list is:

      - dask==2022.11.1
      - distributed==2022.11.1
      - pyarrow==9.0.0
      - numpy==1.23.5
      - pandas==1.5.2
      - fsspec==2022.11.0
      - s3fs==2022.11.0
      - gcsfs==2022.11.0
      - jupyterlab==3.5.0
      - dask-labextension==6.0.0
      - scikit-learn==1.1.3
      - xarray==2022.11.0
      - zarr==2.13.3
      - msgpack==1.0.4
      - cloudpickle==2.2.0
      - toolz==0.12.0
      - lz4==4.0.2
      - ipywidgets==7.7.2

However, numpy seems to get downgraded back.

Downgrade:
─────────────────────────────────────────────────────────────────────────────────────────────────────

  - numpy                                1.23.5  py39h3d75532_0      conda-forge                   
  + numpy                                1.21.6  py39h18676bf_0      conda-forge/linux-64       6MB
  - openssl                               3.0.7  h166bdaf_0          conda-forge                   
  + openssl                              1.1.1s  h166bdaf_0          conda-forge/linux-64       2MB
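
To track down which dependency is forcing numpy back to 1.21, one option is to query the solved environment or redo the solve with an explicit pin. A rough sketch using standard conda/mamba commands; it assumes the environment above is already activated:

    # List installed packages that depend on numpy; one of them likely carries
    # an upper bound (e.g. numpy<1.22) or is built against the old openssl.
    mamba repoquery whoneeds numpy

    # Re-run the solve with numpy pinned and inspect the conflict the solver
    # reports; --dry-run means nothing actually gets installed.
    conda install --dry-run "numpy==1.23.5"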

But in the meantime, this is the wall clock comparison:

[screenshot: wall clock comparison]

This is the average memory comparison:

[screenshot: average memory comparison]

For reference, here is the null-hypothesis wall time: [screenshot: null-hypothesis wall time]

And the null-hypothesis average memory: [screenshot: null-hypothesis average memory]

cc: @hayesgb

ntabris commented 1 year ago

This means that, e.g., the shuffle test got ~30% slower (with a high degree of confidence)?

hayesgb commented 1 year ago

@ncclementi

As we discussed, running another A/B test that bumps everything except the fsspec packages and then making a decision seems OK.

Let's also confirm you can successfully complete a conda build with both of the build candidates, including the one you intend to test.
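
A quick sketch of that check, assuming the conda recipe lives in a recipe/ directory at the repository root (the path and channel are assumptions):

    # Build the runtime conda package locally from the in-repo recipe,
    # resolving build/run dependencies from conda-forge.
    conda build recipe/ -c conda-forge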

Let's aim for a decision by midday tomorrow.

If pyarrow==10.0.x makes its way onto conda-forge in early December (as expected), we can do a patch release.

gjoseph92 commented 1 year ago

Shuffle specifically we can expect to be slower with queuing on by default (30% sounds about right to me).

If we're worried about versions of (transitive) dependencies having affected performance, @ncclementi maybe you should run the same A/B test, but turn off queuing for 0.2.0. If that looks the same/better, then we can attribute the wall-clock difference to queuing.
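
For context, the queuing in question is scheduler-side task queuing, which dask/distributed enabled by default around 2022.11 and which is controlled by the distributed.scheduler.worker-saturation setting. A minimal sketch of how the 0.2.0 side of the A/B run could switch it off; where exactly this config gets injected into the benchmark environments is an assumption:

    # Dask configuration snippet (e.g. dask.yaml): worker-saturation: inf
    # disables scheduler-side queuing, restoring the pre-2022.11 behavior.
    distributed:
      scheduler:
        worker-saturation: inf

    # Equivalent environment-variable form:
    # DASK_DISTRIBUTED__SCHEDULER__WORKER_SATURATION=inf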

ncclementi commented 1 year ago

Here is the link to the AB test with no queuing: https://github.com/coiled/coiled-runtime/actions/runs/3588057957

Wall time: [screenshot: wall time comparison]

Memory: [screenshot: average memory comparison]

gjoseph92 commented 1 year ago

Hmm, so it looks like some changes are definitely not due to queuing. The H2O q9 change seems like the most important to investigate, but I wonder why all of H2O seems a little slower consistently.

ncclementi commented 1 year ago

I was looking into the specific case of q9, and I think the regression we see between 0.1.1 and the new 0.2.0 was introduced by dask/distributed at some point.

Notice that on latest, q9-5GB was almost always averaging between 7-10 s (always using 2022.6.0), except for the big bump when the dask version was upgraded. [screenshot: q9-5GB timings on latest]

Now, if we look at upstream, the earliest values we have are for 2022.7.1, and it is already ~50% higher; this continues pretty consistently. See the figures below.

2022.7.1: [screenshot: q9-5GB timings on 2022.7.1]

2022.9.2: [screenshot: q9-5GB timings on 2022.9.2]

We were back to the good times on 2022.10.0, but then back to the regression again. [screenshot: q9-5GB timings around 2022.10.0 and later]

In conclusion, this regression has been with us since 2022.7.1. It is probably worth investigating, but it shouldn't block the release given all the other benefits that come with newer versions of dask.
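
One way to narrow this down further would be to rerun just the H2O q9 benchmark against successive dask/distributed releases and bracket the first slow one. A rough sketch; the test path and selection expression are assumptions about the benchmark layout, and the runs still need the usual Coiled/AWS credentials:

    # Compare q9 across the last known-good and first suspected-bad releases.
    for version in 2022.6.0 2022.7.1; do
        mamba create -y -n "dask-$version" python=3.9 \
            "dask=$version" "distributed=$version" coiled s3fs pytest
        mamba run -n "dask-$version" \
            python -m pytest tests/benchmarks/test_h2o.py -k q9
    done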

ncclementi commented 1 year ago

> As we discussed, running another A/B test that bumps everything except the fsspec packages and then making a decision seems OK.

The results of this experiment held no surprises; the behavior is exactly the same as when including the updated versions of all the packages.

> Let's also confirm you can successfully complete a conda build with both of the build candidates, including the one you intend to test.

I was able to do this locally with no problems.

> Let's aim for a decision by midday tomorrow.

For the regressions discussion, see https://github.com/coiled/coiled-runtime/issues/526#issuecomment-1334094911. There is a PR open to kick this off: https://github.com/coiled/coiled-runtime/pull/570

cc: @hayesgb

hayesgb commented 1 year ago

This looks good to me. I would move forward with the new release and create a separate issue to investigate the q9 regression.

ncclementi commented 1 year ago

As of last night, we have a PyPI and conda-forge 0.2.0 release.

I will close this issue as soon as I have merged the following two PRs.