CodSpeedHQ / action

Github Actions for running CodSpeed in your CI
https://codspeed.io
MIT License

Pytest extension hangs forever after initialization #113

Open frgfm opened 3 weeks ago

frgfm commented 3 weeks ago

Hey there :wave:

Thanks for the great work! I recently tried to add CodSpeed to a project of mine following the tutorial in the documentation. Unfortunately, even though I followed it step by step, the job step times out.

Here is the PR https://github.com/frgfm/torch-cam/pull/270 and the failed job I just canceled to avoid eating all my CI minutes https://github.com/frgfm/torch-cam/actions/runs/10514500036/job/29132464450?pr=270

In short, here are the edits I made:

GitHub workflow

  benchmarks:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest]
        python: [3.9]
    steps:
      - uses: actions/checkout@v4
        with:
          persist-credentials: false
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python }}
          architecture: x64
      - name: Install dependencies
        run: |
          python -m pip install --upgrade uv
          uv pip install --system -e ".[test]"
      - name: Run benchmarks
        uses: CodSpeedHQ/action@v3
        with:
          token: ${{ secrets.CODSPEED_TOKEN }}
          run: pytest --codspeed tests/

and adding @pytest.mark.benchmark to a few tests that I already run for coverage in another job.
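For reference, here is roughly what one of the marked tests looks like (the body below is a simplified placeholder, not the actual torch-cam test):

```python
import pytest
import torch


@pytest.mark.benchmark  # collected as a benchmark when pytest is run with --codspeed
def test_small_forward():
    # placeholder workload standing in for the real extractor tests
    x = torch.rand(4, 3, 32, 32)
    out = torch.nn.Conv2d(3, 8, 3)(x)
    assert out.shape == (4, 8, 30, 30)
```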

Any hint on why this times out? :pray:

frgfm commented 2 weeks ago

@art049 @adriencaccia maybe?

adriencaccia commented 2 weeks ago

Hey @frgfm, can you try marking only a single test as a benchmark, preferably the fastest one to execute? That way we can make sure that it works with a simple benchmark.
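For instance, you could temporarily point the action at a single test node while debugging (the file and test names below are placeholders):

```yaml
      - name: Run benchmarks
        uses: CodSpeedHQ/action@v3
        with:
          token: ${{ secrets.CODSPEED_TOKEN }}
          # placeholder node id: replace it with your fastest marked test
          run: pytest --codspeed tests/test_methods.py::test_fastest_case
```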

In CodSpeed, each benchmark is executed only once, with the CPU behavior simulated. That simulation is quite expensive, so long-running benchmarks can take a lot of time on the GitHub runner. My guess is that one of your marked tests is a heavy benchmark that takes a while to execute.

If it works in a reasonable amount of time for a simple benchmark, my advice is then to try and:

frgfm commented 2 weeks ago

Hey @adriencaccia,

I've just tried narrowing it down to a single decorated test, one of the fastest. The problem is that this single test took 5 min, while my whole pytest suite for coverage takes 1 minute to run. Any suggestion on how to improve that?

5 min would be my longest CI job, and the corresponding decorated test is not the most useful one to benchmark. Have you seen people using CodSpeed on PyTorch projects? (Perhaps, for some reason, some of the dependencies aren't faring well on your runners? :man_shrugging:)

PS: to be more specific, the CI step of the action takes 5 min, but the GitHub app reports that the single test takes 2 s. So I imagine the test is run multiple times, but I don't understand how that adds up to 5 min of execution.

adriencaccia commented 1 week ago

The 2 s shown in the CodSpeed GitHub comment and in the CodSpeed UI corresponds to the reported execution time of the benchmark itself, which is not the same as the wall-clock time it took to run that benchmark under the CodSpeed instrumentation.

Each benchmark is run once with the instrumentation, which adds some overhead. Hence, for "macro-benchmarks" (benchmarks that take ~1 s or more), the run under the CodSpeed instrumentation can take several minutes.

If that test is not that useful to benchmark, I would recommend skipping it in favor of smaller, more relevant benchmarks.
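For instance (the names below are illustrative), you can keep the heavy end-to-end test unmarked so it still runs for coverage, and mark a smaller unit instead:

```python
import pytest
import torch


def test_full_cam_pipeline():
    # heavy end-to-end test: keep it for coverage, but leave it unmarked
    # so it is not picked up as a benchmark by `pytest --codspeed`
    ...


@pytest.mark.benchmark
def test_normalize_small_tensor():
    # small, fast unit that is cheap to instrument and still meaningful to track
    x = torch.rand(16, 16)
    normalized = (x - x.min()) / (x.max() - x.min())
    assert normalized.min().item() == 0.0
    assert normalized.max().item() == 1.0
```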