CodSpeedHQ / action

Github Actions for running CodSpeed in your CI
https://codspeed.io
MIT License

Documentation/advice on differences of direct vs codspeed runner test execution #93

Open mih opened 6 months ago

mih commented 6 months ago

In https://github.com/datalad/datalad-next/pull/644 I have added benchmarks and a CodSpeed GitHub action. The benchmarks run fine locally, and also in the workflow when invoked directly via python -m pytest --codspeed datalad_next (see the "Debug" step).
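For reference, the benchmarks are plain pytest tests marked for pytest-codspeed; a minimal sketch of their shape (with a trivial stand-in body rather than the actual datalad-next code) is:

import subprocess
import pytest

@pytest.mark.benchmark
def test_external_tool_call():
    # stand-in body: the real benchmarks exercise datalad-next code paths
    # that shell out to git and git-annex
    subprocess.run(["git", "--version"], check=True, capture_output=True)

Running pytest with --codspeed then only collects and measures tests carrying this marker, which is why all other tests show up as deselected in the output further below.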

However, when run within CodSpeedHQ/action@v2, execution fails because a required command is not found. This happens regardless of whether that command is installed via the method used in the PR or as an Ubuntu system package.

Can you advise on what to do in this case? Are there additional requirements to be met for compatibility with the codspeed runner?

Thanks in advance!

adriencaccia commented 6 months ago

Since the action runs the provided run command under valgrind (plus some additional instrumentation), it might be that git-annex is not available in that new environment. You can enable step debug logging to get more info about the exact command that was run by the action, which will help investigate.
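Independently of the debug logs, a quick sanity check would be a throwaway benchmark that only asserts the tool is resolvable; this is just a suggestion from my side, shutil.which is nothing the action requires:

import shutil
import pytest

@pytest.mark.benchmark
def test_git_annex_resolvable():
    # if this passes in the plain pytest step but fails under the action,
    # the PATH seen by the wrapped command is the culprit
    assert shutil.which("git-annex") is not None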

mih commented 6 months ago

Thanks for the pointer. The debug logging makes clear what is being executed. I am posting the relevant excerpt from my use case below (slightly reformatted for readability).

ARCH="x86_64"
CODSPEED_ENV="runner"
PATH="/tmp/codspeed_introspected_node:
  /opt/hostedtoolcache/Python/3.12.2/x64/bin:
  /opt/hostedtoolcache/Python/3.12.2/x64:
  /tmp/dl-build-4f3gnv48/git-annex.linux:
  /opt/hostedtoolcache/Python/3.12.2/x64/bin:
  /opt/hostedtoolcache/Python/3.12.2/x64:
  /snap/bin:/home/runner/.local/bin:
  /opt/pipx_bin:
  /home/runner/.cargo/bin:
  /home/runner/.config/composer/vendor/bin:
  /usr/local/.ghcup/bin:
  /home/runner/.dotnet/tools:
  /usr/local/sbin:
  /usr/local/bin:
  /usr/sbin:
  /usr/bin:
  /sbin:
  /bin:
  /usr/games:
  /usr/local/games:
  /snap/bin"
PYTHONHASHSEED="0"
PYTHONMALLOC="malloc"
"setarch" "x86_64" "-R"
  "valgrind" "-q"
    "--tool=callgrind"
    "--trace-children=yes"
    "--cache-sim=yes"
    "--I1=32768,8,64" "--D1=32768,8,64" "--LL=8388608,16,64"
    "--instr-atstart=no"
    "--collect-systime=nsec"
    "--compress-strings=no"
    "--combine-dumps=yes"
    "--dump-line=no"
    "--trace-children-skip=*esbuild"
    "--obj-skip=/opt/hostedtoolcache/Python/3.12.2/x64/lib/libpython3.12.so.1.0"
    "--obj-skip=/usr/local/bin/node"
    "--callgrind-out-file

This indicates that the PATH is properly propagated.

I tried reproducing the behavior locally with this valgrind call pattern. On the test system (Debian sid), git-annex is installed as an official system package, with no PATH manipulation necessary:

❯ apt-cache policy git-annex
git-annex:
  Installed: 10.20230802-1
  Candidate: 10.20230802-1
  Version table:
 *** 10.20230802-1 500
        500 http://deb.debian.org/debian sid/main amd64 Packages
❯ ldd /usr/bin/git-annex
        linux-vdso.so.1 (0x00007ffd309bf000)
        libyaml-0.so.2 => /lib/x86_64-linux-gnu/libyaml-0.so.2 (0x00007f35c97c5000)
        libsqlite3.so.0 => /lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007f35c9655000)
        libmagic.so.1 => /lib/x86_64-linux-gnu/libmagic.so.1 (0x00007f35c9629000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f35c960a000)
        libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f35c9586000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f35c93a2000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f35c92c3000)
        libffi.so.8 => /lib/x86_64-linux-gnu/libffi.so.8 (0x00007f35c92b6000)
        liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f35c9286000)
        libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007f35c9273000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f35c9808000)

The benchmarks run fine with --trace-children=no. Here is the full command:

valgrind -q --tool=callgrind --trace-children=no --cache-sim=yes "--I1=32768,8,64" "--D1=32768,8,64" "--LL=8388608,16,64" --instr-atstart=no --collect-systime=nsec --compress-strings=no --combine-dumps=yes --dump-line=no "--trace-children-skip=*esbuild" python -m pytest --codspeed datalad_next
...
=============================== 2 passed, 391 deselected, 12 warnings in 15.64s ===============================
valgrind -q --tool=callgrind --trace-children=no --cache-sim=yes           -m  10,21s user 9,18s system 100% cpu 19,367 total

Switching to --trace-children=yes (while leaving everything else constant) causes the tests to fail.

valgrind -q --tool=callgrind --trace-children=yes --cache-sim=yes "--I1=32768,8,64" "--D1=32768,8,64" "--LL=8388608,16,64" --instr-atstart=no --collect-systime=nsec --compress-strings=no --combine-dumps=yes --dump-line=no "--trace-children-skip=*esbuild" python -m pytest --codspeed datalad_next
...
=========================================== short test summary info ===========================================
ERROR datalad_next/iter_collections/tests/test_itergitstatus.py::test_status_smrec - datalad.support.exceptions.IncompleteResultsError: Command did not complete successfully. 36 failed:
ERROR datalad_next/iter_collections/tests/test_itergitstatus.py::test_status_monorec - datalad.support.exceptions.IncompleteResultsError: Command did not complete successfully. 36 failed:
========================= 391 deselected, 12 warnings, 2 errors in 364.92s (0:06:04) ==========================
valgrind -q --tool=callgrind --trace-children=yes --cache-sim=yes           -  362,93s user 32,89s system 105% cpu 6:14,51 total

Importantly, the local failure pattern differs from what happens in the GitHub Actions CI run. It looks like some kind of race condition.

These particular benchmarks call out to various Git and git-annex command line tools. It seems that the valgrind wrapping of such subprocesses causes significant changes in their behavior.
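To sketch what I mean (illustrative only, not the actual datalad-next code): the measured Python mostly orchestrates external processes, and with --trace-children=yes every such child itself runs under callgrind, i.e. massively slower, which is enough to upset anything that is timing- or ordering-sensitive:

import subprocess
import time

start = time.monotonic()
# with --trace-children=yes this child also runs under callgrind and can be
# an order of magnitude slower than native; timeouts, polling loops, or lock
# handling tuned for native speed in the surrounding code may then misbehave
subprocess.run(["git-annex", "version"], check=True, capture_output=True)
print(f"child process took {time.monotonic() - start:.2f}s")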

That being said: from a benchmarking perspective, I am not interested in what these external tools do internally; I only care about the performance of the Python code that calls out to them. Would it be sensible to turn off the subprocess tracing, and if so, can that be configured from the outside (i.e., through the action)?