Open mih opened 9 months ago
Since the action runs the provided run command under valgrind and some other stuff, it might be because under this new environment git-annex is not available.
You can enable step debug logging to get more info about the command that was run by the action, which will help with the investigation.
Thanks for the pointer. This now makes clear what is being executed. I am posting the example from my use case below (slightly reformatted for better readability).
ARCH="x86_64"
CODSPEED_ENV="runner"
PATH="/tmp/codspeed_introspected_node:
/opt/hostedtoolcache/Python/3.12.2/x64/bin:
/opt/hostedtoolcache/Python/3.12.2/x64:
/tmp/dl-build-4f3gnv48/git-annex.linux:
/opt/hostedtoolcache/Python/3.12.2/x64/bin:
/opt/hostedtoolcache/Python/3.12.2/x64:
/snap/bin:/home/runner/.local/bin:
/opt/pipx_bin:
/home/runner/.cargo/bin:
/home/runner/.config/composer/vendor/bin:
/usr/local/.ghcup/bin:
/home/runner/.dotnet/tools:
/usr/local/sbin:
/usr/local/bin:
/usr/sbin:
/usr/bin:
/sbin:
/bin:
/usr/games:
/usr/local/games:
/snap/bin"
PYTHONHASHSEED="0"
PYTHONMALLOC="malloc"
"setarch" "x86_64" "-R"
"valgrind" "-q"
"--tool=callgrind"
"--trace-children=yes"
"--cache-sim=yes"
"--I1=32768,8,64" "--D1=32768,8,64" "--LL=8388608,16,64"
"--instr-atstart=no"
"--collect-systime=nsec"
"--compress-strings=no"
"--combine-dumps=yes"
"--dump-line=no"
"--trace-children-skip=*esbuild"
"--obj-skip=/opt/hostedtoolcache/Python/3.12.2/x64/lib/libpython3.12.so.1.0"
"--obj-skip=/usr/local/bin/node"
"--callgrind-out-file
This indicates that the PATH is properly propagated.
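For anyone wanting to double-check this from inside a job, here is a minimal Python sketch that resolves a command against an explicit PATH string such as the one logged above. The helper name locate_on_path is made up for illustration; it relies only on shutil.which from the standard library. It tells you whether the file is found on PATH, not whether it behaves correctly once valgrind wraps it.

```python
import os
import shutil


def locate_on_path(command, path_value):
    """Resolve `command` against an explicit PATH string.

    Hypothetical helper, not part of the CodSpeed action or datalad.
    Returns (resolved_path_or_None, list_of_directories_searched).
    """
    searched = []
    for entry in path_value.split(os.pathsep):
        # Tolerate whitespace/newlines from a PATH that was reformatted for posting
        entry = entry.strip()
        # Drop empty entries and duplicates while keeping the lookup order
        if entry and entry not in searched:
            searched.append(entry)
    if not searched:
        return None, searched
    resolved = shutil.which(command, path=os.pathsep.join(searched))
    return resolved, searched


if __name__ == "__main__":
    resolved, searched = locate_on_path("git-annex", os.environ.get("PATH", ""))
    print("directories searched:", len(searched))
    print("git-annex resolves to:", resolved)
```

Running this in a step right before the benchmark step, and again with the PATH value copied from the debug log, would show whether both environments find the same binary.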
I tried reproducing the behavior locally with this valgrind call pattern. On the test system (Debian sid), git-annex is installed as an official system package, with no PATH manipulation necessary:
❯ apt-cache policy git-annex
git-annex:
Installed: 10.20230802-1
Candidate: 10.20230802-1
Version table:
*** 10.20230802-1 500
500 http://deb.debian.org/debian sid/main amd64 Packages
❯ ldd /usr/bin/git-annex
linux-vdso.so.1 (0x00007ffd309bf000)
libyaml-0.so.2 => /lib/x86_64-linux-gnu/libyaml-0.so.2 (0x00007f35c97c5000)
libsqlite3.so.0 => /lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007f35c9655000)
libmagic.so.1 => /lib/x86_64-linux-gnu/libmagic.so.1 (0x00007f35c9629000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f35c960a000)
libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f35c9586000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f35c93a2000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f35c92c3000)
libffi.so.8 => /lib/x86_64-linux-gnu/libffi.so.8 (0x00007f35c92b6000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f35c9286000)
libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007f35c9273000)
/lib64/ld-linux-x86-64.so.2 (0x00007f35c9808000)
The benchmarks run fine with --trace-children=no. Here is the full command:
valgrind -q --tool=callgrind --trace-children=no --cache-sim=yes "--I1=32768,8,64" "--D1=32768,8,64" "--LL=8388608,16,64" --instr-atstart=no --collect-systime=nsec --compress-strings=no --combine-dumps=yes --dump-line=no "--trace-children-skip=*esbuild" python -m pytest --codspeed datalad_next
...
=============================== 2 passed, 391 deselected, 12 warnings in 15.64s ===============================
valgrind -q --tool=callgrind --trace-children=no --cache-sim=yes -m 10,21s user 9,18s system 100% cpu 19,367 total
Switching to --trace-children=yes (while leaving everything else constant) causes the tests to fail:
valgrind -q --tool=callgrind --trace-children=yes --cache-sim=yes "--I1=32768,8,64" "--D1=32768,8,64" "--LL=8388608,16,64" --instr-atstart=no --collect-systime=nsec --compress-strings=no --combine-dumps=yes --dump-line=no "--trace-children-skip=*esbuild" python -m pytest --codspeed datalad_next
...
=========================================== short test summary info ===========================================
ERROR datalad_next/iter_collections/tests/test_itergitstatus.py::test_status_smrec - datalad.support.exceptions.IncompleteResultsError: Command did not complete successfully. 36 failed:
ERROR datalad_next/iter_collections/tests/test_itergitstatus.py::test_status_monorec - datalad.support.exceptions.IncompleteResultsError: Command did not complete successfully. 36 failed:
========================= 391 deselected, 12 warnings, 2 errors in 364.92s (0:06:04) ==========================
valgrind -q --tool=callgrind --trace-children=yes --cache-sim=yes - 362,93s user 32,89s system 105% cpu 6:14,51 total
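To make this A/B comparison easy to repeat, the two invocations can be generated from one flag list so that nothing but --trace-children differs. This is only a sketch: the flags are copied from the local commands above, the function name callgrind_argv is my own, and it builds the argument list without running valgrind.

```python
import shlex

# Flags taken verbatim from the local commands quoted above.
# --trace-children is left out here because it is the one setting being varied.
BASE_FLAGS = [
    "-q",
    "--tool=callgrind",
    "--cache-sim=yes",
    "--I1=32768,8,64",
    "--D1=32768,8,64",
    "--LL=8388608,16,64",
    "--instr-atstart=no",
    "--collect-systime=nsec",
    "--compress-strings=no",
    "--combine-dumps=yes",
    "--dump-line=no",
    "--trace-children-skip=*esbuild",
]


def callgrind_argv(command, trace_children):
    """Build the valgrind argv for `command` (hypothetical helper)."""
    flags = list(BASE_FLAGS)
    # Keep the same position as in the original command line (after --tool)
    flags.insert(2, "--trace-children=" + ("yes" if trace_children else "no"))
    return ["valgrind", *flags, *command]


if __name__ == "__main__":
    benchmark = ["python", "-m", "pytest", "--codspeed", "datalad_next"]
    for trace in (False, True):
        # shlex.join quotes arguments such as *esbuild for safe pasting into a shell
        print(shlex.join(callgrind_argv(benchmark, trace)))
```

The printed lines can be pasted into a shell, or the lists passed to subprocess.run, to time both variants on the same machine.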
Importantly, the local failure pattern differs from what happens in the GitHub Actions CI run. It looks like some kind of race condition.
These particular benchmarks call out to various Git and git-annex command line tools. It seems that the valgrind wrapping of such subprocesses causes significant changes in their behavior.
That being said: from a benchmarking perspective, I am not interested in what these external tools do exactly. I am only interested in the performance of the Python code that calls out to them. Would it be sensible to turn off the subprocess tracing? And if so, is this somehow possible from the outside?
In https://github.com/datalad/datalad-next/pull/644 I have added benchmarks and a CodSpeed GitHub action. Benchmarks run fine locally and in the action via python -m pytest --codspeed datalad_next (see "Debug" step). However, when executed within CodSpeedHQ/action@v2, the execution fails because a required command is not found. This is independent of whether that command is installed via the method in the PR or as an Ubuntu system package.
Can you advise on what to do in this case? Are there additional requirements to be met for compatibility with the CodSpeed runner?
Thanks in advance!