dotnet / performance

This repo contains benchmarks used for testing the performance of all .NET Runtimes

Update LTTNG installation to work around bug #4149

Open LoopedBard3 opened 4 months ago

LoopedBard3 commented 4 months ago

The following bug caused us to remove the lttng-modules-dkms package for every Linux microbenchmark run: https://bugs.launchpad.net/ubuntu/+source/lttng-modules/+bug/2043004 (thanks @caaavik-msft). This issue tracks updating how we install lttng-modules to work around this problem long term. It seems that manually installing the latest LTTng, rather than the version from the Ubuntu package repository, may fix the issue.
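For illustration, the interim workaround of removing the package could be sketched roughly as below. This is not the script the repo uses: the function names and the `DRY_RUN` switch are hypothetical, and the sketch only removes the package when dpkg reports it in a broken state, whereas the workaround described above removes it on every run.

```shell
#!/usr/bin/env bash
# Sketch only: function names and DRY_RUN are illustrative, not from the repo.

pkg_status() {
    # Prints dpkg's status string for package $1, e.g. "install ok installed"
    # or "install ok half-configured". Prints nothing if dpkg does not know it.
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null || true
}

needs_removal() {
    # $1 is a dpkg status string. Succeeds (returns 0) only for states that
    # indicate a package left partially installed or unconfigured.
    case "$1" in
        "") return 1 ;;                                   # unknown to dpkg
        *" installed") return 1 ;;                        # cleanly installed
        *" not-installed"|*" config-files") return 1 ;;   # already gone
        *) return 0 ;;                                    # e.g. half-configured
    esac
}

remove_pkg() {
    # Removes package $1, or only prints the command when DRY_RUN=1.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: sudo apt-get -y remove $1"
    else
        sudo apt-get -y remove "$1"
    fi
}

# Example call (commented out so that sourcing this file has no side effects):
# needs_removal "$(pkg_status lttng-modules-dkms)" && remove_pkg lttng-modules-dkms
```

Running it with `DRY_RUN=1` first shows what would be removed without touching the machine.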

Here is a sample of the original error:

+ sudo apt-get -y install python3-pip
Reading package lists...
Building dependency tree...
Reading state information...
python3-pip is already the newest version (22.0.2+dfsg-1ubuntu0.4).
The following packages were automatically installed and are no longer required:
  apport-symptoms python3-systemd
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 4 not upgraded.
1 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Setting up lttng-modules-dkms (2.13.8-1~ubuntu22.04.0) ...
debconf: unable to initialize frontend: Dialog
debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
debconf: falling back to frontend: Readline
Removing old lttng-modules-2.13.8 DKMS files...
Deleting module lttng-modules-2.13.8 completely from the DKMS tree.
Loading new lttng-modules-2.13.8 DKMS files...
Building for 6.5.0-27-generic
Building initial module for 6.5.0-27-generic
Error! Bad return status for module build on kernel: 6.5.0-27-generic (x86_64)
Consult /var/lib/dkms/lttng-modules/2.13.8/build/make.log for more information.
dpkg: error processing package lttng-modules-dkms (--configure):
 installed lttng-modules-dkms package post-installation script subprocess returned error exit status 10
Errors were encountered while processing:
 lttng-modules-dkms
E: Sub-process /usr/bin/dpkg returned an error code (1)
+ export PERF_PREREQS_INSTALL_FAILED=1
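
To check whether other runs hit the same failure, one option is to scan a captured log for the two strings that appear in the sample above. A minimal sketch, assuming the job output has been saved to a file; the function name and the file argument are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch only: has_lttng_dkms_failure is an illustrative name.

has_lttng_dkms_failure() {
    # $1 is the path to a saved apt/dpkg log.
    # Succeeds only if the log contains both the DKMS build error line and a
    # mention of lttng-modules; both strings are taken from the sample above.
    local log_file="$1"
    grep -q 'Error! Bad return status for module build on kernel' "$log_file" &&
        grep -q 'lttng-modules' "$log_file"
}

# Example call:
# has_lttng_dkms_failure run.log && echo "lttng-modules DKMS build failed"
```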

Looking at job 9e113869-98a8-4c88-893c-9433b5a33072, the only two machines that seem to be affected are PERFTIGER138 and 139.

Digging into Azure Data Explorer, the failure rate on those two machines is >97%, while the rate on all others is much lower. I messaged to get these machines taken offline for investigation. Looking at the daily failure rates, the failures started happening consistently on the 12th. We will want to rerun the runs from between the 12th and the 15th.

LoopedBard3 commented 4 months ago

Machines have been removed: https://github.com/dotnet/dnceng/issues/2596

LoopedBard3 commented 4 months ago

Seems other machines may now be hitting the same issue.

LoopedBard3 commented 4 months ago

Updating this issue to track changing how we install LTTng so that we get around the bug.