actions / runner-images

New image for ubuntu-latest docker issues #10646

Open BBTristanBenschop opened 11 hours ago

BBTristanBenschop commented 11 hours ago

Description

Since last night, starting a Docker container in our DevOps pipeline fails with the following error:

Docker.DotNet.DockerApiException : Docker API responded with status code=Conflict, response={"message":"container f111f68d35dc63185d89a5d93600a4af7fdb01513d9e17f873aa400c3dc0c6da is not running"}

Platforms affected

Runner images affected

Image version and build link

We are using ubuntu-latest

Agent name: 'Hosted Agent'
Agent machine name: 'fv-az366-67'
Current agent version: '3.243.1'
Operating System
Runner Image
Runner Image Provisioner
Current image version: '20240915.1.0'
Agent running as: 'vsts'

The last working version is:

Agent name: 'Azure Pipelines 2'
Agent machine name: 'fv-az634-412'
Current agent version: '3.243.1'
Operating System
Runner Image
Runner Image Provisioner
Current image version: '20240908.1.0'
Agent running as: 'vsts'

Is it regression?

Current image version: '20240908.1.0'

Expected behavior

The Docker container should start up properly so that our tests can be run inside the container.

Actual behavior

Fails to start docker container

Repro steps

Start a docker container

SpaceOgre commented 11 hours ago

We are seeing this problem as well. We also tested on Ubuntu 24.04 and hit the same problem there.

josefinbrandt commented 11 hours ago

We also have this problem.

kishorekumar-anchala commented 11 hours ago

Hi @BBTristanBenschop ,

Thank you for bringing this issue to us. We are looking into it and will update you after investigating.

BBTristanBenschop commented 11 hours ago

A temporary fix for us is switching to the ubuntu-20.04 image, where this issue doesn't exist.
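
For an Azure Pipelines setup, this workaround could look roughly like the fragment below. It is only a sketch, assuming the pipeline picks its Microsoft-hosted image through `pool.vmImage` (as in the YAML shared further down in this thread); everything else in the pipeline is left unchanged.

```yaml
# Sketch of the temporary workaround: pin the hosted image instead of
# following ubuntu-latest. Assumes the image is chosen via pool.vmImage.
pool:
  vmImage: ubuntu-20.04   # previously: ubuntu-latest
```

Pinning is meant as a stopgap; switching back to `ubuntu-latest` once a fixed image is rolled out would be the natural follow-up.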

kishorekumar-anchala commented 10 hours ago

Hi @SpaceOgre @BBTristanBenschop, kindly share the following:

  1. The specific DevOps pipeline configuration related to starting the Docker container.
  2. The exact command or script used to start the Docker container.

SpaceOgre commented 10 hours ago

@kishorekumar-anchala

YAML file

# ASP.NET Core
# Build and test ASP.NET Core projects targeting .NET Core.
# Add steps that run tests, create a NuGet package, deploy, and more:
# https://docs.microsoft.com/azure/devops/pipelines/languages/dotnet-core

trigger:
  - main

pool:
  vmImage: ubuntu-latest

variables:
  buildConfiguration: "Release"

steps:
  - task: UseDotNet@2
    displayName: "Use .NET 8 sdk"
    inputs:
      packageType: "sdk"
      version: "8.0.x"

  # This is needed for the dotnet tool install commands to work, the dotnet restore command should work without it but I keep it at the top just in case.
  - task: NuGetAuthenticate@1
    displayName: "Authenticate with NuGet"

  - task: DotNetCoreCLI@2
    displayName: dotnet restore
    inputs:
      command: restore
      projects: '**/*.csproj'
      feedRestore: GR.Library

  - task: DotNetCoreCLI@2
    inputs:
      command: custom
      custom: format
      arguments: "--verify-no-changes --verbosity diagnostic"
    displayName: Check formatting

  - task: DotNetCoreCLI@2
    displayName: "dotnet build $(buildConfiguration)"
    inputs:
      command: build
      projects: "**/*.csproj"
      arguments: "--configuration $(buildConfiguration)"

  - task: DotNetCoreCLI@2
    displayName: Dotnet test
    inputs:
      command: "test"
      projects: "tests/**/*.csproj"
      publishTestResults: true
      arguments: '--configuration $(buildConfiguration) --collect:"Code Coverage" --settings:devops/CodeCoverage.runsettings'

  - task: DotNetCoreCLI@2
    displayName: "Install dotnet-coverage"
    inputs:
      command: custom
      custom: tool
      arguments: "install --global dotnet-coverage"

  - task: DotNetCoreCLI@2
    displayName: "Install ReportGenerator"
    inputs:
      command: custom
      custom: tool
      arguments: "install --global dotnet-reportgenerator-globaltool"

  # This step is needed for the reportgenerator to work, since we use the Code Coverage collect during tests and the reportgenerator does not support it.
  # It is done like this so we can get code coverage results in Pull Request and get a full report to download and look at if needed.
  - script: dotnet-coverage merge -r -f cobertura -o merged.cobertura.xml $(Agent.WorkFolder)/*.coverage
    displayName: Merge code coverage files

  - script: reportgenerator -reports:merged.cobertura.xml -targetdir:$(Build.SourcesDirectory)/CodeCoverage -reporttypes:'HtmlInline' -classfilters:+GR.PRIIS.*
    displayName: Create Html Report for Code Coverage

  - task: PublishBuildArtifacts@1
    displayName: "Publish code coverage html report as artifact"
    inputs:
      PathtoPublish: "$(Build.SourcesDirectory)/CodeCoverage"
      ArtifactName: "CodeCoverage"
      publishLocation: "Container"

Docker part

We are starting docker in the Dotnet test task using TestContainers DotNet: https://github.com/testcontainers/testcontainers-dotnet

robinbaxon commented 10 hours ago

Can confirm that we experience a similar issue on our side as well, related to our TestContainers usage. We are running our workflows in GitHub on GitHub repositories, not pipelines in AzureDevOps.

We reverted our workflows to use ubuntu-20.04 (which gave us the ubuntu-20.04.6 revision of the image) and that works for us for most of our workflows. Thanks for the tip @BBTristanBenschop.

BBTristanBenschop commented 9 hours ago

@kishorekumar-anchala we use a setup very similar to SpaceOgre's, including the use of TestContainers. It fails when running the tests that make use of the TestContainers library.

SpaceOgre commented 9 hours ago

@kishorekumar-anchala Adding some more context from stdout:

Failing in 24.04

[xUnit.net 00:00:00.45]   Starting:    GR.PRIIS.API.IntegrationTests
[testcontainers.org 00:00:00.13] Connected to Docker:
  Host: unix:///var/run/docker.sock
  Server Version: 26.1.3
  Kernel Version: 6.8.0-1014-azure
  API Version: 1.45
  Operating System: Ubuntu 24.04.1 LTS
  Total Memory: 6.77 GB
[testcontainers.org 00:00:00.23] Searching Docker registry credential in Auths
[testcontainers.org 00:00:00.23] Docker registry credential https://index.docker.io/v1/ found
[testcontainers.org 00:00:00.87] Searching Docker registry credential in CredHelpers
[testcontainers.org 00:00:00.87] Searching Docker registry credential in CredsStore
[testcontainers.org 00:00:03.20] Docker image testcontainers/ryuk:0.6.0 created
[testcontainers.org 00:00:03.30] Docker container 0935fb8ac4c0 created
[testcontainers.org 00:00:03.37] Start Docker container 0935fb8ac4c0
[testcontainers.org 00:00:04.13] Wait for Docker container 0935fb8ac4c0 to complete readiness checks
[testcontainers.org 00:00:04.14] Docker container 0935fb8ac4c0 ready
[testcontainers.org 00:00:04.15] Searching Docker registry credential in Auths
[testcontainers.org 00:00:04.15] Searching Docker registry credential in CredHelpers
[testcontainers.org 00:00:04.15] Searching Docker registry credential in Auths
[testcontainers.org 00:00:04.15] Searching Docker registry credential in CredsStore
[testcontainers.org 00:00:04.15] Docker registry credential mcr.microsoft.com not found
[testcontainers.org 00:00:24.72] Docker image mcr.microsoft.com/mssql/server:2019-CU18-ubuntu-20.04 created
[testcontainers.org 00:00:24.74] Docker container 0661ae8d376a created
[testcontainers.org 00:00:24.75] Start Docker container 0661ae8d376a
[testcontainers.org 00:00:25.01] Wait for Docker container 0661ae8d376a to complete readiness checks
[testcontainers.org 00:00:25.02] Execute "/bin/sh -c find /opt/mssql-tools*/bin/sqlcmd -type f -print -quit" at Docker container 0661ae8d376a
[testcontainers.org 00:00:25.17] Execute "/opt/mssql-tools/bin/sqlcmd -C -Q SELECT 1;" at Docker container 0661ae8d376a
[testcontainers.org 00:00:31.45] Execute "/opt/mssql-tools/bin/sqlcmd -C -Q SELECT 1;" at Docker container 0661ae8d376a
[xUnit.net 00:00:32.16]       Docker.DotNet.DockerApiException : Docker API responded with status code=Conflict, response={"message":"container 0661ae8d376a18e1b335b257c18323bc58a990b1507a2bac423dc778385c179e is not running"}

How it looks when it works in 20.04

[testcontainers.org 00:00:00.09] Connected to Docker:
  Host: unix:///var/run/docker.sock
  Server Version: 26.1.3
  Kernel Version: 5.15.0-1071-azure
  API Version: 1.45
  Operating System: Ubuntu 20.04.6 LTS
  Total Memory: 6.77 GB
[testcontainers.org 00:00:00.15] Searching Docker registry credential in CredHelpers
[testcontainers.org 00:00:00.15] Searching Docker registry credential in Auths
[testcontainers.org 00:00:00.16] Searching Docker registry credential in CredsStore
[testcontainers.org 00:00:00.16] Docker registry credential https://index.docker.io/v1/ found
[testcontainers.org 00:00:02.50] Docker image testcontainers/ryuk:0.6.0 created
[testcontainers.org 00:00:02.60] Docker container 0c5851659281 created
[testcontainers.org 00:00:02.66] Start Docker container 0c5851659281
[testcontainers.org 00:00:03.05] Wait for Docker container 0c5851659281 to complete readiness checks
[testcontainers.org 00:00:03.05] Docker container 0c5851659281 ready
[testcontainers.org 00:00:03.07] Searching Docker registry credential in Auths
[testcontainers.org 00:00:03.07] Searching Docker registry credential in CredHelpers
[testcontainers.org 00:00:03.07] Searching Docker registry credential in CredsStore
[testcontainers.org 00:00:03.07] Searching Docker registry credential in Auths
[testcontainers.org 00:00:03.07] Docker registry credential mcr.microsoft.com not found
[testcontainers.org 00:00:21.09] Docker image mcr.microsoft.com/mssql/server:2019-CU18-ubuntu-20.04 created
[testcontainers.org 00:00:21.14] Docker container 551804a10cff created
[testcontainers.org 00:00:21.14] Start Docker container 551804a10cff
[testcontainers.org 00:00:21.40] Wait for Docker container 551804a10cff to complete readiness checks
[testcontainers.org 00:00:21.41] Execute "/bin/sh -c find /opt/mssql-tools*/bin/sqlcmd -type f -print -quit" at Docker container 551804a10cff
[testcontainers.org 00:00:21.51] Execute "/opt/mssql-tools/bin/sqlcmd -C -Q SELECT 1;" at Docker container 551804a10cff
[testcontainers.org 00:00:27.12] Execute "/opt/mssql-tools/bin/sqlcmd -C -Q SELECT 1;" at Docker container 551804a10cff
[testcontainers.org 00:00:27.21] Docker container 551804a10cff ready

kwuite commented 9 hours ago

A temporary fix for us is switching to the ubuntu-20.04 image, where this issue doesn't exist.

Our mssql container failed in CI/CD with really odd messages like:

This program has encountered a fatal error and cannot continue running at Thu Sep 19 08:41:27 2024
The following diagnostic information is available:

         Reason: 0x00000001
         Signal: SIGABRT - Aborted (6)
          Stack:
                 IP               Function
                 ---------------- --------------------------------------
                 000056207ec88a5a std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > std::__1::operator+<char, std::__1::char_traits<char>, std::__1::allocator<char> >(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > c
                 000056207ec88559 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > std::__1::operator+<char, std::__1::char_traits<char>, std::__1::allocator<char> >(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > c
                 000056207ec8745c std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > std::__1::operator+<char, std::__1::char_traits<char>, std::__1::allocator<char> >(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > c
                 00007f0982d7b4b0 killpg+0x40
                 00007f0982d7b428 gsignal+0x38
                 00007f0982d7d02a abort+0x16a
                 000056207ec1add4 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > std::__1::operator+<char, std::__1::char_traits<char>, std::__1::allocator<char> >(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > c
                 000056207ecdf7a8 void google::protobuf::internal::arena_delete_object<google::protobuf::Message>(void*)+0x2a38
                 000056207ecdf500 void google::protobuf::internal::arena_delete_object<google::protobuf::Message>(void*)+0x2790
                 000056207ec2ab36 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > std::__1::operator+<char, std::__1::char_traits<char>, std::__1::allocator<char> >(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > c
                 000056207ec2a7d0 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > std::__1::operator+<char, std::__1::char_traits<char>, std::__1::allocator<char> >(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > c
        Process: 9 - sqlservr
         Thread: 73 (application thread 0xf4)
    Instance Id: a672206a-cf56-49cc-9c29-89a1990f3f36
       Crash Id: a1fba965-828c-4d91-bf1c-8cb2a0709c68
    Build stamp: 52ec5c991c015cbdd504002245460be8a9a9b6c41343aaab03cf768750a6c2df
   Distribution: Ubuntu 16.04.6 LTS
     Processors: 2
   Total Memory: 8324341760 bytes
      Timestamp: Thu Sep 19 08:41:27 2024
     Last errno: 2
Last errno text: No such file or directory

We noticed that the runner images have been updated by GitHub.

This was our GitHub Actions host machine image from a few days ago:

Runner Image
  Image: ubuntu-22.04
  Version: 20240908.1.0
  Included Software: https://github.com/actions/runner-images/blob/ubuntu22/20240908.1/images/ubuntu/Ubuntu2204-Readme.md
  Image Release: https://github.com/actions/runner-images/releases/tag/ubuntu22%2F20240908.1

This is the version we see today, which fails:

Runner Image
  Image: ubuntu-22.04
  Version: 20240915.1.0
  Included Software: https://github.com/actions/runner-images/blob/ubuntu22/20240915.1/images/ubuntu/Ubuntu2204-Readme.md
  Image Release: https://github.com/actions/runner-images/releases/tag/ubuntu22%2F20240915.1

I believe the Linux kernel change from 6.5 to 6.8 is the reason this is failing.
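
One way to check which kernel and Docker versions a given runner image actually ships is to print them early in the job. The step below is a minimal sketch for a GitHub Actions workflow; the step name is made up for illustration, and it only reports versions rather than confirming the kernel as the cause.

```yaml
# Hypothetical diagnostic step: log the kernel and Docker server version
# so runs on different image versions can be compared.
steps:
  - name: Print kernel and Docker versions
    run: |
      uname -r
      docker version --format '{{.Server.Version}}'
```

Comparing this output between a passing and a failing run would show whether the kernel version differs between the two images.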

:tada: We fixed our issue by switching to the Ubuntu 20.04 runner image, because both 22.04 and 24.04 are affected, as you can read in this issue on GitHub: https://github.com/actions/runner-images/issues/10646

To fix Docker issues in your workflow.yml, make the following change:

runs-on: ubuntu-20.04

kiview commented 4 hours ago

Related issue: https://github.com/actions/runner-images/issues/10649