actions / runner-images

GitHub Actions runner images

Ubuntu-22.04 image build failing at Report Generation #9780

Closed bvaughan527 closed 1 week ago

bvaughan527 commented 2 weeks ago

Description

The current Ubuntu-22.04 runner image build fails when executing the Generate-SoftwareReport.ps1 script, with the error below:

==> azure-arm.build_image: /imagegeneration/SoftwareReport/Generate-SoftwareReport.ps1 : Cannot bind argument to parameter 'Path' because it is null.
==> azure-arm.build_image: + CategoryInfo          : InvalidData: (:) [Generate-SoftwareReport.ps1], ParameterBindingValidationException
==> azure-arm.build_image: + FullyQualifiedErrorId : ParameterArgumentValidationErrorNullNotAllowed,Generate-SoftwareReport.ps1

Platforms affected

Runner images affected

Image version and build link

Image Version: 20240422.1

Is it regression?

Yes: 20240324.2

Expected behavior

The Generate-SoftwareReport.ps1 PowerShell script should complete successfully.

Actual behavior

Generate-SoftwareReport.ps1 throws the error below:

==> azure-arm.build_image: /imagegeneration/SoftwareReport/Generate-SoftwareReport.ps1 : Cannot bind argument to parameter 'Path' because it is null.
==> azure-arm.build_image: + CategoryInfo          : InvalidData: (:) [Generate-SoftwareReport.ps1], ParameterBindingValidationException
==> azure-arm.build_image: + FullyQualifiedErrorId : ParameterArgumentValidationErrorNullNotAllowed,Generate-SoftwareReport.ps1

Repro steps

  1. Follow the instructions at the link below to build a runner agent image: https://github.com/actions/runner-images/blob/ubuntu22/20240324.2/docs/create-image-and-azure-resources.md

egg-r commented 2 weeks ago

FYI, in case it helps with this issue: I tried removing the report generation steps from my local setup and now see repeated failures in the full Pester tests that run just after the report generator steps. The netlify, nuget, and podman networking tests fail with file permission errors.

This is new, unexpected behavior: I've been working locally on some Packer automation using these templates in a local feature branch for several days now, and it worked until this morning.

    Containers            : {[+] /imagegeneration/tests/ActionArchiveCache.Tests.ps1, [+] /imagegeneration/tests/Android.Tests.ps1, [+] /imagegeneration/tests/Apt.Tests.ps1, [+] /imagegeneration/tests/Browsers.Tests.ps1…}
    Result                : Failed
    FailedCount           : 3
    FailedBlocksCount     : 0
    FailedContainersCount : 0
    PassedCount           : 471
    SkippedCount          : 7
    NotRunCount           : 0
    TotalCount            : 481
    Duration              : 00:01:16.2149350
    Executed              : True
    ExecutedAt            : 5/2/2024 5:50:08 PM
    Version               : 5.5.0
    PSVersion             : 7.4.1
    PSBoundParameters     : {[Configuration, PesterConfiguration]}
    Plugins               :
    PluginConfiguration   :
    PluginData            :
    Configuration         : PesterConfiguration
    DiscoveryDuration     : 00:00:01.2154888
    UserDuration          : 00:01:12.5733010
    FrameworkDuration     : 00:00:02.4261452
    Failed                : {[-] netlify, [-] nuget, [-] podman networking}
    FailedBlocks          : {}
    FailedContainers      : {}
    Passed                : {[+] /opt/actionarchivecache not empty, [+] /opt/actionarchivecache/actions_cache/0865c47f36e68161719c5b124609996bb5c40129.tar.gz, [+] /opt/actionarchivecache/actions_cache/0c45773b623bea8c8e75f6c82b208c3cf94ea4f9.tar.gz, [+] /opt/actionarchivecache/actions_cache/136d96b4aee02b1f0de3ba493b1d47135042d9c0.tar.gz…}
    Skipped               : {[!] , [!] , [!] erlang, [!] erlang …}
    NotRun                : {}
    Tests                 : {[+] /opt/actionarchivecache not empty, [+] /opt/actionarchivecache/actions_cache/0865c47f36e68161719c5b124609996bb5c40129.tar.gz, [+] /opt/actionarchivecache/actions_cache/0c45773b623bea8c8e75f6c82b208c3cf94ea4f9.tar.gz, [+] /opt/actionarchivecache/actions_cache/136d96b4aee02b1f0de3ba493b1d47135042d9c0.tar.gz…}
    CodeCoverage          :
    ==> Exception: Test run has failed

    Command 'netlify --version' has finished with exit code
    › Error: EACCES: permission denied, open '$HOME/.config/netlify/config.json'
    You don't have access to this file.
    ┌───────────────────────────────────────────────────┐
    │          netlify-cli update check failed          │
    │        Try running with sudo or get access        │
    │       to the local update config store via        │
    │ sudo chown -R $USER:$(id -gn $USER) $HOME/.config │
    └───────────────────────────────────────────────────┘
    at "$NodeCommand --version" | Should -ReturnZeroExitCode, /imagegeneration/tests/Node.Tests.ps1:7
    at , /imagegeneration/tests/Node.Tests.ps1:7

bramvdklinkenberg commented 2 weeks ago

I have the same error with building Ubuntu 20.04.

==> azure-arm.build_image: Provisioning with shell script: /tmp/packer-shell2778030592
    azure-arm.build_image: Running Generate-SoftwareReport.ps1 script
==> azure-arm.build_image: /imagegeneration/SoftwareReport/Generate-SoftwareReport.ps1 : Cannot bind argument to parameter 'Path' because it is null.
==> azure-arm.build_image: + CategoryInfo          : InvalidData: (:) [Generate-SoftwareReport.ps1], ParameterBindingValidationException
==> azure-arm.build_image: + FullyQualifiedErrorId : ParameterArgumentValidationErrorNullNotAllowed,Generate-SoftwareReport.ps1
==> azure-arm.build_image: Provisioner failed with "Script exited with non-zero exit status: 1. Allowed exit codes are: [0]", retrying with 3 trie(s) left

enescakir commented 2 weeks ago

It's interesting that this script, which hasn't been modified recently, suddenly started to fail. Despite my attempts to debug it, the error message doesn't give enough clues.

okoudje commented 2 weeks ago

I have the same error on Ubuntu 20.04.

    azure-arm.build_image: Running Generate-SoftwareReport.ps1 script
==> azure-arm.build_image: /imagegeneration/SoftwareReport/Generate-SoftwareReport.ps1 : Cannot bind argument to parameter 'Path' because it is null.
==> azure-arm.build_image: + CategoryInfo          : InvalidData: (:) [Generate-SoftwareReport.ps1], ParameterBindingValidationException
==> azure-arm.build_image: + FullyQualifiedErrorId : ParameterArgumentValidationErrorNullNotAllowed,Generate-SoftwareReport.ps1
==> azure-arm.build_image: Provisioner failed with "Script exited with non-zero exit status: 1. Allowed exit codes are: [0]", retrying with 3 trie(s) left

Was anyone able to fix it?

Sheldoras commented 2 weeks ago

I had the same problem and looked into it for a bit. I got it running again with some tweaks, but I don't yet fully understand why it broke at this particular point in time. So these are just my observations and assumptions as to what may be causing this.

The point where it actually breaks is here, where the command which apt-fast returns nothing, which in turn causes the error about the missing or null parameter on the Get-Content call.

Get-AptFastVersion is called here, which is why the error surfaces during the Generate-SoftwareReport.ps1 script execution.
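
The failure mode described above can be illustrated in isolation: looking up a missing executable prints nothing, and the empty result is then passed along as if it were a path. Below is a minimal shell sketch, assuming a hypothetical helper named resolve_tool (it is not part of runner-images), that turns the empty lookup into an explicit error instead:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of runner-images: resolve a tool on PATH
# and fail with a readable message instead of passing an empty path along.
resolve_tool() {
    local tool="$1"
    local resolved
    # command -v prints nothing and returns non-zero when the tool is missing
    resolved="$(command -v "$tool" 2>/dev/null)" || true
    if [ -z "$resolved" ]; then
        echo "error: '$tool' not found on PATH" >&2
        return 1
    fi
    printf '%s\n' "$resolved"
}

# A tool that exists on any POSIX system resolves to a non-empty path
resolve_tool sh
# A missing tool yields a clear error rather than an empty string
resolve_tool definitely-not-installed-tool || echo "lookup failed as expected"
```

A guard of this kind would not fix the missing apt-fast, but it would likely have made the build log point at the missing tool rather than at a null Path parameter.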

So the problem seems to be that apt-fast is not available. There are two points of interest. The first is here, where apt-fast is initially installed. The integration of Ubuntu 24 support added a conditional check here that uses the helper function is_ubuntu24. However, it looks to me as if the helper script file that actually contains this function is not sourced. The line source $HELPER_SCRIPTS/os.sh, which can be found in all other scripts using this helper function, seems to be missing here (maybe an oversight?).
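
To show what an unsourced helper does inside a shell conditional, here is a small sketch. It assumes the check has the negated form if ! helper, which the thread does not confirm, and it uses a hypothetical stand-in named is_ubuntu24_demo rather than the real function:

```shell
#!/usr/bin/env bash
# Sketch only: is_ubuntu24_demo is a hypothetical stand-in and is deliberately
# left undefined at first, mimicking a helper file that was never sourced.
check_helper() {
    # An undefined command in a condition exits with status 127 (not found),
    # so "! helper" evaluates to true and the guarded branch still runs.
    if ! is_ubuntu24_demo 2>/dev/null; then
        echo "branch taken"
    else
        echo "branch skipped"
    fi
}

check_helper    # prints "branch taken"

# Defining the helper (as sourcing the helper file would) changes the outcome.
is_ubuntu24_demo() { return 0; }
check_helper    # prints "branch skipped"
```

Under that assumption, a missing source line would not abort the script; the shell simply treats the unknown command as a failed one, so the branch taken can be the same with or without the helper on systems where the helper would have returned false anyway.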

Reconciling that, however, didn't fix it for me right away. The problem stayed the same, and this is where I can't quite explain how it worked before. Prior to the execution of the Generate-SoftwareReport.ps1 script, this cleanup script is executed, and it seems to remove apt-fast.

So I got it running again by changing that loop to not include (and thus not remove) apt-fast. Generate-SoftwareReport.ps1 then executed without error and the image was built.
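
The workaround of leaving apt-fast out of the removal loop can be sketched as follows. The function name, the prefix, and the tool list are assumptions for illustration only, not the contents of the actual cleanup script, and everything runs inside a throwaway directory:

```shell
#!/usr/bin/env bash
# Sketch only: the prefix and tool list are assumptions for illustration,
# and everything runs inside a throwaway directory.
remove_wrappers() {
    local prefix="$1"
    shift
    local tool
    for tool in "$@"; do
        rm -f "$prefix/$tool"
    done
}

sandbox="$(mktemp -d)"
touch "$sandbox/apt" "$sandbox/apt-get" "$sandbox/apt-key" "$sandbox/apt-fast"

# apt-fast is left out of the list, so it survives the cleanup
remove_wrappers "$sandbox" apt apt-get apt-key

ls "$sandbox"   # prints "apt-fast"
```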

I don't know nearly enough about this to say if this could be a possible fix or is actually just a very bad workaround because the intention here is something completely different. But maybe this helps finding an actual solution.

enescakir commented 2 weeks ago

Hi @mikhailkoliada, @erik-bershel, @Alexey-Ayupov. It seems that the image build is failing on the main branch due to this error. Are you experiencing the same issue?

sssharif commented 2 weeks ago

Thanks for your report; we are looking into the issue.

bryan-bar commented 2 weeks ago

It's interesting that this script, which hasn't been modified recently, suddenly started to fail. Despite my attempts to debug it, the error message doesn't give enough clues.

I have the same issue when re-running a previously succeeding workflow.


When I use PowerShell 7.4's Set-PSDebug -Trace 2 to debug this error and other unrelated errors within Generate-SoftwareReport.ps1, a new error surfaces. Moving Set-PSDebug further down the script produces the same error on a different line. Removing it lets the original error surface, or lets the script succeed if there are no errors:

    githubrunner.amazon-ebs.githubrunner: Error encountered
    githubrunner.amazon-ebs.githubrunner: DEBUG:  252+      >>>> Write-Host $_.ScriptStackTrace
    githubrunner.amazon-ebs.githubrunner: 
    githubrunner.amazon-ebs.githubrunner: at HeaderNode, /imagegeneration/SoftwareReport/software-report-base/SoftwareReport.Nodes.psm1: line 32
    githubrunner.amazon-ebs.githubrunner: at SoftwareReport, /imagegeneration/SoftwareReport/software-report-base/SoftwareReport.psm1: line 9
    githubrunner.amazon-ebs.githubrunner: at <ScriptBlock>, /imagegeneration/SoftwareReport/Generate-SoftwareReport.ps1: line 32
    githubrunner.amazon-ebs.githubrunner: DEBUG:  253+      >>>> Write-Host $_.Exception.Message
    githubrunner.amazon-ebs.githubrunner: 
    githubrunner.amazon-ebs.githubrunner: Index was out of range. Must be non-negative and less than or equal to the size of the collection. (Parameter 'index')
    githubrunner.amazon-ebs.githubrunner: DEBUG:  254+      >>>> Write-Host $_.Exception.GetType()
    githubrunner.amazon-ebs.githubrunner: 
    githubrunner.amazon-ebs.githubrunner: System.ArgumentOutOfRangeException
    githubrunner.amazon-ebs.githubrunner: DEBUG:  255+      >>>> exit 1

@Sheldoras's suggested workaround worked. In my case I disabled apt-fast within Generate-SoftwareReport.ps1 and the run succeeds again. https://github.com/actions/runner-images/issues/9780#issuecomment-2092664160

The other location where apt-fast is configured is scripts/build/configure-apt-mock.sh.


apt-fast's quick-install.sh changed its install location from /usr/local/sbin to /usr/local/bin (https://github.com/ilikenwf/apt-fast/commit/c2cd0a0420d3f2d647dc82cf749bfd58c4697dac). The copy in /usr/local/bin is the one that gets removed, and the install endpoint points to master. See @Sheldoras's comment above: https://github.com/actions/runner-images/issues/9780#issuecomment-2092664160

lumarel commented 1 week ago

To my understanding, this is the root cause:

apt-fast's quick-install.sh changed its install location from /usr/local/sbin to /usr/local/bin (https://github.com/ilikenwf/apt-fast/commit/c2cd0a0420d3f2d647dc82cf749bfd58c4697dac). The copy in /usr/local/bin is the one that gets removed, and the install endpoint points to master. See @Sheldoras's comment above: https://github.com/actions/runner-images/issues/9780#issuecomment-2092664160

After quick-install.sh runs, configure-apt-mock.sh overwrites apt-fast with the wrapper that prevents apt from being locked (and makes it more verbose).

As Sheldoras already said, I too think the failure Cannot bind argument to parameter 'Path' because it is null. comes from https://github.com/actions/runner-images/blob/7bb1d84f7071bfa9c350d7552d9631d9a69bfdb0/images/ubuntu/scripts/docs-gen/SoftwareReport.Tools.psm1#L7 where which apt-fast returns an empty string instead of the path to the script.
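
The sequence described in this comment can be replayed in a throwaway directory to see why the old layout survived and the new one does not. This is a sketch under assumptions: the wrapper is taken to live in bin, the cleanup is taken to delete only that file, and the file contents are placeholders. The function name simulate is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: simulates the sequence in a throwaway directory tree.
# The directory names mirror the thread; the file contents are placeholders.
simulate() {
    local install_dir="$1"          # "sbin" (old layout) or "bin" (new layout)
    local root
    root="$(mktemp -d)"
    mkdir -p "$root/sbin" "$root/bin"

    # 1. the installer places the real apt-fast script
    printf '#!/bin/sh\necho real\n' > "$root/$install_dir/apt-fast"
    # 2. the mock step writes its wrapper into bin, overwriting whatever is there
    printf '#!/bin/sh\necho wrapper\n' > "$root/bin/apt-fast"
    chmod +x "$root/$install_dir/apt-fast" "$root/bin/apt-fast"
    # 3. cleanup removes the wrapper from bin
    rm -f "$root/bin/apt-fast"

    # 4. the report step looks the tool up on a PATH limited to the sandbox
    if ( PATH="$root/sbin:$root/bin"; command -v apt-fast >/dev/null 2>&1 ); then
        echo "found"
    else
        echo "missing"
    fi
    rm -rf "$root"
}

simulate sbin   # old layout: prints "found"
simulate bin    # new layout: prints "missing"
```

With the old layout the real script and the wrapper sit in different directories, so removing the wrapper leaves the real script in place. With the new layout both occupy the same path, so the cleanup leaves nothing for the lookup to find.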

So the workaround should be to move apt-fast back to sbin right after installation?

Update: tested and works. (adding mv /usr/local/bin/apt-fast /usr/local/sbin/apt-fast right after here)
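
The tested mv workaround can be wrapped in a guard so that it is a no-op when the file is not where the new installer puts it. A minimal sketch, assuming a hypothetical helper named relocate_apt_fast; the prefix argument is an addition for illustration so the logic can be tried outside /usr/local:

```shell
#!/usr/bin/env bash
# Sketch only: relocate_apt_fast is a hypothetical helper. The prefix argument
# exists so the logic can be tried outside /usr/local.
relocate_apt_fast() {
    local prefix="${1:-/usr/local}"
    # Move only when the new-style location is populated
    if [ -f "$prefix/bin/apt-fast" ]; then
        mkdir -p "$prefix/sbin"
        mv "$prefix/bin/apt-fast" "$prefix/sbin/apt-fast"
    fi
}

# Dry run against a temporary prefix instead of the real system paths
demo_prefix="$(mktemp -d)"
mkdir -p "$demo_prefix/bin"
touch "$demo_prefix/bin/apt-fast"
relocate_apt_fast "$demo_prefix"
ls "$demo_prefix/sbin"   # prints "apt-fast"
```

Called without an argument, the helper would act on /usr/local, which matches the mv command quoted above but typically requires root.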

erik-bershel commented 1 week ago

Heads up!

This should be fixed now by https://github.com/actions/runner-images/pull/9794. Please update your main branch and try the build again.

bvaughan527 commented 1 week ago

Thanks @erik-bershel, I can confirm this has been fixed.