Status: Open. FlorentATo opened this issue 2 months ago.
Hi @FlorentATo, I've created a new branch, azure/cli@fix-161, with additional debugging logs to help identify whether your issue is caused by an incorrect file path, a failed file creation, or insufficient permissions. Could you please run the following step and share the logs with me?
```yaml
- name: Azure CLI script to copy artifact
  uses: azure/cli@fix-161
  with:
    azcliversion: latest
    inlineScript: |
      az storage blob upload --account-name REDACTED --container-name REDACTED --name target_file_name --file ${{inputs.file_path}}/${{inputs.file}} --auth-mode login
```
Sure thing, I'll do it in the morning and keep you posted. Thank you!
Logs:
Run azure/login@v1.6.1
Running Azure CLI Login.
/usr/bin/az cloud set -n azurecloud
Done setting cloud: "azurecloud"
Note: Azure/login action also supports OIDC login mechanism. Refer https://github.com/azure/login#configure-a-service-principal-with-a-federated-credential-to-use-oidc-based-authentication for more details.
Attempting Azure CLI login by using service principal with secret...
Subscription is set successfully.
Azure CLI login succeeds by using service principal with secret.
Run azure/cli@fix-161
Script file created at /data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725038641263.sh
The file: '/data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725038641263.sh' exists.
Access to the script file is available.
chmod +x /data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725038641263.sh
Executable permissions given to the script file.
Starting script execution via docker image mcr.microsoft.com/azure-cli:latest
Error: Error: bash: /data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725038641263.sh: No such file or directory
I'm looking into the interaction between the host and the job's container.
@FlorentATo, it appears that the volume mount between the host and the container failed on your runner. Could you please try azure/cli@fix-161 again? I've moved the script file creation so that it happens after the mounting. However, I'm also concerned that the login information may not be mounted correctly either.
Logs:
(...)
Download action repository 'azure/login@v1.6.1' (SHA:cb79c773a3cfa27f31f25eb3f677781210c9ce3d)
(...)
Run azure/login@v1.6.1
Running Azure CLI Login.
/usr/bin/az cloud set -n azurecloud
Done setting cloud: "azurecloud"
Note: Azure/login action also supports OIDC login mechanism. Refer https://github.com/azure/login#configure-a-service-principal-with-a-federated-credential-to-use-oidc-based-authentication for more details.
Attempting Azure CLI login by using service principal with secret...
Subscription is set successfully.
Azure CLI login succeeds by using service principal with secret.
Run azure/cli@fix-161
Script file created at /data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725295367243.sh
The file: '/data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725295367243.sh' exists.
Access to the script file is available.
chmod +x /data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725295367243.sh
Executable permissions given to the script file.
Starting script execution via docker image mcr.microsoft.com/azure-cli:latest
Error: Error: bash: /data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725295367243.sh: No such file or directory
I don't believe there's an issue between the runner's container and the host, as I can see the ${RUNNER_WORKDIR} properly mounted into the container.
Here's the container's spec:
flodev@flo-svr-bld001:~$ docker inspect 42a53ace353d | jq '.[].Mounts[]'
{
"Type": "bind",
"Source": "/mnt/data/yocto",
"Destination": "/data/runner",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
}
{
"Type": "bind",
"Source": "/var/run/docker.sock",
"Destination": "/var/run/docker.sock",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
}
flodev@flo-svr-bld001:~$ docker inspect 42a53ace353d | jq '.[].HostConfig.Binds[]'
"/mnt/data/yocto:/data/runner"
"/var/run/docker.sock:/var/run/docker.sock"
And the resulting mount points inside the runner's container:
runner@42a53ace353d:/actions-runner$ mount | grep data
/dev/mapper/ubuntu--vg-data on /data/runner type ext4 (rw,relatime)
runner@42a53ace353d:/actions-runner$ df | grep data
/dev/mapper/ubuntu--vg-data 1737141248 122702132 1526123420 8% /data/runner
From the host, I can see the script file being created properly:
flodev@flo-svr-bld001:~$ while sleep 0.5; do date; ls -lR /mnt/data/yocto/work/_temp; done
returns:
Mon Sep 2 04:42:47 PM UTC 2024
/mnt/data/yocto/work/_temp:
total 93428
drwxr-xr-x 6 flodev 121 4096 Jul 16 19:04 1c380dc8-eb75-4b3a-8f8f-2905ab1d68a1
-rw-r--r-- 1 flodev 121 115 Sep 2 16:42 ab1e7f15-d265-4a11-9812-408120b92210.sh
-rwxr-xr-x 1 flodev 121 105 Sep 2 16:42 AZ_CLI_GITHUB_ACTION_1725295367243.sh <= HERE
-rw-r--r-- 1 flodev 121 22 Sep 2 16:42 de489297-ce9e-4009-aaf8-64dc0a47d6df.sh
-rw-r--r-- 1 flodev 121 559 Sep 2 16:37 e9164171-3d9c-418b-b6ba-2edda957f30e.sh
-rw-r--r-- 1 flodev 121 95640714 Sep 2 16:42 f2ff789f-5046-4b90-a532-7c45df30373e
drwxr-xr-x 2 flodev 121 4096 Sep 2 16:37 _github_workflow
drwxr-xr-x 2 flodev 121 4096 Sep 2 16:42 _runner_file_commands
(...)
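The script file name appears to embed a millisecond Unix timestamp; this is an assumption inferred from the logs, not confirmed from the action's source. Decoding it is a quick way to check that the file flagged in the host listing is the one the action created at 16:42:47 UTC. The helper name scriptTimestamp is hypothetical.

```typescript
// Hypothetical helper. The "AZ_CLI_GITHUB_ACTION_<n>.sh" pattern comes from the logs;
// treating <n> as milliseconds since the Unix epoch is an assumption.
function scriptTimestamp(fileName: string): string | null {
  const match = /^AZ_CLI_GITHUB_ACTION_(\d+)\.sh$/.exec(fileName);
  if (match === null) {
    return null; // not a script file created by the action
  }
  return new Date(Number(match[1])).toISOString();
}

console.log(scriptTimestamp("AZ_CLI_GITHUB_ACTION_1725295367243.sh")); // → 2024-09-02T16:42:47.243Z
```

The decoded time matches the "Mon Sep 2 04:42:47 PM UTC 2024" iteration of the watch loop, so the host is seeing the file the action just wrote.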
However, the file is deleted immediately after creation (less than 1 second later), which makes me believe that the Docker container started by executeDockerCommand can't access the script file, fails, and gets removed, and is then replaced by a fresh runner container.
Help me clarify something: does the azure/cli container run within the runner's container? Beside it? Or does it replace it during its execution? I believe it runs within it, but I'd like confirmation.
Clarification: the action calls the Docker daemon running on the host through /var/run/docker.sock, which is bind-mounted into the runner's container (see above). So the ephemeral container that runs the inline bash script runs beside the runner's container, not inside it.
I'll try to run it manually and see what happens...
So I was able to capture the moment the container that runs the script is created (docker ps --no-trunc):
0810f471af786a0515825eed61eaf6faf14305d29c44eb6244421baf34acb38c mcr.microsoft.com/azure-cli:latest "bash --noprofile --norc -e /data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725299684531.sh" Less than a second ago Up Less than a second MICROSOFT_AZURE_CLI_1725299684207_CONTAINER
Here's the output from docker inspect:
[
{
"Id": "0810f471af786a0515825eed61eaf6faf14305d29c44eb6244421baf34acb38c",
"Created": "2024-09-02T17:54:44.547020102Z",
"Path": "bash",
"Args": [
"--noprofile",
"--norc",
"-e",
"/data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725299684531.sh"
],
"State": {
"Status": "exited",
"Running": false,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 0,
"ExitCode": 1,
"Error": "",
"StartedAt": "2024-09-02T17:54:44.806744515Z",
"FinishedAt": "2024-09-02T17:54:44.809353308Z"
},
(...)
"HostConfig": {
"Binds": [
"/data/runner/work/<REDACTED>:/data/runner/work/<REDACTED>",
"/home/runner/.azure:/root/.azure",
"/data/runner/work/_temp:/data/runner/work/_temp"
],
(...)
"Mounts": [
{
"Type": "bind",
"Source": "/data/runner/work/<REDACTED>",
"Destination": "/data/runner/work/<REDACTED>",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/home/runner/.azure",
"Destination": "/root/.azure",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/data/runner/work/_temp",
"Destination": "/data/runner/work/_temp",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
}
],
(...)
"Cmd": [
"bash",
"--noprofile",
"--norc",
"-e",
"/data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725299684531.sh"
],
(...)
]
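To make the failure mode concrete, here is a sketch that rebuilds an equivalent docker run argument list from the HostConfig.Binds and Cmd values shown above. The function and interface names are hypothetical, the redacted workspace bind is left out, and any other flags the action may pass are not shown. The point it illustrates: each -v source is just a path string handed to the host's Docker daemon, which resolves it on the host's filesystem, not inside the runner's container.

```typescript
// Hypothetical reconstruction from the `docker inspect` output; not the action's code.
interface Bind {
  source: string;      // resolved by the Docker daemon, i.e. on the HOST
  destination: string; // path inside the azure-cli container
}

function dockerRunArgs(
  containerName: string,
  image: string,
  binds: Bind[],
  scriptPath: string,
): string[] {
  const args = ["run", "--name", containerName];
  for (const bind of binds) {
    args.push("-v", `${bind.source}:${bind.destination}`);
  }
  // Command copied from "Cmd" in the inspect output.
  args.push(image, "bash", "--noprofile", "--norc", "-e", scriptPath);
  return args;
}

const binds: Bind[] = [
  { source: "/home/runner/.azure", destination: "/root/.azure" },
  { source: "/data/runner/work/_temp", destination: "/data/runner/work/_temp" },
];

console.log(
  ["docker", ...dockerRunArgs(
    "MICROSOFT_AZURE_CLI_1725299684207_CONTAINER",
    "mcr.microsoft.com/azure-cli:latest",
    binds,
    "/data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725299684531.sh",
  )].join(" "),
);
```

Read this way, "/data/runner/work/_temp" is a path that only exists as such inside the runner's container; on the host it refers to a different directory.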
Hello @MoChilia, could you edit your branch and remove the cleanup steps from the finally {} block? I'd like to run the container manually with the same arguments and analyze the script file as well. Thanks!
@FlorentATo, I have removed the cleanup steps. Thank you for your efforts in helping to resolve this issue!
@MoChilia, I found the issue. Basically, the temp directory isn't the same between the runner's container and the CLI container.
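The runner's container mounts /mnt/data/yocto:/data/runner, so the runner's temp directory lives at /mnt/data/yocto/work/_temp on the host.
The CLI container mounts /data/runner/work/_temp:/data/runner/work/_temp, but because it is started by the host's Docker daemon, that source path is resolved on the host, not inside the runner's container.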
I found other inconsistencies in our workflow and our server configuration, but I believe the absence of RUNNER_TEMP is the culprit here.
I've reverted my workflow to use azure/cli, and I'll get back to you in a few hours to confirm everything's OK.
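The mismatch described above can be sketched as a path translation: a path seen inside the runner's container has to be mapped through the runner container's own bind mounts before it can be used as a -v source on the host. This is a minimal sketch assuming those mounts are known (here copied from the runner container's docker inspect output); toHostPath and Mount are hypothetical names, not part of azure/cli.

```typescript
// Hypothetical sketch; mount destinations are assumed to have no trailing slash.
interface Mount {
  source: string;      // path on the host
  destination: string; // path inside the runner container
}

// Translate a runner-container path into the host path the Docker daemon would need.
function toHostPath(containerPath: string, mounts: Mount[]): string | null {
  // Prefer the most specific (longest) mount destination that contains the path.
  const sorted = [...mounts].sort((a, b) => b.destination.length - a.destination.length);
  for (const m of sorted) {
    if (containerPath === m.destination) {
      return m.source;
    }
    if (containerPath.startsWith(m.destination + "/")) {
      return m.source + containerPath.slice(m.destination.length);
    }
  }
  return null; // the path exists only in the runner container's own filesystem
}

const runnerMounts: Mount[] = [
  { source: "/mnt/data/yocto", destination: "/data/runner" },
  { source: "/var/run/docker.sock", destination: "/var/run/docker.sock" },
];

console.log(toHostPath("/data/runner/work/_temp", runnerMounts)); // → /mnt/data/yocto/work/_temp
console.log(toHostPath("/home/runner/.azure", runnerMounts));     // → null
```

With a -v bind, Docker creates a missing source directory on the host, so the CLI container most likely saw an empty /data/runner/work/_temp, which would explain the "No such file or directory" error. By the same reasoning, /home/runner/.azure has no matching mount in the runner's container, which is consistent with the earlier concern about the login information.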
Workflow
Error logs:
Runner: Self-hosted (on-prem, physical server running Ubuntu 22.04.3 LTS)
GitHub Runner version: 2.319.1
Hypothesis: It seems like createScriptFile (more specifically fs.writeFileSync) silently fails to write to its container's filesystem.
Proposed fix: Add a validation step to check that 1) the file is present and 2) it has the execution bit set.
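A minimal sketch of the proposed validation, assuming a stat-like input. In Node this could come from fs.statSync(path), whose isFile() and mode are real, but the ScriptStat shape and the validateScriptFile name are hypothetical. Note that such a check runs against the runner container's filesystem; the debug logs in this thread show the file existing and chmod succeeding there, so on its own it likely would not have caught the temp-directory mismatch between the runner and the CLI container.

```typescript
// Hypothetical sketch of the validation step from the proposed fix.
interface ScriptStat {
  isFile: boolean;
  mode: number; // numeric st_mode, e.g. 0o100755
}

function validateScriptFile(stat: ScriptStat | null): string[] {
  const problems: string[] = [];
  if (stat === null || !stat.isFile) {
    problems.push("script file is missing");
    return problems;
  }
  // Any of the owner/group/other execute bits counts as executable.
  if ((stat.mode & 0o111) === 0) {
    problems.push("script file has no execute bit set");
  }
  return problems;
}

console.log(validateScriptFile({ isFile: true, mode: 0o100755 })); // no problems reported
console.log(validateScriptFile({ isFile: true, mode: 0o100644 })); // missing execute bit
```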