Azure / cli

Automate your GitHub workflows using Azure CLI scripts
MIT License

bash: /data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_<timestamp>.sh: No such file or directory #161

Status: Open · FlorentATo opened this issue 2 months ago

FlorentATo commented 2 months ago

Workflow

(...)
    - name: Azure CLI script to copy artifact
      uses: azure/cli@v2
      with:
        azcliversion: latest
        inlineScript: |
          az storage blob upload --account-name REDACTED  --container-name REDACTED  --name target_file_name --file ${{inputs.file_path}}/${{inputs.file}} --auth-mode login

Error logs:

Run azure/cli@v2
Starting script execution via docker image mcr.microsoft.com/azure-cli:latest
Error: Error: Unable to find image 'mcr.microsoft.com/azure-cli:latest' locally
latest: Pulling from azure-cli
c6a83fedfae6: Already exists
b9dc4119f2ec: Pulling fs layer
545d94f91829: Pulling fs layer
4271f5ef1d39: Pulling fs layer
780f71a86072: Pulling fs layer
860a1e8f6d7f: Pulling fs layer
4f4fb700ef54: Pulling fs layer
780f71a86072: Waiting
4f4fb700ef54: Waiting
860a1e8f6d7f: Waiting
4271f5ef1d39: Verifying Checksum
4271f5ef1d39: Download complete
b9dc4119f2ec: Download complete
b9dc4119f2ec: Pull complete
545d94f91829: Verifying Checksum
545d94f91829: Download complete
780f71a86072: Verifying Checksum
780f71a86072: Download complete
4f4fb700ef54: Verifying Checksum
4f4fb700ef54: Download complete
545d94f91829: Pull complete
4271f5ef1d39: Pull complete
780f71a86072: Pull complete
860a1e8f6d7f: Verifying Checksum
860a1e8f6d7f: Download complete
860a1e8f6d7f: Pull complete
4f4fb700ef54: Pull complete
Digest: sha256:38f8385bb27f9b6dabee3045852cabce2e9da7d4ed35d5783b4691a02489b4c2
Status: Downloaded newer image for mcr.microsoft.com/azure-cli:latest
bash: /data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1724872531969.sh: No such file or directory

cleaning up container...
MICROSOFT_AZURE_CLI_1724872531842_CONTAINER

Runner: self-hosted (on-prem, physical server running Ubuntu 22.04.3 LTS)
GitHub Runner version: 2.319.1

Hypothesis: It seems like createScriptFile (more specifically fs.writeFileSync) silently fails to write to its container's filesystem.

Proposed fix: add a validation step that checks 1) that the file is present and 2) that it has the execute bit set.
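A minimal sketch of such a check, assuming a Node.js context like the action's (it already uses `fs.writeFileSync`). `checkScriptFile` is a hypothetical helper name, not an existing azure/cli function, and it only inspects the file from the calling process's own point of view:

```typescript
import * as fs from "node:fs";

// Hypothetical pre-flight check (not part of azure/cli).
// Returns an error message describing the first problem found, or null if the script looks usable.
function checkScriptFile(scriptPath: string): string | null {
  // 1) The file must be present.
  if (!fs.existsSync(scriptPath)) {
    return `script file not found: ${scriptPath}`;
  }
  const stats = fs.statSync(scriptPath);
  if (!stats.isFile()) {
    return `not a regular file: ${scriptPath}`;
  }
  // 2) At least one execute bit (owner, group or other) must be set.
  if ((stats.mode & 0o111) === 0) {
    return `execute bit not set: ${scriptPath}`;
  }
  return null;
}
```

As the debug logs later in this thread show, a check like this can pass on the runner while the file is still missing inside the CLI container, because the two containers do not necessarily see the same filesystem.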

MoChilia commented 2 months ago

Hi @FlorentATo, I've created a new branch azure/cli@fix-161 with additional debugging logs to help identify whether your issue is caused by an incorrect file path, a failed file creation, or insufficient permissions. Could you please execute the following step and share the logs with me?

    - name: Azure CLI script to copy artifact
      uses: azure/cli@fix-161
      with:
        azcliversion: latest
        inlineScript: |
          az storage blob upload --account-name REDACTED  --container-name REDACTED  --name target_file_name --file ${{inputs.file_path}}/${{inputs.file}} --auth-mode login
FlorentATo commented 2 months ago

Sure thing, I'll do it in the morning and keep you posted. Thank you!

FlorentATo commented 2 months ago

Logs:

Run azure/login@v1.6.1
Running Azure CLI Login.
/usr/bin/az cloud set -n azurecloud
Done setting cloud: "azurecloud"
Note: Azure/login action also supports OIDC login mechanism. Refer https://github.com/azure/login#configure-a-service-principal-with-a-federated-credential-to-use-oidc-based-authentication for more details.
Attempting Azure CLI login by using service principal with secret...
Subscription is set successfully.
Azure CLI login succeeds by using service principal with secret.
Run azure/cli@fix-161
Script file created at /data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725038641263.sh
The file: '/data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725038641263.sh' exists.
Access to the script file is available.
chmod +x /data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725038641263.sh
Executable permissions given to the script file.
Starting script execution via docker image mcr.microsoft.com/azure-cli:latest
Error: Error: bash: /data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725038641263.sh: No such file or directory

I'm looking into the interaction between the host and the job's container.

MoChilia commented 2 months ago

@FlorentATo, it appears that the volume mount between the host and the container failed on your runner. Could you please try azure/cli@fix-161 again? I've moved the script file creation to after the mount is set up. I'm also concerned that the login information may not be mounted correctly.

FlorentATo commented 2 months ago

Logs:

(...)
Download action repository 'azure/login@v1.6.1' (SHA:cb79c773a3cfa27f31f25eb3f677781210c9ce3d)
(...)
Run azure/login@v1.6.1
Running Azure CLI Login.
/usr/bin/az cloud set -n azurecloud
Done setting cloud: "azurecloud"
Note: Azure/login action also supports OIDC login mechanism. Refer https://github.com/azure/login#configure-a-service-principal-with-a-federated-credential-to-use-oidc-based-authentication for more details.
Attempting Azure CLI login by using service principal with secret...
Subscription is set successfully.
Azure CLI login succeeds by using service principal with secret.
Run azure/cli@fix-161

Script file created at /data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725295367243.sh
The file: '/data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725295367243.sh' exists.
Access to the script file is available.
chmod +x /data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725295367243.sh
Executable permissions given to the script file.
Starting script execution via docker image mcr.microsoft.com/azure-cli:latest
Error: Error: bash: /data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725295367243.sh: No such file or directory

I don't believe there's an issue between the runner's container and the host, as I can see the ${RUNNER_WORKDIR} properly mounted into the container.

Here's the runner container's spec:

flodev@flo-svr-bld001:~$ docker inspect 42a53ace353d | jq '.[].Mounts[]'
{
  "Type": "bind",
  "Source": "/mnt/data/yocto",
  "Destination": "/data/runner",
  "Mode": "",
  "RW": true,
  "Propagation": "rprivate"
}
{
  "Type": "bind",
  "Source": "/var/run/docker.sock",
  "Destination": "/var/run/docker.sock",
  "Mode": "",
  "RW": true,
  "Propagation": "rprivate"
}
flodev@flo-svr-bld001:~$ docker inspect 42a53ace353d | jq '.[].HostConfig.Binds[]'
"/mnt/data/yocto:/data/runner"
"/var/run/docker.sock:/var/run/docker.sock"

and the resulting mount points in the runner's container:

runner@42a53ace353d:/actions-runner$ mount | grep data
/dev/mapper/ubuntu--vg-data on /data/runner type ext4 (rw,relatime)
runner@42a53ace353d:/actions-runner$ df | grep data
/dev/mapper/ubuntu--vg-data       1737141248 122702132 1526123420   8% /data/runner

From the host, I can see the script file being created properly:

flodev@flo-svr-bld001:~$ while sleep 0.5; do date; ls -lR /mnt/data/yocto/work/_temp; done

returns:

Mon Sep  2 04:42:47 PM UTC 2024
/mnt/data/yocto/work/_temp:
total 93428
drwxr-xr-x 6 flodev 121     4096 Jul 16 19:04 1c380dc8-eb75-4b3a-8f8f-2905ab1d68a1
-rw-r--r-- 1 flodev 121      115 Sep  2 16:42 ab1e7f15-d265-4a11-9812-408120b92210.sh
-rwxr-xr-x 1 flodev 121      105 Sep  2 16:42 AZ_CLI_GITHUB_ACTION_1725295367243.sh     <= HERE
-rw-r--r-- 1 flodev 121       22 Sep  2 16:42 de489297-ce9e-4009-aaf8-64dc0a47d6df.sh
-rw-r--r-- 1 flodev 121      559 Sep  2 16:37 e9164171-3d9c-418b-b6ba-2edda957f30e.sh
-rw-r--r-- 1 flodev 121 95640714 Sep  2 16:42 f2ff789f-5046-4b90-a532-7c45df30373e
drwxr-xr-x 2 flodev 121     4096 Sep  2 16:37 _github_workflow
drwxr-xr-x 2 flodev 121     4096 Sep  2 16:42 _runner_file_commands
(...)

However, the file is deleted immediately after creation (less than 1 second later), which makes me believe the Docker container started by executeDockerCommand can't access the script file, fails, and is removed, then replaced by a fresh runner container.

Help me clarify something: does the azure/cli container run within the runner's container? Beside it? Or does it replace it during its execution? I believe it runs within it, but I'd like confirmation.

Clarification: the script calls the Docker daemon running on the host via /var/run/docker.sock, which is exposed (mounted) into the runner's container (see above). So the ephemeral container that runs the inline bash script runs beside the runner's container, not within it.

I'll try to run it manually and see what happens...

FlorentATo commented 2 months ago

So I was able to capture the container that runs the script right after it was created (docker ps --no-trunc):

0810f471af786a0515825eed61eaf6faf14305d29c44eb6244421baf34acb38c   mcr.microsoft.com/azure-cli:latest                           "bash --noprofile --norc -e /data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725299684531.sh"   Less than a second ago   Up Less than a second             MICROSOFT_AZURE_CLI_1725299684207_CONTAINER

Here's the output from docker inspect:

[
    {
        "Id": "0810f471af786a0515825eed61eaf6faf14305d29c44eb6244421baf34acb38c",
        "Created": "2024-09-02T17:54:44.547020102Z",
        "Path": "bash",
        "Args": [
            "--noprofile",
            "--norc",
            "-e",
            "/data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725299684531.sh"
        ],
        "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 1,
            "Error": "",
            "StartedAt": "2024-09-02T17:54:44.806744515Z",
            "FinishedAt": "2024-09-02T17:54:44.809353308Z"
        },
(...)
        "HostConfig": {
            "Binds": [
                "/data/runner/work/<REDACTED>:/data/runner/work/<REDACTED>",
                "/home/runner/.azure:/root/.azure",
                "/data/runner/work/_temp:/data/runner/work/_temp"
            ],
(...)
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/data/runner/work/<REDACTED>",
                "Destination": "/data/runner/work/<REDACTED>",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/home/runner/.azure",
                "Destination": "/root/.azure",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/data/runner/work/_temp",
                "Destination": "/data/runner/work/_temp",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
(...)
            "Cmd": [
                "bash",
                "--noprofile",
                "--norc",
                "-e",
                "/data/runner/work/_temp/AZ_CLI_GITHUB_ACTION_1725299684531.sh"
            ],
(...)
]
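The `State` timestamps above quantify how short-lived the container was. Both fall within 17:54:44, so only the fractional seconds differ:

$$
44.809353308\,\text{s} - 44.806744515\,\text{s} = 0.002608793\,\text{s} \approx 2.6\,\text{ms}
$$

About 2.6 ms is far too short for `az storage blob upload` to have started, which fits bash failing on the missing script as reported in the action log.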
FlorentATo commented 2 months ago

Hello @MoChilia, could you edit your branch and remove the cleanup steps from the finally {} block? I'd like to run the container manually with the same arguments and analyze the script file as well. Thanks!
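The request above hinges on where the cleanup runs. Below is a hedged outline of a create/run/clean-up sequence in TypeScript; `runInlineScript` and `ContainerRunner` are hypothetical names, not the action's actual code, and the container start is replaced by a callback so the sketch runs without Docker:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical stand-in for starting the CLI container; returns the container's exit code.
type ContainerRunner = (scriptPath: string) => number;

// Illustrative outline: create the script, run it, and clean up in `finally`
// so the file is removed whether the run succeeds, fails, or throws.
function runInlineScript(scriptPath: string, inlineScript: string, runContainer: ContainerRunner): number {
  fs.writeFileSync(scriptPath, inlineScript);
  fs.chmodSync(scriptPath, 0o755);
  try {
    return runContainer(scriptPath);
  } finally {
    // Runs right after runContainer returns or throws, so when the container fails fast
    // the script disappears from _temp almost as soon as it was created.
    fs.rmSync(scriptPath, { force: true });
  }
}

// Usage with a fake runner that mimics a container exiting with code 1.
const scriptPath = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "az-cli-")), "AZ_CLI_GITHUB_ACTION_demo.sh");
const exitCode = runInlineScript(scriptPath, "az version\n", (p) => {
  console.log("script present during run:", fs.existsSync(p)); // script present during run: true
  return 1;
});
console.log(exitCode, fs.existsSync(scriptPath)); // 1 false
```

Skipping the `finally` cleanup, as requested, leaves the script in `_temp` so the failed container can be re-run by hand with the same arguments.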

MoChilia commented 2 months ago

@FlorentATo, I have removed the cleanup steps. Thank you for your efforts in helping to resolve this issue!

FlorentATo commented 2 months ago

@MoChilia I found the issue. Basically, the temp directory isn't the same in the runner's container and the CLI container.

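The runner's container mounts /mnt/data/yocto:/data/runner.
The CLI container mounts /data/runner/work/_temp:/data/runner/work/_temp.
Since the CLI container is started by the host's Docker daemon, that /data/runner/work/_temp source is resolved on the host, while the script is actually written to /mnt/data/yocto/work/_temp on the host.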
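To make the mismatch concrete: because the CLI container is created by the host's Docker daemon through the mounted /var/run/docker.sock, a bind source such as /data/runner/work/_temp is looked up on the host, not inside the runner container. This TypeScript sketch maps a runner-container path to its host path using the runner's mounts from the docker inspect output earlier in the thread. `toHostPath` is a hypothetical helper, and only the `Source`/`Destination` fields are modelled:

```typescript
// One entry from `docker inspect <runner> | jq '.[].Mounts[]'` (only the fields used here).
interface Mount {
  Source: string;      // path on the Docker host
  Destination: string; // path inside the runner container
}

// Map a path as seen inside the runner container to the host path behind it.
// Returns undefined when no mount covers the path (it exists only in the container's own filesystem).
function toHostPath(containerPath: string, mounts: Mount[]): string | undefined {
  // Check the most specific (longest) mount point first, since nested mounts shadow their parents.
  const byDepth = [...mounts].sort((a, b) => b.Destination.length - a.Destination.length);
  for (const m of byDepth) {
    if (containerPath === m.Destination || containerPath.startsWith(m.Destination + "/")) {
      return m.Source + containerPath.slice(m.Destination.length);
    }
  }
  return undefined;
}

// Mounts of the runner container, copied from the docker inspect output above.
const runnerMounts: Mount[] = [
  { Source: "/mnt/data/yocto", Destination: "/data/runner" },
  { Source: "/var/run/docker.sock", Destination: "/var/run/docker.sock" },
];

console.log(toHostPath("/data/runner/work/_temp", runnerMounts)); // /mnt/data/yocto/work/_temp
console.log(toHostPath("/home/runner/.azure", runnerMounts)); // undefined
```

The second lookup shows that /home/runner/.azure is not backed by any runner mount, so the /home/runner/.azure:/root/.azure bind would likewise be resolved against the host's /home/runner/.azure rather than the runner container's login cache, which fits the earlier concern about the login information.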

I found other inconsistencies in our workflow and server configuration, but I believe the absence of RUNNER_TEMP is the culprit here.

I've reverted my workflow to use azure/cli and I'll get back to you in a few hours to confirm everything's ok.