Azure / bicep

Bicep is a declarative language for describing and deploying Azure resources
MIT License
3.21k stars 745 forks source link

virtualMachines/runCommands seems to run the script also locally when called from DevOps pipeline agent running Ubuntu #14663

Closed jtklahti closed 5 days ago

jtklahti commented 1 month ago

Bicep version Bicep CLI version 0.29.45 (57a44c0230)

Describe the bug Running a command on a VM via Bicep using Microsoft.Compute/virtualMachines/runCommands seems to run the script also on the calling DevOps pipeline agent. Script is run correctly on the VM.

Expected: the script should only be run on the VM.

To Reproduce Steps to reproduce the behavior: probably easily reproducable on DevOps pipeline agent with Ubuntu 22.04.

I tried to reproduce on a Linux VM with no luck. My VM didn't have /var/lib/waagent, which seems to be used here. Also that script is run as root.

Additional context I am using Azure DevOps pipeline to run the script. Agent is using ubuntu-latest (Ubuntu 22.04) image. Pipeline uses task AzureCLI@2. The issue has been occurring for many months now, but I just figured out the reason few days ago and managed to workaround it. Taskfile.dev is used to run az cli commands.

Some version info from pipeline run:

Task         : Azure CLI
Version      : 2.242.0

/usr/bin/az --version
azure-cli                         2.62.0

core                              2.62.0
telemetry                          1.1.0

Extensions:
azure-devops                       1.0.1

Dependencies:
msal                              1.28.1
azure-mgmt-resource               23.1.1

Python location '/opt/az/bin/python3'
Extensions directory '/opt/az/azcliextensions'

Python (Linux) 3.11.8 (main, Jul  4 2024, 02:23:40) [GCC 11.4.0]

Bicep code to run install script on Azure VM:

resource virtualMachine 'Microsoft.Compute/virtualMachines@2023-03-01' existing = {
  name: vmName
}

resource runCmd 'Microsoft.Compute/virtualMachines/runCommands@2024-03-01' = {
  name: 'install-script'
  location: location
  parent: virtualMachine
  properties: {
    asyncExecution: false
    parameters: [
      {
        name: 'OTHER_PARAM'
        value: string(clients)
      }
      {
        name: 'VMNAME'
        value: vmName
      }
    ]
    source: {
      script: loadTextContent('./install.sh')
    }
  }
}

Beginning of the install.sh script:

if [ "$VMNAME" != "$(hostname)" ]; then
  echo "VMNAME '$VMNAME' does not match hostname '$(hostname)'."
  # debug this script running in agent, where it should not run
  out_file="/home/vsts/scriptRuns.log"
  date --iso-8601=seconds >> $out_file || true
  parent_pid=$(ps -o ppid= -p $PPID)
  ps -f -p $parent_pid >> $out_file || true
  ps -f -p $PPID --no-headers >> $out_file || true
  ps -f -p $$ --no-headers >> $out_file || true
  pstree -sAg $PPID >> $out_file || true
  exit 1
fi

# install actions ...

I added the if-clause to workaround this strange behavior. If everything worked as it should, the if-clause condition should never be true. Script is run on the VM as it should, ie. then the if condition is not true.

The script is also run on the pipeline agent, where the if-condition is true and if-clause is executed. I added some quick debug logging to get some clue about the context in which the script is run.

In the final steps of the pipeline, I printed the contents of /home/vsts/scriptRuns.log:

2024-07-25T12:50:24+00:00
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 12:43 ?        00:00:03 /sbin/init
root        3377       1  0 12:50 ?        00:00:00 /var/lib/waagent/Microsoft.CPlat.Core.RunCommandHandlerLinux-1.3.7/bin/run-command-handler enable
root        3384    3377  0 12:50 ?        00:00:00 /bin/bash /var/lib/waagent/run-command-handler/download/install-script/21/script.sh
systemd(1)---run-command-han(3365)-+-script.sh(3365)---pstree(3365)
                                   |-{run-command-han}(3365)
                                   |-{run-command-han}(3365)
                                   |-{run-command-han}(3365)
                                   `-{run-command-han}(3365)

Looking at the timestamp, it seems that the script is run on the agent about 5s after az deployment group create command calling the mentioned bicep code returns. Looking at the output, it seems that the script is indeed run by some run-command handler process. Log line telling create-commands duration and timestamp:

2024-07-25T12:50:19.2623848Z INFO: Command ran in 280.197 seconds (init: 0.186, invoke: 280.010)

Run-command's output is also fetched with:

az vm run-command show ...
            --run-command-name "install-script" \
            --instance-view | jq -r '.instanceView.output' | tr -d '\n'

... which also returned around 2024-07-25T12:50:20.6194368Z.

alex-frankel commented 1 month ago

I can't think of any possible way that what your describing could happen :) A virtual machine would have no way to run a script in a pipeline.

Is it possible this script was added as a pipeline step/task and then it was forgotten about?

jtklahti commented 1 month ago

If the script would have been run by some pipeline step/task, those ps outputs would look very different, wouldn't they?

It seems to me that the script is somehow executed by az cli ("az deployment group create") or something started by it. That virtual machine very probably doesn't have anything to do with this.

That log file I mentioned and is read in the pipeline points to this direction:

root        3377       1  0 12:50 ?        00:00:00 /var/lib/waagent/Microsoft.CPlat.Core.RunCommandHandlerLinux-1.3.7/bin/run-command-handler enable
root        3384    3377  0 12:50 ?        00:00:00 /bin/bash /var/lib/waagent/run-command-handler/download/install-script/21/script.sh

Those lines, to my understanding, are outputs of these commands (in the script):

ps -f -p $PPID --no-headers >> $out_file || true
ps -f -p $$ --no-headers >> $out_file || true

That "/var/lib/waagent/Microsoft.CPlat.Core.RunCommandHandlerLinux-1.3.7/bin/run-command-handler" sounds very much related to runCommand.

jtklahti commented 1 month ago

I'll see if I have time to make a simple example repository and pipeline, that would reproduce this issue. And yes, I have checked, that the script in this issue is not executed in any other way in the pipeline.

As the script seems to run a few seconds after az deployment group create executing runCommand returns, at least some sleep is probably needed if some one tries to repro this issue.

microsoft-github-policy-service[bot] commented 1 week ago

Hi @jtklahti, this issue has been marked as stale because it was labeled as requiring author feedback but has not had any activity for 7 days. It will be closed if no further activity occurs within 3 days of this comment. Thanks for contributing to bicep! :smile: :mechanical_arm: