Script execution: Run a script via CLI

noahtalerman commented 1 year ago

User story

As an IT admin or security engineer, I want to run a script once, on a specific host, so that I can remediate an issue or collect logs.

✅ Execute custom scripts (Fleet Premium (but this should really be Fleet Free imo, let's consider it -mike))

Requirements

Fleet Premium only
Only maintainers and up can run scripts. Team maintainer and team admins can only run scripts on hosts assigned to their team.
Supported on macOS, Windows, and Linux.
- On macOS and Linux, only scripts run with "#!/bin/sh" are supported. If the user doesn't specify, the script will run in the default shell.
- On Windows, only Powershell scripts are supported.
Script can be run via fleetctl CLI. Script only runs if host is online. Script only runs if its content is 10,000 characters or less.
Script content, output, and exit code are viewable in the UI and CLI. Only the last 10,000 characters of script output are viewable.
Script execution is tracked in activity feed
Escalating Fleet/fleetd remotely from a read-only system to a script execution system is only possible for users who already have control over the device (via MDM).
Deployments that already have fleetd shouldn't be able to be autoupdated into this feature.
If macOS MDM is turned on in Fleet, automatically turn on script execution features for macOS hosts via profile.
If MDM is turned off for a macOS host, automatically turn off script execution features.
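
A minimal sketch of the two 10,000-character limits in the requirements above. The helper names and the MAX_CHARS variable are hypothetical, not Fleet's implementation. Note that tail -c counts bytes, so it only matches "characters" for single-byte text.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the limits; not Fleet's actual code.
MAX_CHARS=10000

# Succeeds when the script file holds MAX_CHARS characters or fewer.
script_within_limit() {
  count=$(wc -m < "$1" | tr -d '[:space:]')
  [ "$count" -le "$MAX_CHARS" ]
}

# Reads output on stdin and keeps only the last MAX_CHARS bytes.
truncate_output() {
  tail -c "$MAX_CHARS"
}
```

For example, `script_within_limit ./collect-logs.sh && echo "ok to send"` would gate a script before it is sent, where ./collect-logs.sh is a placeholder path.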

Changes

[ ] Add a new fleetctl run-script command
[ ] Break the agent component for script execution out into a subsystem of Orbit (still in the Orbit binary) that only starts up if the appropriate conditions are met.
[ ] On macOS, the subsystem can be triggered by a configuration profile (similar to the strategy for #9459). When fleetd starts up, it should look for a profile of PayloadType com.fleetdm.fleetd.config and extract the scripting key. Only if this value is true should the subsystem turn on.
[ ] On macOS, Windows, and Linux, the subsystem can be triggered by the --script-execution flag in the fleetctl package command. If this value is true the subsystem is turned on.
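
The two triggers in the list above can be sketched as a single decision. This is an illustration only: the function name and its string arguments are assumptions, and reading the profile key or the packaged flag is left out.

```shell
#!/bin/sh
# script_subsystem_enabled PROFILE_SCRIPTING PACKAGE_FLAG
# PROFILE_SCRIPTING: value of the scripting key from the config profile (assumed "true"/"false"/empty)
# PACKAGE_FLAG: value baked in by the fleetctl package flag (assumed "true"/"false"/empty)
# Succeeds only when at least one trigger is explicitly "true".
script_subsystem_enabled() {
  profile_scripting=$1
  package_flag=$2
  [ "$profile_scripting" = "true" ] || [ "$package_flag" = "true" ]
}
```

Anything other than an explicit "true" leaves the subsystem off, which matches the goal that it never starts unless a condition is met.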

UI and CLI

https://www.figma.com/file/030moybHDEurtYx52D70sF/%239583-Script-execution%3A-Run-a-script-via-CLI?type=design&node-id=2-130&mode=design

noahtalerman commented 1 year ago

Zach: If we wait, what do we tell current fleetd users as we start to ship script execution features?

Zach: It should not be possible to turn on script execution through the Fleet system on its own. W/o deploying a new package or using some sort of write capability.

Roberto: What about having another "agent"? If you don't have that binary you don't have script execution enabled (I think maybe that's what you're suggesting?)

Zach: Adding another component sounds painful but is likely the most satisfying for the folks looking for this feature.

Zach: Another option is a flag that can't be configured remotely. If you want scripts, you have to deploy new packages, use some other tooling to turn scripts on.

Zach: Possible solution, config profile that configures fleetd to turn on script execution.

Roberto: Spitballing another option: having two "fleetd"s, one "hardened" that doesn't contain any of the remote execution code, which we can exclude using build constraints

Zach: This has the downside of something that has to be built, tested, and distributed separately. But it's not another process. For the user, they have to choose the "script" name in the update channel.

noahtalerman commented 1 year ago

@zwass I assigned this to you. Can you please update the issue with your proposal?

zwass commented 1 year ago

I think the key thing here is that it needs to be impossible to escalate Fleet/fleetd from a read-only system to an execute or write system remotely. For deployments that already have fleetd, they should not be able to be autoupdated or enrolled into this feature in any other way than by explicit action through a tool that already has write/execute on the system.

I propose we do this by breaking the agent component of this out into a subsystem of Orbit (still in the Orbit binary) that only starts up if the appropriate conditions are met. Otherwise, that subsystem should never start up. I think the subsystem should be triggered on by a configuration profile (similar to the strategy for https://github.com/fleetdm/fleet/issues/9459).

When fleetd starts up, it should look for a profile of PayloadType com.fleetdm.fleetd.config and extract the scripting key. Only if this value is true should the subsystem turn on.

This ensures that only admin users can turn on scripting locally, and only a user who already has control over the device (via MDM) can turn scripting on remotely.

@roperzh does this make sense to you? Anything I could be missing here?

roperzh commented 1 year ago

Looks good to me! 👍

noahtalerman commented 1 year ago

the subsystem should be triggered on by a configuration profile

@zwass @roperzh what user action installs the profile on the hosts? Is it turning on MDM in Fleet (configuring Fleet with APNs and SCEP certs/keys)? Or something else...

zwass commented 1 year ago

Yeah I think Fleet should automatically send the profile upon enrollment.

Future considerations:

This feature can be disabled and then Fleet doesn't send the profile?
Scripting functionality doesn't require Fleet to be the MDM server and we let folks know that they need to deploy this profile in order to have the functionality (or we can build some other setting into the installer)?

noahtalerman commented 1 year ago

@zwass @roperzh I updated the "Requirements" and "Proposed solution" sections in this issue's description (using Zach's comment here: https://github.com/fleetdm/fleet/issues/9583#issuecomment-1413099039).

When you get the chance, can you please take a look to see if I'm missing anything? Thanks :)

Future considerations:

This feature can be disabled and then Fleet doesn't send the profile?

Scripting functionality doesn't require Fleet to be the MDM server and we let folks know that they need to deploy this profile in order to have the functionality (or we can build some other setting into the installer)?

These make a lot of sense. My understanding of (1) is that the user wants other MDM features but doesn't want scripts, and of (2) that the user wants scripts but doesn't want to set up MDM features. I prefer to wait to prioritize these until we hear about them from users.

noahtalerman commented 1 year ago

@zwass @roperzh this feature must be included in the same (or before) fleetd release that includes script execution features, right?

Moreover, if we release this feature after we release script execution features, this requirement won't be met, right?

For deployments that already have fleetd, they should not be able to be autoupdated or enrolled into this feature in any other way than by explicit action through a tool that already has write/execute on the system.

noahtalerman commented 1 year ago

this feature must be included in the same (or before) fleetd release that includes script execution features, right?

Roberto: that's my understanding, yes

More discussion can be found here in Slack (internal): https://fleetdm.slack.com/archives/C03C41L5YEL/p1676393697336569

noahtalerman commented 1 year ago

@lukeheath heads up, my updated understanding is that this story must be included in the same release (or earlier) as the Manage scripts story here: #9537

lukeheath commented 1 year ago

Hey team! Please add your planning poker estimate with Zenhub @gillespi314 @mna @roperzh

noahtalerman commented 1 year ago

Hey @marko-lisica I'm passing you this issue.

noahtalerman commented 1 year ago

Hey @georgekarrv, here's the list of sub-tasks we drafted:

Build option
- Flag for turning on scripts on fleet
- Pushing down the .mobileconfig that turns on script execution
Fleet API
- Call to run the script
- Response to see the results
Agent subtask
- Script running component. Support on macOS, Windows, and Linux
CLI
- Script running
- See the output
UI subtasks
- See the content, output, and exit code

mna commented 1 year ago

@noahtalerman

Only maintainers and up can run scripts. Team maintainer and team admins can only run scripts on hosts assigned to their team.

Should we also allow the gitops role to run scripts? Also, I take it from the description that any role can read the script's output (except gitops of course)?

noahtalerman commented 1 year ago

@mna thanks for raising!

Should we also allow the gitops role to run scripts?

I think we don't want to allow GitOps role to run scripts.

I think the GitOps role is designed to only need permissions to apply Fleet YAML documents (run fleetctl apply).

@zhumo is that right?

I take it from the description that any role can read the script's output (except gitops of course)?

Yep!

zhumo commented 1 year ago

Hey Noah, Yes, let's exclude the gitops role for now. I'm not sure how you would use gitops to run scripts, like you'd need to git commit a new file which specifies the host and the script, but then after it's run, it stays in your repo forever? It's a bit of UX question that we should work on in a separate effort. In the meantime, then, let's not give the gitops user unnecessary powers.

Let's make sure to include that change in the user permissions doc (I've also included checking user permissions in the agenda for C&C)
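
A rough sketch of the permission rule as discussed in this thread: maintainers and admins can run scripts, team roles only on hosts in their team, and observer, observer+, and gitops cannot. The role labels and the "global" scope value are assumptions made for the example.

```shell
#!/bin/sh
# can_run_script ROLE USER_SCOPE HOST_TEAM
# ROLE: admin, maintainer, observer, observer_plus, or gitops (labels are assumptions)
# USER_SCOPE: "global", or the name of the team the role applies to
# HOST_TEAM: the team the target host is assigned to
can_run_script() {
  role=$1
  user_scope=$2
  host_team=$3
  case $role in
    admin|maintainer) ;;   # only these roles may run scripts
    *) return 1 ;;         # observer, observer_plus, gitops, anything else
  esac
  # Team roles are limited to hosts on their own team.
  [ "$user_scope" = "global" ] || [ "$user_scope" = "$host_team" ]
}
```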

noahtalerman commented 1 year ago

Hey @georgekarrv @mna heads up, I updated the requirements in this story to clarify what scripts are allowed for Mac:

On macOS and Linux, only ~~bash~~ scripts run with "#!/bin/sh" are supported. If the user doesn't specify, the script will run in #!/bin/sh.

cc @marko-lisica

noahtalerman commented 1 year ago

On macOS and Linux, only scripts run with "#!/bin/sh" are supported. If the user doesn't specify, the script will run in ~~#!/bin/sh~~ the default shell.

@mna I realized the specs were slightly confusing. I updated this to say scripts will run in "default shell."

That said, my understanding is that #!/bin/sh will technically run the script in whatever the default shell is.

Is that right?

cc @spokanemac

mna commented 1 year ago

@noahtalerman

That said, my understanding is that #!/bin/sh will technically run the script in whatever the default shell is.

That's my understanding too, for example on my laptop (Fedora Linux) it is bash, on macos it should be zsh (by default):

$ ll /bin/sh
lrwxrwxrwx. 1 root root 4 Feb  5  2023 /bin/sh -> bash

spokanemac commented 1 year ago

$ sudo cat /etc/passwd | grep root | cut -d ':' -f 7
/bin/sh

On macOS, sh is still implemented.

$ ls -la /bin/sh
-rwxr-xr-x  1 root  wheel  150384 Jul 11 01:56 /bin/sh

sh has limitations of built-in functions and commands. That should likely be noted.

Might also be worth a mention to check scripts with something like shellcheck.
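
To illustrate the kind of limitation meant here, a small sketch that sticks to POSIX sh and notes the bash-only construct each part avoids. The function names are made up for the example.

```shell
#!/bin/sh
# Classify a file name using only POSIX sh features.
classify_file() {
  # bash would allow: [[ $1 == *.log ]]
  # POSIX sh: use a case pattern instead.
  case $1 in
    *.log) echo "log" ;;
    *.ps1) echo "powershell" ;;
    *)     echo "other" ;;
  esac
}

# Count how many arguments are log files.
# bash would allow an array: files=(a b c); for f in "${files[@]}"
# POSIX sh: iterate over the positional parameters instead.
count_logs() {
  n=0
  for f in "$@"; do
    if [ "$(classify_file "$f")" = "log" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

Running shellcheck with --shell=sh on a script flags bash-only syntax like the constructs mentioned in the comments.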

noahtalerman commented 1 year ago

On macOS, sh is still implemented.

@spokanemac so, on the latest macOS, sh is the default shell? (unless the end user changed it)

Makes sense to clarify this in the docs.

Furthermore, what commands/functions does the user not get with sh?

mna commented 1 year ago

@spokanemac @noahtalerman

/bin/sh might not be a symlink on macos but it might still be a different shell (not Bourne Shell), can you run /bin/sh --help and see what it says? For example on my mac VM (Monterey 12.x version of macos), it is bash (the old 3.x version that ships with mac). On newer macos I'd expect it to be zsh, but I may be wrong (don't have one handy).

spokanemac commented 1 year ago

/bin/sh is the default shell for root.

/bin/zsh is the default shell for users.

There's a writeup here on the differences

$ /bin/sh --help
GNU bash, version 3.2.57(1)-release-(arm64-apple-darwin22)

mna commented 1 year ago

@spokanemac Sorry, I wasn't clear, I know the difference between the old shell sh and bash, what I mean is that the executable called /bin/sh might not actually be sh, it may very well be bash or zsh.

EDIT: I see you updated the comment above with the output of sh --help, thanks, that's what I meant that sh is probably not sh (although it's a quite old bash, it's still better than sh).
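
The symlink check from the ll and ls -la output above can be wrapped in a small helper. This is a sketch with a hypothetical function name, and it only reports whether the path is a symlink; it cannot tell which shell a regular /bin/sh binary really is.

```shell
#!/bin/sh
# describe_sh [PATH]: report whether PATH (default /bin/sh) is a symlink
# and, if so, where it points.
describe_sh() {
  target=${1:-/bin/sh}
  if [ -L "$target" ]; then
    printf '%s -> %s\n' "$target" "$(readlink "$target")"
  else
    printf '%s is not a symlink\n' "$target"
  fi
}
```

When the path is not a symlink, as in the macOS output above, the banner printed by /bin/sh --help is still needed to identify the shell.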

spokanemac commented 1 year ago

There is/was an instance on macOS where "bash-as-sh" was the default for recovery and installer scripts. Guess I am still a little shy from being bitten by that a few times.

Apple has been threatening for years now that bash is going away. At the end of that article is this:

zsh can be made to emulate sh by executing the command zsh --emulate sh.

noahtalerman commented 1 year ago

I am still a little shy from being bitten by that a few times.

@spokanemac I scheduled some time for us to chat about this. I want to learn more about this pain.

noahtalerman commented 1 year ago

Agenda for "Updates to scripts" call on 2023-08-30

Updates to UI (Figma):

Dev notes for when to show failed v. pending icons. exit_code will now be null until we've run the script. We added a host_timeout flag to the API to indicate that the script won't run.
- TODO Martin (Sarah: I can pick it up)
We can't do custom messages for permissions errors. Fleet doesn't support this. We updated these errors to make them more generic like You don't have permissions.
We added the UI for activity feed state when we hear from the host that script execution is disabled.
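
One possible reading of the pending v. failed dev note, sketched as a function. The status labels and the choice to treat host_timeout as a failure are assumptions; the Figma notes remain the source of truth.

```shell
#!/bin/sh
# script_status EXIT_CODE HOST_TIMEOUT
# EXIT_CODE: empty or "null" until the script has run, otherwise an integer
# HOST_TIMEOUT: "true" when the API indicates the script won't run
script_status() {
  exit_code=$1
  host_timeout=$2
  if [ "$host_timeout" = "true" ]; then
    echo "failed"      # assumption: a host timeout is shown as a failure
  elif [ -z "$exit_code" ] || [ "$exit_code" = "null" ]; then
    echo "pending"     # no exit code yet, the script has not run
  elif [ "$exit_code" -eq 0 ]; then
    echo "ran"
  else
    echo "failed"
  fi
}
```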

Updates to behavior:

Remove old scripts after 1 minute instead of 1 day. Why? We have UX around 1 minute timeout.
- UPDATE: Martin: I already changed this.
Async scripts only run on online hosts. Why? We have UX around running scripts on online hosts.
- UPDATE: Martin: I'll submit a PR for this.

sabrinabuckets commented 1 year ago

macOS testing:

Attempted to run a script on a macOS host with scripts not enabled, received: Error: Scripts are disabled for this host. To run scripts, deploy a Fleet installer with scripts enabled.
Built a new pkg with the --enable-scripts flag, installed on a macOS host but did not turn on MDM, script did not execute, received timeout warning: Error: Fleet hasn’t heard from the host in over 1 minute. Fleet doesn’t know if the script ran because the host went offline.
With MDM turned on for the host, using same pkg with scripts enabled, able to successfully run a script
~With MDM turned on, using a pkg built from TUF (latest main) without the --enable-scripts flag, I receive the "scripts are disabled for this host" error. This is unexpected behavior, bug filed here~. This has been resolved, can run on host with MDM on but without the flag.
Verified that Activity feed displays script was run, able to view script details.
Scripts tab not yet present in host details page
Verified that Observer and Observer+ roles cannot run scripts, but Maintainer and higher (including Team Maintainer/Admin) are able to.
Ran a script designed to intentionally fail with an exit of 1, verified it failed as expected.
Attempted to run a Python script, received Error: File type not supported. Only .sh (Bash) and .ps1 (PowerShell) file types are allowed.
Attempted to run with #!/bin/bash and #!/bin/zsh and received Error: Interpreter not supported. Bash scripts must run in "#!/bin/sh”.
Attempted to run without Premium license, received Error: Requires Fleet Premium license
Ran script on offline host and received either Error: Fleet hasn’t heard from the host in over 1 minute. Fleet doesn’t know if the script ran because the host went offline. or Error: Script can’t run on offline host.
Verified unable to run script if MDM is turned off for a host without flag enabled
Verified that if flag is enabled but MDM is off, I am able to run a script
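
The file type and interpreter errors seen in the tests above could come from checks along these lines. This is a sketch, not Fleet's validation code: the function name is hypothetical, the interpreter message is shortened, and treating any shebang other than exactly #!/bin/sh as unsupported is an assumption.

```shell
#!/bin/sh
# check_script PATH: prints "ok" or an error message and returns non-zero.
check_script() {
  path=$1
  # Only .sh and .ps1 files are accepted.
  case $path in
    *.sh|*.ps1) ;;
    *)
      echo "Error: File type not supported. Only .sh (Bash) and .ps1 (PowerShell) file types are allowed."
      return 1
      ;;
  esac
  # PowerShell scripts have no shebang to inspect.
  case $path in
    *.ps1) echo "ok"; return 0 ;;
  esac
  # Read the first line; a missing trailing newline is not an error here.
  first_line=
  IFS= read -r first_line < "$path" || true
  case $first_line in
    '#!'*)
      if [ "$first_line" != '#!/bin/sh' ]; then
        echo "Error: Interpreter not supported."
        return 1
      fi
      ;;
  esac
  # No shebang at all is allowed; the script runs in the default shell.
  echo "ok"
}
```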


sabrinabuckets commented 1 year ago

Linux testing (tested on Fedora 38):

Ran script on host without scripts enabled, received expected error
Ran script intended to fail, script exited with a status of 1
Ran script greater than 10k characters. Error: Script is too large. It’s limited to 10,000 characters (approximately 125 lines).
Ran script with output > 10k characters, noted output stops at 10k.
Ran unsupported script types, script unable to run Error: File type not supported. Only .sh (Bash) and .ps1 (PowerShell) file types are allowed.
Confirmed scripts are logged in activity feed

noahtalerman commented 1 year ago

Docs for this feature are added in this PR: #13807

sabrinabuckets commented 1 year ago

Windows testing (Win11 Pro):

Tested against device with Scripts not enabled received expected error.
Installed MSI with scripts flag enabled, able to successfully run .ps1 scripts. .sh scripts with no on-device action run & provide output, however any on-device actions trigger a failure.
Ran scripts with intent to fail, verified they fail as expected and output contains exit code.
Ran supported script type with unsupported characters/formatting, observed correct errors in output.
Attempted to run a script from /bin/bash and received Error: Interpreter not supported. Bash scripts must run in "#!/bin/sh”.
Verified script does not run on offline device.
Verified Activity Feed is logging scripts and output.

noahtalerman commented 1 year ago

On macOS, Windows, and Linux, the subsystem can be triggered by the --script-execution flag in the fleetctl package command. If this value is true the subsystem is turned on.

@georgekarrv looks like we ended up building the flag for fleetd as --enable-scripts when the plan was --script-execution.

Looks like this happened because we had --enable-scripts in this subtask: https://github.com/fleetdm/fleet/issues/13304

Not a big issue but we should figure out how to prevent this from happening. Took a note to chat about this during our 1:1

noahtalerman commented 1 year ago

Confirm and celebrate: Docs are here: https://fleetdm.com/docs/using-fleet/scripts

fleet-release commented 1 year ago

Scripts run swiftly, In the cloud city, problems mend, Peace to admins lend.

noahtalerman commented 1 year ago

Updated the pricing page to reflect scripts as a premium feature: https://github.com/fleetdm/fleet/pull/14167

nonpunctual commented 9 months ago

@roperzh @noahtalerman I know this might not be the right place to comment but can you point me to the reason scripts were implemented with the 5m limitation? E.g., if I as an admin wanted to run the following script:

some loop with an indeterminate amount of iterations
each loop iteration has an intentional 1s pause
on execution it turns out the loop has 301 iterations = 301s = 5m 1s

my understanding is the script as executed by Fleet would fail. So, I am just trying to find out more about what we believe we are up against that caused us to add that arbitrary limit. Thanks!

noahtalerman commented 9 months ago

TLDR: We first built scripts w/ 1m timeout to prevent a confusing UX. We bumped to 5 mins because we heard about real world scenarios that needed a longer timeout (installomator). We can adjust further based on more real world scenarios.

the reason scripts were implemented with the 5m limitation?

@nonpunctual when we first built scripts, the timeout was 1m. And, the user could only run scripts on online hosts. Meaning, you ran the script and you had to sit and wait for a response.

Why the 1m timeout? We didn't want users to have to sit in front of the UI, terminal, etc. and wait for more than 1m for a response.

Then we got feedback that folks were trying to run scripts w/ installomator which often exceeded the 1m timeout. So we bumped to 5.

We can certainly continue to adjust this limit but we want real world scenarios in which 5 minutes doesn't work before we do.

some loop with an indeterminate amount of iterations each loop iteration has an intentional 1s pause on execution it turns out the loop has 301 iterations = 301s = 5m 1s

I think this example is too arbitrary. What's a script that IT admins often run that takes longer than 5 minutes? What does it do? (ex. install software) Why is it valuable?

Please feel free to bring this to feature fest!

nonpunctual commented 9 months ago

@roperzh Thanks @noahtalerman I will look at the Feature Fest docs to add this.

Many API endpoints have built-in rate limiting. Also, they can be slow when being hit with curl or wget from a shell.

I made my example generic but not arbitrary. I have experienced this not just with Jamf APIs, but internal Apple APIs, Meraki cloud, ServiceNow, etc. So,

a loop in a script is sending HTTP requests (GET, PUT, POST...) to a rate-limited API
the number of loop iterations is indeterminate
this is because various amounts of data are being read or uploaded to / from the API endpoint
the loop stops when the data operation is exhausted
there is no way to pre-determine the number of loop iterations (this would be common in workflows other than this one)
because of API rate limiting / poor performance each loop iteration has an intentional 1s pause
without the pausing the shell may lose data or cause execution to complete the loop & go to the next script command
this prevents the script from collecting the needed data or from uploading the intended data to the API
on execution it turns out the loop had 301 iterations = 301s = 5m 1s
i.e., a reasonable workflow in a script would fail if executed by Fleet
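
The arithmetic in this example can be written out as a quick check. The function name is hypothetical, it only counts the intentional pauses (not request time), and treating a total exactly equal to the timeout as fitting is an assumption.

```shell
#!/bin/sh
# fits_timeout ITERATIONS PAUSE_SECONDS TIMEOUT_SECONDS
# Succeeds when the pauses alone fit within the timeout.
fits_timeout() {
  total=$(( $1 * $2 ))
  [ "$total" -le "$3" ]
}
```

With a 300 second (5m) timeout, `fits_timeout 300 1 300` succeeds while `fits_timeout 301 1 300` fails, which is the 301s = 5m 1s case described above.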

The larger issue is not this use case.

We are diminishing the value of the script execution feature if we are hobbling what can happen in the shell. Scripts should run EXACTLY AS IF they were being executed directly on the computer by a user, period. Saying we allow admins to execute scripts should mean we allow them to execute ANY script & we should not be in the way of that.

In my opinion, maybe the concept of "waiting" has been mistakenly conflated in this feature design with "waiting for Fleet to do something".

In this case, if workflows with long time frames are intentionally designed by an admin, Fleet is not doing anything bad. It is doing what they want it to do, i.e. running their script.

noahtalerman commented 9 months ago

Thanks @nonpunctual!

I think this deserves more discussion at product office hours.

I added a note to discuss during our next product office hours.
