fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com

Script execution: Run a script via CLI #9583

Closed noahtalerman closed 1 year ago

noahtalerman commented 1 year ago

User story

As an IT admin or security engineer, I want to run a script once, on a specific host, so that I can remediate an issue or collect logs.

Requirements

Changes

UI and CLI

https://www.figma.com/file/030moybHDEurtYx52D70sF/%239583-Script-execution%3A-Run-a-script-via-CLI?type=design&node-id=2-130&mode=design

noahtalerman commented 1 year ago

Zach: If we wait, what do we tell current fleetd users as we start to ship script execution features?

Zach: It should not be possible to turn on script execution through the Fleet system on its own. W/o deploying a new package or using some sort of write capability.

Roberto: What about having another "agent"? If you don't have that binary you don't have script execution enabled (I think maybe that's what you're suggesting?)

Zach: Adding another component sounds painful but is likely the most satisfying for the folks looking for this feature.

Zach: Another option is a flag that can't be configured remotely. If you want scripts, you have to deploy new packages, use some other tooling to turn scripts on.

Zach: Possible solution, config profile that configures fleetd to turn on script execution.

Roberto: Spitballing another option: having two "fleetd"s, one "hardened" that doesn't contain any of the remote execution code, which we can exclude using build constraints

Zach: This has the downside of something that has to be built, tested, and distributed separately. But it's not another process. For the user, they have to choose the "script" name in the update channel.

noahtalerman commented 1 year ago

@zwass I assigned this to you. Can you please update the issue with your proposal?

zwass commented 1 year ago

I think the key thing here is that it needs to be impossible to escalate Fleet/fleetd from a read-only system to an execute or write system remotely. For deployments that already have fleetd, they should not be able to be autoupdated or enrolled into this feature in any other way than by explicit action through a tool that already has write/execute on the system.

I propose we do this by breaking the agent component of this out into a subsystem of Orbit (still in the Orbit binary) that only starts up if the appropriate conditions are met. Otherwise, that subsystem should never start up. I think the subsystem should be triggered on by a configuration profile (similar to the strategy for https://github.com/fleetdm/fleet/issues/9459).

When fleetd starts up, it should look for a profile of PayloadType com.fleetdm.fleetd.config and extract the scripting key. Only if this value is true should the subsystem turn on.

This ensures that only admin users can turn on scripting locally, and only a user who already has control over the device (via MDM) can turn scripting on remotely.
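A minimal sketch of the "off unless explicitly true" gate described above. The function name `scripts_allowed` is hypothetical, and reading the value out of the profile is left out; this only illustrates that a missing key, an empty value, or anything other than the literal `true` keeps the subsystem off.

```shell
# Hypothetical gate, not fleetd code.
# $1 is the value of the "scripting" key; pass nothing if the profile or key is absent.
scripts_allowed() {
  [ "${1:-}" = "true" ]
}

# Walk through a few values to show the default-deny behavior.
for value in true false "" yes; do
  if scripts_allowed "$value"; then
    echo "scripting='$value': subsystem starts"
  else
    echo "scripting='$value': subsystem stays off"
  fi
done
```

Treating every value except an exact `true` as "off" means a malformed or partially applied profile cannot turn scripting on by accident.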

@roperzh does this make sense to you? Anything I could be missing here?

roperzh commented 1 year ago

Looks good to me! 👍

noahtalerman commented 1 year ago

the subsystem should be triggered on by a configuration profile

@zwass @roperzh what user action installs the profile on the hosts? Is it turning on MDM in Fleet (configuring Fleet with APNs and SCEP certs/keys)? Or something else...

zwass commented 1 year ago

Yeah I think Fleet should automatically send the profile upon enrollment.

Future considerations:

noahtalerman commented 1 year ago

@zwass @roperzh I updated the "Requirements" and "Proposed solution" sections in this issue's description (using Zach's comment here: https://github.com/fleetdm/fleet/issues/9583#issuecomment-1413099039).

When you get the chance, can you please take a look to see if I'm missing anything? Thanks :)

Future considerations:

  • This feature can be disabled and then Fleet doesn't send the profile?
  • Scripting functionality doesn't require Fleet to be the MDM server and we let folks know that they need to deploy this profile in order to have the functionality (or we can build some other setting into the installer)?

These make a lot of sense. My understanding of (1) is that the user wants other MDM features but doesn't want scripts, and of (2) that the user wants scripts but doesn't want to set up MDM features. I prefer to wait to prioritize these until we hear about them from users.

noahtalerman commented 1 year ago

@zwass @roperzh this feature must be included in the same (or before) fleetd release that includes script execution features, right?

Moreover, if we release this feature after we release script execution features, this requirement won't be met, right?

For deployments that already have fleetd, they should not be able to be autoupdated or enrolled into this feature in any other way than by explicit action through a tool that already has write/execute on the system.

noahtalerman commented 1 year ago

this feature must be included in the same (or before) fleetd release that includes script execution features, right?

Roberto: that's my understanding, yes

More discussion can be found here in Slack (internal): https://fleetdm.slack.com/archives/C03C41L5YEL/p1676393697336569

noahtalerman commented 1 year ago

@lukeheath heads up, my updated understanding is that this story must be included in the same release (or earlier) as the Manage scripts story here: #9537

lukeheath commented 1 year ago

Hey team! Please add your planning poker estimate with Zenhub @gillespi314 @mna @roperzh

noahtalerman commented 1 year ago

Hey @marko-lisica I'm passing you this issue.

noahtalerman commented 1 year ago

Hey @georgekarrv, here's the list of sub-tasks we drafted:

mna commented 1 year ago

@noahtalerman

Only maintainers and up can run scripts. Team maintainers and team admins can only run scripts on hosts assigned to their team.

Should we also allow the gitops role to run scripts? Also, I take it from the description that any role can read the script's output (except gitops of course)?

noahtalerman commented 1 year ago

@mna thanks for raising!

Should we also allow the gitops role to run scripts?

I think we don't want to allow GitOps role to run scripts.

I think the GitOps role is designed to only need permissions to apply Fleet YAML documents (run fleetctl apply).

@zhumo is that right?

I take it from the description that any role can read the script's output (except gitops of course)?

Yep!

zhumo commented 1 year ago

Hey Noah, Yes, let's exclude the gitops role for now. I'm not sure how you would use gitops to run scripts, like you'd need to git commit a new file which specifies the host and the script, but then after it's run, it stays in your repo forever? It's a bit of UX question that we should work on in a separate effort. In the meantime, then, let's not give the gitops user unnecessary powers.

Let's make sure to include that change in the user permissions doc (I've also included checking user permissions in the agenda for C&C)
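A minimal sketch of the permission rules agreed on in this exchange, assuming role names `admin`, `maintainer`, `observer`, and `gitops` and a `global` or `team` scope. The function names and argument layout are hypothetical and are not Fleet's authorization code.

```shell
# Hypothetical helpers, not Fleet code.
# can_run_script ROLE SCOPE USER_TEAM HOST_TEAM
#   ROLE:  admin | maintainer | observer | gitops (assumed names)
#   SCOPE: global | team
can_run_script() {
  role=$1
  scope=$2
  user_team=$3
  host_team=$4

  # Only maintainers and up may run scripts; gitops and observers may not.
  case "$role" in
    admin|maintainer) ;;
    *) return 1 ;;
  esac

  # Team-scoped users are limited to hosts assigned to their own team.
  if [ "$scope" = "team" ] && [ "$user_team" != "$host_team" ]; then
    return 1
  fi
  return 0
}

# Any role except gitops may read a script's output.
can_read_output() {
  [ "$1" != "gitops" ]
}

if can_run_script maintainer team 5 5; then echo "team maintainer, own team: allowed"; fi
if ! can_run_script maintainer team 4 5; then echo "team maintainer, other team: denied"; fi
if ! can_run_script gitops global "" 5; then echo "gitops: denied"; fi
```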

noahtalerman commented 1 year ago

Hey @georgekarrv @mna heads up, I updated the requirements in this story to clarify what scripts are allowed for Mac:

cc @marko-lisica

noahtalerman commented 1 year ago

On macOS and Linux, only scripts run with "#!/bin/sh" are supported. If the user doesn't specify, the script will run in the default shell.

@mna I realized the specs were slightly confusing. I updated this to say scripts will run in "default shell."

That said, my understanding is that #!/bin/sh will technically run the script in whatever the default shell is.

Is that right?

cc @spokanemac

mna commented 1 year ago

@noahtalerman

That said, my understanding is that #!/bin/sh will technically run the script in whatever the default shell is.

That's my understanding too, for example on my laptop (Fedora Linux) it is bash, on macos it should be zsh (by default):

$ ll /bin/sh
lrwxrwxrwx. 1 root root 4 Feb  5  2023 /bin/sh -> bash

spokanemac commented 1 year ago

$ sudo cat /etc/passwd | grep root | cut -d ':' -f 7
/bin/sh

On macOS, sh is still implemented. 

$ ls -la /bin/sh
-rwxr-xr-x  1 root  wheel  150384 Jul 11 01:56 /bin/sh

sh has limitations of built-in functions and commands. That should likely be noted.

Might also be worth a mention to check scripts with something like shellcheck.

noahtalerman commented 1 year ago

On macOS, sh is still implemented.

@spokanemac so, on the latest macOS, sh is the default shell? (unless the end user changed it)

Makes sense to clarify this in the docs.

Furthermore, what commands/functions does the user not get with sh?

mna commented 1 year ago

@spokanemac @noahtalerman

/bin/sh might not be a symlink on macos but it might still be a different shell (not Bourne Shell), can you run /bin/sh --help and see what it says? For example on my mac VM (Monterey 12.x version of macos), it is bash (the old 3.x version that ships with mac). On newer macos I'd expect it to be zsh, but I may be wrong (don't have one handy).

spokanemac commented 1 year ago

/bin/sh is the default shell for root.

/bin/zsh is the default shell for users.

There's a writeup here on the differences

$ /bin/sh --help
GNU bash, version 3.2.57(1)-release-(arm64-apple-darwin22)

mna commented 1 year ago

@spokanemac Sorry, I wasn't clear, I know the difference between the old shell sh and bash, what I mean is that the executable called /bin/sh might not actually be sh, it may very well be bash or zsh.

EDIT: I see you updated the comment above with the output of sh --help, thanks, that's what I meant that sh is probably not sh (although it's a quite old bash, it's still better than sh).
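The checks run by hand in this thread can be combined into one small helper. This is a sketch with a hypothetical function name, `identify_shell`; it reports whether the path is a symlink and then asks the binary to identify itself. bash and zsh answer `--version`, while a minimal POSIX shell such as dash rejects the option, which is itself a useful hint.

```shell
# Hypothetical helper: report what actually sits behind a shell path.
identify_shell() {
  sh_path=${1:-/bin/sh}

  if [ ! -e "$sh_path" ]; then
    echo "$sh_path not found" >&2
    return 1
  fi

  if [ -L "$sh_path" ]; then
    printf '%s is a symlink to %s\n' "$sh_path" "$(readlink "$sh_path")"
  else
    printf '%s is a regular file\n' "$sh_path"
  fi

  # First line of the shell's self-description, or of its error message
  # if it does not understand --version.
  "$sh_path" --version 2>&1 | head -n 1
}

identify_shell /bin/sh
```

The output differs per system, which is the point made above: the name /bin/sh says little about which shell will interpret the script.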

spokanemac commented 1 year ago

There is/was an instance on macOS where "bash-as-sh" was the default for recovery and installer scripts. Guess I am still a little shy from being bitten by that a few times.

Apple has been threatening for years now that bash is going away. At the end of that article is this:

zsh can be made to emulate sh by executing the command zsh --emulate sh.

noahtalerman commented 1 year ago

I am still a little shy from being bitten by that a few times.

@spokanemac I scheduled some time for us to chat about this. I want to learn more about this pain.

noahtalerman commented 1 year ago

Agenda for "Updates to scripts" call on 2023-08-30

Updates to UI (Figma):

Updates to behavior:

sabrinabuckets commented 1 year ago

macOS testing:

Private Zenhub Image

sabrinabuckets commented 1 year ago

Linux testing (tested on Fedora 38):

noahtalerman commented 1 year ago

Docs for this feature are added in this PR: #13807

sabrinabuckets commented 1 year ago

Windows testing (Win11 Pro):

noahtalerman commented 1 year ago

On macOS, Windows, and Linux, the subsystem can be triggered by the --script-execution flag in the fleetctl package command. If this value is true the subsystem is turned on.

@georgekarrv looks like we ended up building the flag for fleetd as --enable-scripts when the plan was --script-execution.

Looks like this happened because we had --enable-scripts in this subtask: https://github.com/fleetdm/fleet/issues/13304

Not a big issue but we should figure out how to prevent this from happening. Took a note to chat about this during our 1:1
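For reference, a sketch of how the flag as built ends up on the command line. The helper `build_package_cmd` is hypothetical and only prints the command instead of running it; the URL and enroll secret are placeholders.

```shell
# Hypothetical helper: print (do not run) a fleetctl package command
# that opts the generated fleetd installer into script execution.
# Usage: build_package_cmd TYPE FLEET_URL ENROLL_SECRET
build_package_cmd() {
  printf 'fleetctl package --type=%s --fleet-url=%s --enroll-secret=%s --enable-scripts\n' \
    "$1" "$2" "$3"
}

# Placeholder values; substitute your own server URL and secret.
build_package_cmd pkg https://fleet.example.com REPLACE_WITH_ENROLL_SECRET
```

Leaving `--enable-scripts` off produces a package whose fleetd does not start the script subsystem, which matches the opt-in requirement earlier in this issue.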

noahtalerman commented 1 year ago

Confirm and celebrate: Docs are here: https://fleetdm.com/docs/using-fleet/scripts

fleet-release commented 1 year ago

Scripts run swiftly,
In the cloud city, problems mend,
Peace to admins lend.

noahtalerman commented 1 year ago

Updated the pricing page to reflect scripts as a premium feature: https://github.com/fleetdm/fleet/pull/14167

nonpunctual commented 9 months ago

@roperzh @noahtalerman I know this might not be the right place to comment but can you point me to the reason scripts were implemented with the 5m limitation? E.g., if I as an admin wanted to run the following script:

  • some loop with an indeterminate amount of iterations
  • each loop iteration has an intentional 1s pause on execution
  • it turns out the loop has 301 iterations = 301s = 5m 1s

my understanding is the script as executed by Fleet would fail. So, I am just trying to find out more about what we believe we are up against that caused us to add that arbitrary limit. Thanks!

noahtalerman commented 9 months ago

TLDR: We first built scripts w/ a 1m timeout to prevent a confusing UX. We bumped it to 5 mins because we heard about real world scenarios that needed a longer timeout (installomator). We can adjust further based on more real world scenarios.

the reason scripts were implemented with the 5m limitation?

@nonpunctual when we first built scripts, the timeout was 1m. And, the user could only run scripts on online hosts. Meaning, you ran the script and you had to sit and wait for a response.

Why the 1m timeout? We didn't want users to have to sit in front of the UI, terminal, etc. and wait for more than 1m for a response.

Then we got feedback that folks were trying to run scripts w/ installomator which often exceeded the 1m timeout. So we bumped to 5.

We can certainly continue to adjust this limit but we want real world scenarios in which 5 minutes doesn't work before we do.
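A minimal sketch of how a fixed time limit plays out, assuming GNU coreutils `timeout` is installed (it is not part of stock macOS) and using a hypothetical helper name, `run_script_with_limit`. This is not how fleetd enforces its limit; it only illustrates the behavior described above, where a script that outlives the limit is stopped and reported as failed.

```shell
# Hypothetical helper, not fleetd code.
# Usage: run_script_with_limit SECONDS PATH_TO_SCRIPT
# Assumes GNU coreutils `timeout`, which exits with 124 when the limit is hit.
run_script_with_limit() {
  limit_seconds=$1
  script_path=$2
  status=0
  timeout "$limit_seconds" /bin/sh "$script_path" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "script exceeded ${limit_seconds}s and was stopped" >&2
  fi
  return "$status"
}

# Scaled-down demo: three 1-second pauses against a 2-second limit.
demo_script=$(mktemp)
printf 'for i in 1 2 3; do sleep 1; done\n' > "$demo_script"
run_script_with_limit 2 "$demo_script" || echo "exit status: $?"
rm -f "$demo_script"
```

The same thing happens at full scale: a loop of 301 one-second pauses, as in the example discussed in this thread, runs one second past a 300-second limit and is cut off before it finishes.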

  • some loop with an indeterminate amount of iterations
  • each loop iteration has an intentional 1s pause on execution
  • it turns out the loop has 301 iterations = 301s = 5m 1s

I think this example is too arbitrary. What's a script that IT admins often run that takes longer than 5 minutes? What does it do (ex. install software)? Why is it valuable?

Please feel free to bring this to feature fest!

nonpunctual commented 9 months ago

@roperzh Thanks @noahtalerman I will look at the Feature Fest docs to add this.

Many API endpoints have built-in rate limiting. Also, they can be slow when being hit with curl or wget from a shell.

I made my example generic but not arbitrary. I have experienced this not just with Jamf APIs, but internal Apple APIs, Meraki cloud, ServiceNow, etc. So,

The larger issue is not this use case.

We are diminishing the value of the script execution feature if we are hobbling what can happen in the shell. Scripts should run EXACTLY AS IF they were being executed directly on the computer by a user, period. Saying we allow admins to execute scripts should mean we allow them to execute ANY script & we should not be in the way of that.

In my opinion, maybe the concept of "waiting" has been mistakenly conflated in this feature design with "waiting for Fleet to do something".

In this case, if workflows with long time frames are intentionally designed by an admin, Fleet is not doing anything bad. It is doing what they want it to do, i.e. running their script.

noahtalerman commented 9 months ago

Thanks @nonpunctual!

I think this deserves more discussion at product office hours.

I added a note to discuss during our next product office hours.