Closed by noahtalerman 1 year ago
Zach: If we wait, what do we tell current fleetd users as we start to ship script execution features?
Zach: It should not be possible to turn on script execution through the Fleet system on its own, i.e. without deploying a new package or using some sort of write capability.
Roberto: What about having another "agent"? If you don't have that binary you don't have script execution enabled (I think maybe that's what you're suggesting?)
Zach: Adding another component sounds painful but is likely the most satisfying for the folks looking for this feature.
Zach: Another option is a flag that can't be configured remotely. If you want scripts, you have to deploy new packages, use some other tooling to turn scripts on.
Zach: Possible solution, config profile that configures fleetd to turn on script execution.
Roberto: Spitballing another option: having two "fleetd"s, one "hardened" that doesn't contain any of the remote execution code, which we can exclude using build constraints
Zach: This has the downside of something that has to be built, tested, and distributed separately. But it's not another process. For the user, they have to choose the "script" name in the update channel.
@zwass I assigned this to you. Can you please update the issue with your proposal?
I think the key thing here is that it needs to be impossible to escalate Fleet/fleetd from a read-only system to an execute or write system remotely. For deployments that already have fleetd, they should not be able to be autoupdated or enrolled into this feature in any other way than by explicit action through a tool that already has write/execute on the system.
I propose we do this by breaking the agent component of this out into a subsystem of Orbit (still in the Orbit binary) that only starts up if the appropriate conditions are met. Otherwise, that subsystem should never start up. I think the subsystem should be triggered on by a configuration profile (similar to the strategy for https://github.com/fleetdm/fleet/issues/9459).
When fleetd starts up, it should look for a profile of PayloadType com.fleetdm.fleeetd.config and extract the scripting key. Only if this value is true should the subsystem turn on.
This ensures that only admin users can turn on scripting locally, and only a user who already has control over the device (via MDM) can turn scripting on remotely.
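A minimal sketch of the check described above, assuming the profile is a standard Apple configuration profile plist whose payloads sit in a PayloadContent array. The function name is hypothetical, the PayloadType is spelled exactly as in the proposal, and how fleetd actually locates and reads the installed profile is not specified in this thread.

```python
import plistlib

# PayloadType exactly as written in the proposal above.
PAYLOAD_TYPE = "com.fleetdm.fleeetd.config"


def scripting_enabled(profile_bytes: bytes) -> bool:
    """Return True only if a matching payload sets scripting to boolean true.

    Hypothetical helper: any parse error, missing payload, or non-boolean
    value leaves the scripting subsystem off (fail closed).
    """
    try:
        profile = plistlib.loads(profile_bytes)
    except Exception:
        # Unreadable profile: never start the subsystem.
        return False
    if not isinstance(profile, dict):
        return False
    payloads = profile.get("PayloadContent", [])
    if not isinstance(payloads, list):
        return False
    for payload in payloads:
        if isinstance(payload, dict) and payload.get("PayloadType") == PAYLOAD_TYPE:
            # Require a real boolean true, not a truthy string or number.
            return payload.get("scripting") is True
    return False
```

The point of failing closed is that a host without the profile, or with a malformed one, behaves exactly like a read-only deployment.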
@roperzh does this make sense to you? Anything I could be missing here?
Looks good to me! 👍
the subsystem should be triggered on by a configuration profile
@zwass @roperzh what user action installs the profile on the hosts? Is it turning on MDM in Fleet (configuring Fleet with APNs and SCEP certs/keys)? Or something else...
Yeah I think Fleet should automatically send the profile upon enrollment.
Future considerations:
@zwass @roperzh I updated the "Requirements" and "Proposed solution" sections in this issue's description (using Zach's comment here: https://github.com/fleetdm/fleet/issues/9583#issuecomment-1413099039).
When you get the chance, can you please take a look to see if I'm missing anything? Thanks :)
Future considerations:
- This feature can be disabled and then Fleet doesn't send the profile?
- Scripting functionality doesn't require Fleet to be the MDM server and we let folks know that they need to deploy this profile in order to have the functionality (or we can build some other setting into the installer)?
These make a lot of sense. My understanding is that (1) covers the user who wants other MDM features but doesn't want scripts, and (2) covers the user who wants scripts but doesn't want to set up MDM features. I prefer to wait to prioritize these until we hear about them from users.
@zwass @roperzh this feature must be included in the same (or before) fleetd release that includes script execution features, right?
Moreover, if we release this feature after we release script execution features, this requirement won't be met, right?
For deployments that already have fleetd, they should not be able to be autoupdated or enrolled into this feature in any other way than by explicit action through a tool that already has write/execute on the system.
this feature must be included in the same (or before) fleetd release that includes script execution features, right?
Roberto: that's my understanding, yes
More discussion can be found here in Slack (internal): https://fleetdm.slack.com/archives/C03C41L5YEL/p1676393697336569
@lukeheath heads up, my updated understanding is that this story must be included in the same release (or earlier) as the Manage scripts story here: #9537
Hey team! Please add your planning poker estimate with Zenhub @gillespi314 @mna @roperzh
Hey @marko-lisica I'm passing you this issue.
Hey @georgekarrv, here's the list of sub-tasks we drafted:
@noahtalerman
Only maintainers and up can run scripts. Team maintainer and team admins can only run scripts on hosts assigned to their team.
Should we also allow the gitops role to run scripts? Also, I take it from the description that any role can read the script's output (except gitops of course)?
@mna thanks for raising!
Should we also allow the gitops role to run scripts?
I think we don't want to allow GitOps role to run scripts.
I think the GitOps role is designed to only need permissions to apply Fleet YAML documents (run fleetctl apply).
@zhumo is that right?
I take it from the description that any role can read the script's output (except gitops of course)?
Yep!
Hey Noah, Yes, let's exclude the gitops role for now. I'm not sure how you would use gitops to run scripts, like you'd need to git commit a new file which specifies the host and the script, but then after it's run, it stays in your repo forever? It's a bit of UX question that we should work on in a separate effort. In the meantime, then, let's not give the gitops user unnecessary powers.
Let's make sure to include that change in the user permissions doc (I've also included checking user permissions in the agenda for C&C)
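For the permissions doc, here is a rough sketch of the rule as discussed above: maintainers and up can run scripts, team-level maintainers and admins only on hosts in their own team, and gitops cannot. The function and data shapes are hypothetical and are not Fleet's authorization code. Treating a host with no team as off-limits to team-level users is an assumption.

```python
from typing import Dict, Optional

# Roles allowed to run scripts, per the requirement quoted above.
RUN_ROLES = {"admin", "maintainer"}


def can_run_script(
    global_role: Optional[str],
    team_roles: Dict[int, str],
    host_team_id: Optional[int],
) -> bool:
    """Hypothetical check: may this user run a script on this host?

    global_role: the user's global role, or None for team-only users.
    team_roles: mapping of team ID to the user's role on that team.
    host_team_id: the host's team, or None if the host has no team.
    """
    # Global maintainers and admins can run scripts on any host.
    if global_role in RUN_ROLES:
        return True
    # Assumption: team-level users cannot act on hosts without a team.
    if host_team_id is None:
        return False
    # Team maintainers and admins are limited to hosts in their team.
    return team_roles.get(host_team_id) in RUN_ROLES
```

Any role not in the set, including gitops, falls through to a denial, so new roles stay excluded until someone adds them on purpose.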
Hey @georgekarrv @mna heads up, I updated the requirements in this story to clarify what scripts are allowed for Mac:
cc @marko-lisica
On macOS and Linux, only scripts run with "#!/bin/sh" are supported. If the user doesn't specify, the script will run in the default shell.
@mna I realized the specs were slightly confusing. I updated this to say scripts will run in "default shell."
That said, my understanding is that #!/bin/sh will technically run the script in whatever the default shell is.
Is that right?
cc @spokanemac
@noahtalerman
That said, my understanding is that #!/bin/sh will technically run the script in whatever the default shell is.
That's my understanding too, for example on my laptop (Fedora Linux) it is bash, on macos it should be zsh (by default):
$ ll /bin/sh
lrwxrwxrwx. 1 root root 4 Feb 5 2023 /bin/sh -> bash
$ sudo cat /etc/passwd | grep root | cut -d ':' -f 7
/bin/sh
On macOS, sh is still implemented.
$ ls -la /bin/sh
-rwxr-xr-x 1 root wheel 150384 Jul 11 01:56 /bin/sh
sh has a more limited set of built-in functions and commands. That should likely be noted.
Might also be worth a mention to check scripts with something like shellcheck.
On macOS, sh is still implemented.
@spokanemac so, on the latest macOS, sh is the default shell? (unless the end user changed it)
Makes sense to clarify this in the docs.
Furthermore, what commands/functions does the user not get with sh?
@spokanemac @noahtalerman
/bin/sh might not be a symlink on macOS but it might still be a different shell (not Bourne Shell), can you run /bin/sh --help and see what it says? For example on my Mac VM (Monterey 12.x version of macOS), it is bash (the old 3.x version that ships with Mac). On newer macOS I'd expect it to be zsh, but I may be wrong (don't have one handy).
/bin/sh is the default shell for root.
/bin/zsh is the default shell for users.
There's a writeup here on the differences
$ /bin/sh --help
GNU bash, version 3.2.57(1)-release-(arm64-apple-darwin22)
@spokanemac Sorry, I wasn't clear. I know the difference between the old shell sh and bash; what I mean is that the executable called /bin/sh might not actually be sh, it may very well be bash or zsh.
EDIT: I see you updated the comment above with the output of sh --help, thanks. That's what I meant: sh is probably not sh (although it's a quite old bash, it's still better than sh).
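To make the "what is /bin/sh really" question easy to check on any host, a small sketch that resolves symlinks. The helper name is hypothetical. As noted above, on macOS /bin/sh is a regular file rather than a symlink, so this only answers the question on systems like the Fedora example, and the shell's own version banner is still needed elsewhere.

```python
import os


def resolve_shell(path: str = "/bin/sh") -> str:
    """Return the real file behind a shell path, following any symlinks.

    If the path is a regular file (or does not exist), the same absolute
    path comes back unchanged.
    """
    return os.path.realpath(path)


if __name__ == "__main__":
    # Prints the resolved location of /bin/sh on the current machine.
    print(resolve_shell())
```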
There is/was an instance on macOS where "bash-as-sh" was the default for recovery and installer scripts. Guess I am still a little shy from being bitten by that a few times.
Apple has been threatening for years now that bash is going away. At the end of that article is this:
zsh can be made to emulate sh by executing the command zsh --emulate sh.
I am still a little shy from being bitten by that a few times.
@spokanemac I scheduled some time for us to chat about this. I want to learn more about this pain.
Updates to UI (Figma):
exit_code will now be null until we've run the script. We added a host_timeout flag to the API to indicate that the script won't run.
You don't have permissions.
Updates to behavior:
macOS testing:
Error: Scripts are disabled for this host. To run scripts, deploy a Fleet installer with scripts enabled.
--enable-scripts flag, installed on a macOS host but did not turn on MDM, script did not execute, received timeout warning: Error: Fleet hasn’t heard from the host in over 1 minute. Fleet doesn’t know if the script ran because the host went offline.
--scripts-enabled flag, I receive the "scripts are disabled for this host" error. This is unexpected behavior, bug filed here. This has been resolved, can run on host with MDM on but without the flag.
Error: File type not supported. Only .sh (Bash) and .ps1 (PowerShell) file types are allowed.
#!/bin/bash and #!/bin/zsh and received Error: Interpreter not supported. Bash scripts must run in "#!/bin/sh".
Error: Requires Fleet Premium license
Error: Fleet hasn’t heard from the host in over 1 minute. Fleet doesn’t know if the script ran because the host went offline.
or Error: Script can’t run on offline host.
Linux testing (tested on Fedora 38):
Error: Script is too large. It’s limited to 10,000 characters (approximately 125 lines).
Error: File type not supported. Only .sh (Bash) and .ps1 (PowerShell) file types are allowed.
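The upload checks exercised in the testing notes above can be summarized in a short sketch. The limits and error strings are the ones reported in this thread. The function name, the order of the checks, and the strict match on the shebang line (so "#!/bin/sh -e" would be rejected) are assumptions, not Fleet's actual validation code.

```python
import os
from typing import Optional

MAX_CHARS = 10_000  # character limit reported in the testing notes
ALLOWED_EXTENSIONS = {".sh", ".ps1"}


def validate_script(filename: str, contents: str) -> Optional[str]:
    """Return an error message, or None if the script passes every check."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return ("File type not supported. Only .sh (Bash) and .ps1 "
                "(PowerShell) file types are allowed.")
    if len(contents) > MAX_CHARS:
        return ("Script is too large. It's limited to 10,000 characters "
                "(approximately 125 lines).")
    # A shebang is optional; when present on a .sh file it must be /bin/sh.
    if ext == ".sh" and contents.startswith("#!"):
        first_line = contents.splitlines()[0].strip()
        if first_line != "#!/bin/sh":
            return ('Interpreter not supported. Bash scripts must run in '
                    '"#!/bin/sh".')
    return None
```

For example, validate_script("fix.sh", "#!/bin/zsh\necho hi") returns the interpreter error, while the same body with no shebang passes and would run in the default shell.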
Docs for this feature are added in this PR: #13807
Windows testing (Win11 Pro):
.ps1 scripts. .sh scripts with no on-device action run & provide output, however any on-device actions trigger a failure.
Error: Interpreter not supported. Bash scripts must run in "#!/bin/sh".
On macOS, Windows, and Linux, the subsystem can be triggered by the --script-execution flag in the fleetctl package command. If this value is true the subsystem is turned on.
@georgekarrv looks like we ended up building the flag for fleetd as --enable-scripts when the plan was --script-execution.
Looks like this happened because we had --enable-scripts in this subtask: https://github.com/fleetdm/fleet/issues/13304
Not a big issue but we should figure out how to prevent this from happening. Took a note to chat about this during our 1:1
Confirm and celebrate: Docs are here: https://fleetdm.com/docs/using-fleet/scripts
Scripts run swiftly, In the cloud city, problems mend, Peace to admins lend.
Updated the pricing page to reflect scripts as a premium feature: https://github.com/fleetdm/fleet/pull/14167
@roperzh @noahtalerman I know this might not be the right place to comment but can you point me to the reason scripts were implemented with the 5m limitation? E.g., if I as an admin wanted to run the following script:
my understanding is the script as executed by Fleet would fail. So, I am just trying to find out more about what we believe we are up against that caused us to add that arbitrary limit. Thanks!
TLDR: We first built scripts w/ a 1m timeout to prevent a confusing UX. We bumped it to 5 mins because we heard about real world scenarios that needed a longer timeout (installomator). We can adjust further based on more real world scenarios.
the reason scripts were implemented with the 5m limitation?
@nonpunctual when we first built scripts, the timeout was 1m. And, the user could only run scripts on online hosts. Meaning, you ran the script and you had to sit and wait for a response.
Why the 1m timeout? We didn't want users to have to sit in front of the UI, terminal, etc. and wait for more than 1m for a response.
Then we got feedback that folks were trying to run scripts w/ installomator which often exceeded the 1m timeout. So we bumped to 5.
We can certainly continue to adjust this limit but we want real world scenarios in which 5 minutes doesn't work before we do.
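To make the timeout behavior concrete, here is a minimal sketch of running a script under a hard deadline. It is not fleetd's implementation: the function name, the use of /bin/sh -c, and returning None for the exit code on timeout are all assumptions made for illustration. Only the 5 minute value comes from this thread.

```python
import subprocess
from typing import Optional, Tuple

TIMEOUT_SECONDS = 300  # the 5 minute limit discussed above


def run_script(
    script: str, timeout: float = TIMEOUT_SECONDS
) -> Tuple[Optional[int], str, bool]:
    """Run a script with /bin/sh and return (exit_code, output, timed_out).

    On timeout the exit code is None because the script never finished.
    Note: subprocess.run kills the shell process itself on timeout;
    children that the shell spawned may outlive it.
    """
    try:
        done = subprocess.run(
            ["/bin/sh", "-c", script],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the captured output
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        partial = exc.output or ""
        # Partial output can arrive as bytes even when text=True is set.
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return None, partial, True
    return done.returncode, done.stdout, False
```

With this shape, a loop of 301 one-second sleeps would hit the deadline at 300 seconds and come back with timed_out set, which matches the failure described in the question above.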
some loop with an indeterminate amount of iterations; each loop iteration has an intentional 1s pause on execution; it turns out the loop has 301 iterations = 301s = 5m 1s
I think this example is too arbitrary. What's a script that IT admins often run that takes longer than 5 minutes? What does it do (e.g. install software)? Why is it valuable?
Please feel free to bring this to feature fest!
@roperzh Thanks @noahtalerman I will look at the Feature Fest docs to add this.
Many API endpoints have built-in rate limiting. Also, they can be slow when being hit with curl or wget from a shell.
I made my example generic but not arbitrary. I have experienced this not just with Jamf APIs, but internal Apple APIs, Meraki cloud, ServiceNow, etc. So,
The larger issue is not this use case.
We are diminishing the value of the script execution feature if we are hobbling what can happen in the shell. Scripts should run EXACTLY AS IF they were being executed directly on the computer by a user, period. Saying we allow admins to execute scripts should mean we allow them to execute ANY script & we should not be in the way of that.
In my opinion, maybe the concept of "waiting" has been mistakenly conflated in this feature design with "waiting for Fleet to do something".
In this case, if workflows with long time frames are intentionally designed by an admin, Fleet is not doing anything bad. It is doing what they want it to do, i.e. running their script.
Thanks @nonpunctual!
I think this deserves more discussion at product office hours.
I added a note to discuss during our next product office hours.
User story
As an IT admin or security engineer, I want to run a script once, on a specific host, so that I can remediate an issue or collect logs.
Requirements
Changes
fleetctl run-script command
PayloadType com.fleetdm.fleeetd.config and extract the scripting key. If this value is true should the subsystem turn on.
--script-execution flag in the fleetctl package command. If this value is true the subsystem is turned on.
UI and CLI
https://www.figma.com/file/030moybHDEurtYx52D70sF/%239583-Script-execution%3A-Run-a-script-via-CLI?type=design&node-id=2-130&mode=design