Schedule scripts

noahtalerman commented 11 months ago

Goal

User story
As a security engineer,
I want to run scripts on a schedule (ex. once per day) against many hosts
so that I can use scripts to enforce configuration at scale (40-something different companies/environments and 36 different OS versions).

Changes

Product

[ ] UI changes: TODO
[ ] CLI usage changes: TODO
[ ] REST API changes: TODO
[ ] Permissions changes: TODO
[ ] Outdated documentation changes: TODO
[ ] Changes to paid features or tiers: TODO
[ ] Scalability testing: TODO

Engineering

[ ] Database schema migrations: TODO

ℹ️ Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".

Context

Requestor(s): _____

QA

Risk assessment

Requires load testing: TODO
Risk level: Low / High TODO
Risk description: TODO

Manual testing steps

Step 1
Step 2
Step 3

Testing notes

Confirmation

[ ] Engineer (@____): Added comment to user story confirming successful completion of QA.
[ ] QA (@____): Added comment to user story confirming successful completion of QA.

dherder commented 8 months ago

It would be great to also be able to differentiate when a script should run: during Setup Assistant (i.e., a one-time execution) or after user creation on a recurring basis. These are two separate use cases.

noahtalerman commented 7 months ago

More use cases for running a script at initial enrollment (DEP) compared to this story (every 2 minutes).

noahtalerman commented 7 months ago

Hey @dherder, heads up, we discussed this issue during feature fest.

We decided not to draft this in the current design sprint (4.49).

Removing it from the feature fest board.

noahtalerman commented 6 months ago

Hey @pintomi1989, thanks for bringing this to feature fest. Are there any new, specific use cases that we've heard from customers?

nonpunctual commented 6 months ago

@noahtalerman customer-flavia mentioned on a call on 20240416 that this feature would be useful for managing devices and for automated remediations. They want not only timing options (e.g., run this script every Monday at 5p) but also additional triggers (e.g., run this script on an event, on a label attribute match, on a policy failure, etc.).

bolaussen commented 4 months ago

I would really appreciate this feature being added to Fleet; without it, I would have to re-engineer my onboarding workflow. In previous testing, a preinstall script in my bootstrap package didn't run as expected. I am currently triggering the onboarding script using an "on enrollment" checkbox. TIA!

kirkog86 commented 3 months ago

Same here. A script scheduler is an important part of MDM; options like "run once" or "every 30/60 minutes" can make an admin's life much easier.

TheDevMinerTV commented 2 months ago

I use Fleet to administer my family's devices, and I want to automatically turn off devices that aren't needed after 10pm (laptops, gaming PCs, etc.). For now, I think I'll create individual scripts for Windows and Linux that set up the task in systemd / Task Scheduler, but a nice UI for this would be a plus.

mostlikelee commented 1 month ago

How important is timing? I'll outline a few scenarios:

1 - Run a script every X minutes (we don't care about a specific time, more about the interval).
2 - Run a script at 9am. How important is the 9am start time? Is it OK for the script to run at 9:02?
3 - In both scenarios above, what's the desired behavior when the host is offline and then comes online?
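To make the three scenarios concrete, here is a minimal Python sketch, assuming naive host-local datetimes and a server that remembers each host's last run. The names `IntervalSchedule`, `DailySchedule`, `runs_on_reconnect`, and `within_tolerance` are hypothetical illustrations, not part of Fleet's API.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta

# Hypothetical schedule types; not an existing Fleet API.


@dataclass
class IntervalSchedule:
    """Scenario 1: run every `every`, regardless of wall-clock time."""
    every: timedelta

    def next_run(self, last_run: datetime) -> datetime:
        return last_run + self.every


@dataclass
class DailySchedule:
    """Scenario 2: run once a day at `at` (host-local time assumed)."""
    at: time

    def next_run(self, last_run: datetime) -> datetime:
        candidate = datetime.combine(last_run.date(), self.at)
        if candidate <= last_run:
            # Today's slot already passed (or is the last run itself): use tomorrow.
            candidate += timedelta(days=1)
        return candidate


def within_tolerance(scheduled: datetime, actual: datetime, tolerance: timedelta) -> bool:
    """Scenario 2 follow-up: is a run at 9:02 close enough to a 9:00 slot?"""
    return abs(actual - scheduled) <= tolerance


def runs_on_reconnect(schedule, last_run: datetime, now: datetime, catch_up: bool) -> int:
    """Scenario 3: how many runs to execute when a host comes back online.

    Counts the slots that came due between last_run and now. With
    catch_up=True every missed slot is replayed; otherwise they collapse
    into at most one run.
    """
    missed = 0
    slot = schedule.next_run(last_run)
    while slot <= now:
        missed += 1
        slot = schedule.next_run(slot)
    return missed if catch_up else min(missed, 1)
```

Whether missed runs are replayed or collapsed is the scenario 3 question in code form; for configuration enforcement, collapsing seems the more useful default, since running the same remediation several times in a row adds little.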

It's hard to talk about schedules without talking about business requirements such as change/maintenance windows, e.g., "only run this script between 9-10a on Saturday" or, inversely, "don't run this script on weekends".
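One possible way to model those windows is as allow and deny rules checked at the moment a run comes due. This Python sketch assumes host-local time and that deny rules take precedence over allow rules; `Window` and `may_run` are hypothetical names, not an existing Fleet feature.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class Window:
    """Hypothetical run window. weekdays uses datetime.weekday(): Mon=0 ... Sun=6."""
    weekdays: frozenset[int]
    start: time
    end: time | None = None  # exclusive; None means "until midnight"

    def contains(self, moment: datetime) -> bool:
        t = moment.time()
        return (
            moment.weekday() in self.weekdays
            and self.start <= t
            and (self.end is None or t < self.end)
        )


def may_run(moment: datetime, allow: list[Window], deny: list[Window]) -> bool:
    """Deny windows always win; an empty allow list means "any time not denied"."""
    if any(w.contains(moment) for w in deny):
        return False
    return not allow or any(w.contains(moment) for w in allow)


# "Only run this script between 9-10a on Saturday"
SATURDAY_9_TO_10 = Window(frozenset({5}), time(9, 0), time(10, 0))
# "Don't run this script on weekends"
WEEKENDS = Window(frozenset({5, 6}), time(0, 0))
```

A run that comes due outside its window would then have to be deferred to the next allowed slot, which ties back to the offline-host question in scenario 3.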

What kinds of timing options would be most useful for the first iteration of this feature?

Auditing: what kind of info/reporting would be most important to you? Writing the result of every script run to the activity log could be very verbose.

harrisonravazzolo commented 1 month ago

@mostlikelee great questions. Glad you raised them.

I could confirm with this particular prospect, but this use case comes from a large financial firm that runs scripts on a regular schedule (say, every hour); the output is then collected, and remediation is executed if needed. Things like configuration checks, etc.

However, it does raise the question, and calls for a bit of digging on my side with the prospect, because I wonder how much of what they are actually checking for could be answered with simple osquery queries and policies.

For example, they talked about CIS Benchmarks, which you could check in Fleet without needing to execute a script so often.

mostlikelee commented 1 month ago

It seems like the primary goal here is to ensure a device maintains a desired state, so if there is some configuration drift, a script is there to move it back into the desired state. Yes, we can now do this with policy/script automations, but I'm curious whether the time between policy evaluations is too long to meet security requirements (e.g., the firewall cannot be disabled for more than X minutes at a time).
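The firewall example reduces to simple arithmetic. Assuming checks run on a fixed interval, remediation starts as soon as a failing check is reported, and the host stays online, the worst case is drift that happens just after a passing check: it goes unseen for a full interval, and the fix then takes some extra time to run. The helper names below are hypothetical.

```python
from datetime import timedelta


def worst_case_exposure(check_interval: timedelta, remediation_latency: timedelta) -> timedelta:
    """Longest time a drifted setting can stay out of compliance (host online).

    Drift right after a passing check is only detected at the next check,
    one full interval later; remediation then adds its own latency.
    """
    return check_interval + remediation_latency


def meets_requirement(check_interval: timedelta, remediation_latency: timedelta,
                      max_drift: timedelta) -> bool:
    """True if the worst-case exposure fits within the allowed drift (the "X minutes")."""
    return worst_case_exposure(check_interval, remediation_latency) <= max_drift


def longest_allowed_interval(remediation_latency: timedelta, max_drift: timedelta) -> timedelta:
    """Largest check interval that still satisfies max_drift (<= 0 means unattainable)."""
    return max_drift - remediation_latency
```

Under these assumptions, hourly evaluations can never satisfy a 15-minute limit, however fast the script runs, which is the gap this question is probing.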

Another item to consider is offline remediation: how important is it that a device runs this check/remediation when it cannot communicate with Fleet?

nonpunctual commented 1 month ago

I definitely think a lot of remediation workflows could be replaced with Policy failures running scripts / installing packages. The "run once a day", "run once a week", and "run at check-in" options come from other solutions and are really a band-aid on actual device state management, which the Fleet Policy engine may be better at in many ways. @mostlikelee @harrisonravazzolo

bolaussen commented 1 month ago

@nonpunctual makes a really good point. Thinking about this again, running a configuration script at enrollment is really the only event I can think of that would be needed beyond running scripts/pkgs as a remediation step for a failed policy.

dherder commented 1 month ago

@mostlikelee you are right on in your assessment re: configuration drift. In many cases, a device may not have the ability to check in with the Fleet server in order for a policy to trigger the remediation. So, an admin schedules the execution in a repeated fashion in order to have the best chance of catching the device when it comes online, or to continually enforce that execution. #19877 was logged in order to solve for this. IMHO, if we had that, scheduling policy evaluations for repeated execution would be less important.

nonpunctual commented 1 month ago

Also, maybe it's obvious, but I just want to reinforce that the webhooks emitted from the activity feed (which should include enrollment events) were added for exactly this kind of workflow (e.g., enrollment happened, so do something else).

I do think there probably are legitimate use cases for schedules though (just as cron jobs / clocks are an important part of what goes on behind the scenes in Fleet...) Reissuing certs could be scheduled. Prompting a user for admin auth or creds for a service after a TTL has expired, etc.