fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com

Scheduled maintenance fails to book meeting if first meeting does not remediate failing policy #19596

Closed lukeheath closed 2 months ago

lukeheath commented 2 months ago

Fleet version: 4.50.2

Web browser and operating system: Chrome


💥  Actual behavior

I have a policy that will always fail my host.

The calendar events automation is enabled for this policy.

When I first fail the policy, the scheduled maintenance event is created as expected. If I delete it or move it to the past, it is rescheduled as expected. But if I let the scheduled maintenance event start, it will never re-book scheduled maintenance for my host for this failing policy. The only way to trigger it is to update the policy, refetch my host so it is passing, then update the policy again and refetch my host so it is failing. As soon as I do that, it recognizes the newly failed policy and creates the event. But it does not recognize existing failed policies that have already had scheduled events.

The expected behavior is that if scheduled maintenance fails to resolve a failing policy, another scheduled maintenance event is created at the first check after the first event is complete.

🧑‍💻  Steps to reproduce

  1. Create a policy your host will always fail.
  2. Enable calendar event automations.
  3. Wait for the scheduled maintenance event to appear, then reschedule it to 1 or 2 minutes in the future.
  4. Wait for the event to start and end.
  5. Note that the scheduled maintenance is not re-booked for this policy, even though it is still failing.

🕯️ More info (optional)

N/A

lukeheath commented 2 months ago

@getvictor Since you're working on the calendar cron env var ticket I'm assigning this straight to you. I'll be giving any calendar bugs a P2 because there is an associated urgency for this feature to demo well in the near future.

This bug was hard to notice with the 30-minute cron, but I hardcoded it back to 30 seconds last night on dogfood, and it's a bit easier to chaos engineer the calendar feature that way.

cc @sharon-fdm

lukeheath commented 2 months ago

@getvictor informed me this is intended behavior. We wait 24 hours after the event before re-booking for the same policy to give the user (and IT if necessary) time to complete necessary processes.

fleet-release commented 2 months ago

In the cloud city's heart,
Meetings rebooked, errors thwart,
A fresh start, smart art.