Closed lukeheath closed 3 months ago
@lukeheath thanks for tracking this.
Once per week, every two weeks, every three, or last Tuesday (four weeks).
If these were the options, which do you think Fleet would choose when dogfooding?
@noahtalerman I would choose one week. Reasons:
The only way we can confidently remediate within 15 days is to schedule maintenance windows weekly.
@spokanemac We welcome your input! Do you agree with my thoughts above?
@lukeheath Yes, agreed on selecting one week as the interval.
@noahtalerman I would add that with weekly maintenance windows, we have the opportunity for two windows on a user calendar to remediate over 15 days, with a few days to intervene manually.
This also helps account for potential OOO situations where the host may be offline for a week.
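The arithmetic behind the weekly choice can be sketched as follows. This is a rough illustration, not Fleet code: the function name `guaranteed_windows` and the worst-case framing (the failure is detected just after a window ended) are assumptions added here.

```python
def guaranteed_windows(interval_days: int, deadline_days: int = 15) -> int:
    """Minimum number of maintenance windows inside the remediation deadline.

    Assumed worst case: the failure is detected right after a window ended,
    so windows occur at interval_days, 2 * interval_days, ... after detection.
    """
    return deadline_days // interval_days


if __name__ == "__main__":
    # The four options from the thread: every 1, 2, 3, or 4 weeks.
    for weeks in (1, 2, 3, 4):
        print(f"every {weeks} week(s): {guaranteed_windows(weeks * 7)} window(s) within 15 days")
```

Under these assumptions, only the weekly interval leaves a second window as a fallback if the host misses the first one.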
Hey @lukeheath, @Drew-P-drawers, and @spokanemac heads up that we shipped this maintenance windows improvement.
It looks like we have an article about the feature here, but there's no mention of the old timing (every 3rd Tuesday).
Are there any other guides/articles that need to be updated?
TODO @noahtalerman:
@noahtalerman No other guides at this time. My dogfooding article is still a WIP.
Update the maintenance windows diagram in Figma here so it says every week instead of every 3rd Tuesday
@spokanemac and @lukeheath, instead of updating the existing flow chart, I created a v2 of the flow chart and linked to it in this issue's description.
Here's my understanding of the behavior after shipping this story:
@getvictor is that accurate? If that's right can you please close this issue? Thanks :)
Should be:
If it’s Tuesday and it’s past the last slot, schedule the event for the next business day.
If the webhook already fired but the policy is still failing, schedule the event for the next Tuesday. There is a grace period of 1 day after the webhook fires before another calendar event is scheduled.
But what should happen if host was offline during the event? In that case, we try to reschedule the event for the same day. This is how we end up with an event every hour.
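The rules in this comment could be sketched roughly as below. This is a minimal sketch, not Fleet's implementation: the function names, the `LAST_SLOT_START` time, the choice to compute "next Tuesday" from today's date, and returning `None` for cases the comment does not cover are all assumptions made for illustration.

```python
from datetime import date, datetime, time, timedelta
from typing import Optional

TUESDAY = 1                        # date.weekday(): Monday is 0
LAST_SLOT_START = time(16, 30)     # hypothetical start of the last slot of the day
GRACE_PERIOD = timedelta(days=1)   # from the thread: 1 day after the webhook fires


def next_business_day(d: date) -> date:
    """First weekday (Mon-Fri) strictly after d."""
    d += timedelta(days=1)
    while d.weekday() >= 5:        # skip Saturday (5) and Sunday (6)
        d += timedelta(days=1)
    return d


def next_tuesday(d: date) -> date:
    """First Tuesday strictly after d (assumption: today never counts)."""
    return d + timedelta(days=(TUESDAY - d.weekday() - 1) % 7 + 1)


def pick_event_date(now: datetime, webhook_fired_at: Optional[datetime]) -> Optional[date]:
    """Return the date to schedule on, or None if these rules schedule nothing."""
    if webhook_fired_at is not None:
        if now - webhook_fired_at < GRACE_PERIOD:
            return None            # still inside the grace period
        return next_tuesday(now.date())
    if now.weekday() == TUESDAY and now.time() > LAST_SLOT_START:
        return next_business_day(now.date())
    return None                    # other cases are not described in the comment
```

The offline-host case raised above is deliberately left out, since the thread has not settled it at this point.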
@getvictor thanks! I updated the behavior. Please let me know if that looks right:
But what should happen if host was offline during the event?
I think we decided to exit instead of scheduling more events for that day. From the flowchart in Figma here:
@getvictor giving you an extra ping^ :)
I think we decided to exit instead of scheduling more events for that day. From the flowchart in Figma here
Did we change this behavior as part of a story I'm forgetting? (I definitely could be). If not, then I think we can track a bug for this.
We do exit. But 5 minutes later we enter this flowchart again with another cron run.
Ah, makes sense. I think that's ok for now.
I updated the flowchart here to make sure that this is documented somewhere.
FYI @sharon-fdm, because you're working on the guide for scheduled maintenance. I think it makes sense to call this behavior out in the guide (along with a summary of the flowchart).
Weekly window comes, Vulnerabilities addressed, Safe in the cloud's arms.
@noahtalerman your flowchart link in this msg is broken.
Thanks for the heads up @sharon-fdm! I fixed the link: https://www.figma.com/design/AeCMzgaSqN4DXzTrKxvdYh/%2319031-Maintenance-windows-every-week?node-id=2-130&t=QQNKwOc7xgnvqx1v-1