zhumo opened this issue 1 year ago
Interval change here and in config?
Hey @ksatter. We're deprioritizing this issue as we won't be able to deliver it in the next 6 weeks. Please bring this back to the PFR call if it surfaces again so we can re-prioritize.
Zay: High/medium priority
Air guitar this one.
@zayhanlon heads up, we brought this into the upcoming design sprint as an air guitar.
@zayhanlon this didn't make it into the sprint, bringing back to Feature Fest
@noahtalerman need to chat w/ customer-schur and customer-ufa to better understand the problem.
@zayhanlon, heads up, this didn't make the 3-week drafting timeline so we're removing it from the drafting board. Bringing back to feature fest.
Heads up @zayhanlon this request was discussed during feature fest last week and didn't make it into the current design sprint.
@noahtalerman @marko-lisica I'd like to discuss this one if you get a chance, to understand why it was deprioritized. Too hard? Not enough viable use cases?
@nonpunctual sounds good! Can you please add this as an agenda item to the next product office hours?
Hey @ksatter and @nonpunctual, heads up, we didn't have room to take this one in the current design sprint (4.48).
@nonpunctual is that the Slack thread regarding the customer that runs live queries to detect device health?
If so, I think they want to get fresh results every time the user tries to log in to Okta. If this is the case, and using live queries for this is painful, then I don't know if recalculating policies at a faster interval is the right solution.
Instead, maybe the device health API should fetch fresh results each time.
@noahtalerman I am not sure. There are 6 customers attached to this issue, all with various reasons for wanting the ability to set an "execution frequency" for policies. This is mostly because the customers know that running large queries on the same interval as everything else is potentially "painful", & that pain could be relieved by running their large queries less frequently.
This need is tied to having fresh data in Fleet instead of stopping after collecting data from the 1st 1000 Hosts Fleet sees. https://github.com/fleetdm/fleet/issues/397
That issue has been open for 2y.
@noahtalerman @alexmitchelliii @ksatter I could be wrong in how I'm understanding these customer requests, but I don't think this is about getting faster results in Fleet.
It's about setting non-critical policies that collect "static" data to run on a slower interval which will reserve space for the cached data of critical policies that collect "dynamic" data.
Currently, the workarounds are to manually run the policy's query, run a live query, or put all policies on a shorter interval (the global setting sketched below).
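For reference, a minimal sketch of the only interval knob that exists for policies today, assuming Fleet's server configuration format (value illustrative):

```yaml
# Fleet server configuration (sketch; value illustrative).
# policy_update_interval is global: every policy on every host is
# re-evaluated on this one cadence, so slowing an expensive policy
# down means slowing all policies down.
osquery:
  policy_update_interval: 1h
```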
Admins want Host data to be up-to-date & easily accessible in the Fleet UI. That is the relationship between this ticket & #397 Expansion of Host Vitals. Customer-ufa said the UI for looking at the info for an individual host is ok but it's not useful for their Help Desk Fleet UI users because Fleet only caches data for the 1st 1000 hosts seen.
If something like a separate cadence for critical & non-critical policies were implemented, the data for the critical policies could stay fresh, & the set could be limited to, e.g., 25 or 50 critical policies defined by the Fleet admin, which would serve as a cap on the amount of cached data.
@nonpunctual this is great feedback. Thanks.
I think we're getting the cached query results and policy features mixed up here.
With the cached query results (what the customer is interested in), the Fleet admin can already set the frequency on a per-query basis. Some queries can run every 5 minutes and other queries can run every hour.
This way, the Fleet admin can protect the performance of their devices (only run intensive queries every so often).
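For example, here's a sketch of what that per-query tuning looks like in a fleetctl spec file (query names and intervals are made up for illustration, assuming the current query spec fields):

```yaml
# queries.yml (sketch; names and values illustrative)
apiVersion: v1
kind: query
spec:
  name: expensive-software-inventory  # heavy query: run only hourly
  query: SELECT name, version FROM programs;
  interval: 3600
---
apiVersion: v1
kind: query
spec:
  name: battery-health                # cheap query: run every 5 minutes
  query: SELECT health, cycle_count FROM battery;
  interval: 300
```

Applied with `fleetctl apply -f queries.yml`, each query then runs on its own cadence.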
> It's about setting non-critical policies that collect "static" data to run on a slower interval which will reserve space for the cached data of critical policies that collect "dynamic" data.
If I'm understanding the above correctly, we just want Fleet to be able to get more data (more results) faster w/o having to worry about filling up the Fleet DB.
Ideally, the user doesn't have to worry about the "filling up the Fleet DB" part. I think we should try hard to make that Fleet's job. The customer can collect as much data as they need and as frequently as they need it.
Not saying I am not confused. The thing I am trying to solve for is this:
Fleet caches data for the 1st 1000 Hosts it sees & stops caching after that.
What would be a lot more useful based on experience & lots of customer feedback is:
Fleet caches data for the most recently seen 1000, 2000, or 5000 Hosts & lets the old data fall off the edge, like a Time Machine backup, for example.
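To make that concrete, here's a hypothetical config sketch. `cached_results_max_hosts` does not exist in Fleet today; the name is made up purely to illustrate the rolling-window idea:

```yaml
# HYPOTHETICAL sketch: this setting does not exist in Fleet.
# The idea: keep cached query results for the N most recently
# seen hosts and evict the oldest, instead of stopping at the
# first 1000 hosts Fleet happens to see.
osquery:
  cached_results_max_hosts: 5000
```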
Hey @pintomi1989, any new info on the problem we're trying to solve w/ this one? And for what customer?
Hey @noahtalerman,
This was brought up as a nice-to-have feature by customer-starchik during this week's meeting. Not a current blocker, but it would be nice to see critical policies updated on a more prioritized cadence.