Open sharon-fdm opened 1 year ago
Hi @sharon-fdm, unfortunately, we were not able to get to this work in our 6-week timeframe. Please bring this back to Feature Fest if it's still desired. Thanks!
@zhumo Thank you. I will keep track of all those closed items and bring them to Feature Fest if/when possible.
Goal
As the developer of the Fleet agent, I would like to know whether any of our installed agents have problems communicating on one channel while other channels still work (e.g. osquery communicates well while orbit does not), so that I can identify and fix bugs in this area.
Changes
1 - On the Fleet server, add a new DB table keyed by host ID, with one column for each type of communication (osquery, orbit, config, or other...). Whenever an agent communicates with the server over any channel, the server component handling that request will write the current timestamp for that host in the corresponding column.
2 - An additional health metric called "Problematic agents" will be added.
3 - Once every X days/hours the fleet server will go over this table and check for agents that:
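
A minimal sketch of how the table and the periodic check could fit together, using an in-memory SQLite database for illustration. The table name `host_channel_last_seen`, the function names, and the staleness threshold are all hypothetical and are not taken from the Fleet codebase. The check criteria are not spelled out in this issue, so the sketch assumes the rule implied by the goal: a host is flagged when at least one channel has reported recently while another channel has not.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical channel columns, one per type of communication.
CHANNELS = ("osquery", "orbit", "config")


def create_table(conn):
    # One row per host, one nullable timestamp column per channel.
    columns = ", ".join(f"{ch} TEXT" for ch in CHANNELS)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS host_channel_last_seen "
        f"(host_id INTEGER PRIMARY KEY, {columns})"
    )


def record_contact(conn, host_id, channel, now):
    # Called by whichever server component handled the agent's request.
    if channel not in CHANNELS:
        raise ValueError(f"unknown channel: {channel}")
    conn.execute(
        "INSERT OR IGNORE INTO host_channel_last_seen (host_id) VALUES (?)",
        (host_id,),
    )
    # The column name is safe to interpolate: it was validated above.
    conn.execute(
        f"UPDATE host_channel_last_seen SET {channel} = ? WHERE host_id = ?",
        (now.isoformat(), host_id),
    )


def problematic_agents(conn, now, stale_after):
    # Assumed criterion: some channel is fresh while another is stale
    # (or has never reported). Returns {host_id: [stale channels]}.
    cutoff = now - stale_after
    query = (
        f"SELECT host_id, {', '.join(CHANNELS)} "
        "FROM host_channel_last_seen ORDER BY host_id"
    )
    flagged = {}
    for host_id, *stamps in conn.execute(query):
        fresh, stale = [], []
        for channel, stamp in zip(CHANNELS, stamps):
            seen = datetime.fromisoformat(stamp) if stamp else None
            (fresh if seen is not None and seen >= cutoff else stale).append(channel)
        if fresh and stale:
            flagged[host_id] = stale
    return flagged
```

The size of the returned mapping could feed the "Problematic agents" metric. Note that this sketch treats a channel that has never reported as stale, so a host that legitimately uses only some channels would be flagged; a real implementation would probably need to handle that case separately.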
QA
Make sure the feature works (possibly by running an agent with a broken channel, or in any other way).
Risk assessment
LOW - Possibly more load on the DB, but the extra writes will be spread out at the same rate as regular agent check-ins.
Manual testing steps
Testing notes