Log live query results

noahtalerman commented 3 years ago

Goal

As a Fleet user, I want to be able to send live query results to my configured result log destination so that I can forward them from there to a Google Sheet, Jupyter notebooks, or other analytics tooling.

nyanshak commented 3 years ago

Specifically for my use case...

What options would be available for configuring live query logs (e.g., different logging plugins or accounts)?

I would be fine with these logs being sent to the same Fleet-configured logging destination. However, our osquery logs don't currently go through Fleet, so it would be interesting if osquery could also send these logs to its own configured logger_plugin. I imagine that's probably out of scope here, but it would be an extra nice-to-have 🤷‍♀️

Are all live queries sent to the configured log destination, or is there an option to turn off logging on a per-query basis?

Preferably, I'd like to always log (1) who ran the query, (2) what query was run (including the query string if it isn't a saved query), and (3) which targets it ran against.

As for logging the query results themselves, I think a checkbox labeled something like "Save query results to log" would be fine. I wouldn't always need the results to be logged.

noahtalerman commented 3 years ago

Your description of what you'd preferably include in the logs is very helpful. I incorrectly assumed that the actual query results were of the highest interest.

Is logging the live query campaign data (who ran it, what query, and which targets) of most interest for monitoring reasons? I'm using the phrase "monitoring reasons" because I can't describe these reasons in more detail.

At a higher level: why is it important to collect the live query campaign data?

nyanshak commented 3 years ago

Logging who ran a query, which query it was, and its targets is useful for debugging. For example, if we noticed performance issues caused by osquery on a specific host at a specific time, we could check whether any unusual live queries were run then that caused excessive CPU or memory usage. This is especially important because the watchdog controls don't apply (as I understand it) to ad-hoc queries.

Second, if a query returned a very large number of results and caused problems with Fleet itself, it would be useful to correlate that with the live query logs.

Why is it important to collect live query campaign data?

During an investigation, an analyst might run a live query. In Fleet, the results are ephemeral and not easy to share. If query results are logged to our regular log stores, we can search, filter, and share them the way we're used to instead of working through the Fleet UI. Yes, better sort and filter options for results in Fleet's UI would be nice, but that's really a separate issue.

Additionally, without pagination or stored query results, it's easy to crash a browser tab by trying to render too many results, and then lose all the data anyway.

noahtalerman commented 3 years ago

Got it. Thank you for the great response. I'm seeing 3 related but separate goals here:

As a Fleet user I want to log who ran a live query, what query was run, and which targets the query was run on so that I can correlate host performance issues with the live queries that were run.
As a Fleet user I want to log who ran a live query, what query was run, and which targets the query was run on so that I can correlate Fleet performance issues with the live queries that were run.
As a Fleet user I want to log a live query's results from Fleet so that I can search, filter, and share them in a familiar environment, presumably the same (or a similar) environment that handles the search, filtering, and sharing of logs from scheduled queries.

I'm planning to break these into separate GitHub issues.

fleetdm / fleet

Log live query results #366

Goal