Open dherder opened 2 months ago
I think it would be ideal if these results were simply cached as a table & that a single query could parse the results as in @dherder 's example. The product "implies" this is possible by returning filtered results in a UI table.
Still good, but less than ideal, would be a live query UI or even a search bar on the returned results of any query with exportable results for the 2nd query.
I want to be able to derive further insights from that data by transforming with SQL. I don't want to have to build separate queries that run on devices to obtain these insights.
This sounds like a really powerful feature.
Using the above example, SELECT COUNT(*) FROM "kernel_version_query_data_table" WHERE version > "23.6.0"
@dherder in this example, I think you can achieve this in a workaround way by clicking on the filter above the version column and typing "23.6", "23.7", "23.8", ... until there are no results. This is definitely a workaround, but I just want to call out that the current filtering could be used for some use cases in a pinch.
Today, is there a third-party SQL tool in which we could import the results? Can you write SQL queries in Google Sheets?
cc @nonpunctual ^^
@noahtalerman There are 0 scenarios in which your workaround would be viable for prospect-salix at scale.
@noahtalerman In addition to @nonpunctual's comments, I'd like to add that the example quoted above is really just a simple example that illustrates the problem from a data perspective. A more realistic example would be a JOIN on a specific cached query result table to another cached query result "temp table". This cross-result ETL would be a huge advantage for all of our customers.
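To make the JOIN idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for wherever cached results might live. The table names (`kernel_version_results`, `disk_encryption_results`), their columns, and the sample rows are hypothetical and are not Fleet's schema.

```python
import sqlite3


def join_cached_results(conn):
    """Join two hypothetical cached result tables on host_id."""
    return conn.execute(
        """
        SELECT k.version, COUNT(*) AS unencrypted_hosts
        FROM kernel_version_results AS k
        JOIN disk_encryption_results AS d ON d.host_id = k.host_id
        WHERE d.encrypted = 0
        GROUP BY k.version
        ORDER BY k.version
        """
    ).fetchall()


# In-memory database with made-up rows standing in for two cached reports.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE kernel_version_results (host_id INTEGER, version TEXT);
    CREATE TABLE disk_encryption_results (host_id INTEGER, encrypted INTEGER);
    INSERT INTO kernel_version_results VALUES (1, '23.5.0'), (2, '23.6.0'), (3, '23.6.0');
    INSERT INTO disk_encryption_results VALUES (1, 0), (2, 0), (3, 1);
    """
)
joined = join_cached_results(conn)
print(joined)
```

The point of the sketch is only that neither query has to run on a device again: both inputs are results that were already collected.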
Thanks @dherder! What is prospect-salix trying to do exactly?
I'm trying to understand the problem they're running into today before we get into the "how" (solution).
Thanks for the use case @nonpunctual!
count how many computers are on version 14.6.1
What if I type "14.6.1" into the "version" column? That would filter results and I would see how many results are there up here:
every row in the visible "table" is counted once on a separate row (because each row represents a host)
I might be misunderstanding your comment but to clarify, each row represents a result (not a host). A host can return many results.
Although in this example, I think it will always be 1 result to 1 host. I'm guessing this is a query to get the macOS version.
I realize this example might not get at all the use cases the prospect is trying to achieve.
Assuming I'm understanding correctly that the macOS version use case is possible, are there other use cases that aren't possible? I think we want to understand the exact problem they're running into today before we get into the "how" (solution).
@mikermcneil @noahtalerman @alexmitchelliii @dherder @zwass @getvictor @lukeheath
Example:
If the data returned to Fleet could be queried, SELECT count(*) & GROUP BY version would sum the total number of computers on macOS version 14.6.1.
Instead, I get a count of 1 for each unique host. This is expected behavior in Fleet today. The count of 1 for each host is correct. Each 14.6.1 match is being counted on each host.
Results are filtered for the Fleet UI view, but, "DOA" for further parsing.
This is counterintuitive given how subselects & CTEs work if this query were run against an actual SQL db. Thanks.
This is what prospect-salix is asking about on this call: https://us-65885.app.gong.io/call?id=2382501646065789366&highlights=%5B%7B%22type%22%3A%22SHARE%22%2C%22from%22%3A745%2C%22to%22%3A780%7D%5D
@nonpunctual call?
@noahtalerman let's bring this through the “users will expect” pass
@mikermcneil sounds good. I added to the agenda for tomorrow's unpacking the "how" call.
@noahtalerman @mikermcneil From a Linux CPE engineer @ prospect-salix:
This first level is basically the limit of what fleet can do today:
This story is about the logical next step of aggregating that data using the previous results as a table. Immediately I see not only the progress of the upgrade, but now a new insight I didn't know to look for, I've got 1 box running a super old version which I can now dive back into the details to find/remediate:
Another currently supported mechanism for doing this kind of aggregation is via fleetctl and standard unix tools.
For example:
fleetctl query --labels 'All hosts' --query "select name || ' ' || version as version from os_version" --timeout 30s | jq '.rows[0].version' | sort | uniq -c
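The same tally can be sketched in Python for cases where jq is not at hand. This assumes, as the jq filter above implies, that each line of output is a JSON object with a `rows` array; the sample lines and the `count_versions` helper are made up for illustration.

```python
import json
from collections import Counter


def count_versions(lines):
    """Tally the `version` value of the first row on each JSON line."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rows = json.loads(line).get("rows") or []
        if rows:
            counts[rows[0]["version"]] += 1
    return counts


# Hypothetical output lines, one JSON object per host.
sample = [
    '{"host": "a", "rows": [{"version": "macOS 14.6.1"}]}',
    '{"host": "b", "rows": [{"version": "macOS 14.6.1"}]}',
    '{"host": "c", "rows": [{"version": "macOS 13.4"}]}',
]
version_counts = count_versions(sample)
for version, n in version_counts.most_common():
    print(n, version)
```

To use it on real output, pass `sys.stdin` to `count_versions` instead of `sample` and pipe the command's output into the script.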
Zach Wasserman In your example, how should the query system know that you intend your GROUP BY to be an aggregation across all the hosts rather than an aggregation on each host?
Brock Walters Because I've returned a set of results for all hosts. I now want to query the aggregate. I think that could be a check box / option, but, the assumption in all of the examples has been that you want whatever has been returned for each host to be in a new query-able table that can be queried based on the way it's being displayed in the UI
The syntax I agree is a tricky problem but I think that could be solved by:
Zach Wasserman Ah yes I agree with that approach. 2 queries where the first runs on each individual host and the second runs on the aggregated data. Potentially it could use the data that we already collect in query reports?
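A rough sketch of that two-query shape, using Python's built-in `sqlite3` as the place where the second query runs. The `per_host_results` data, the `results` table name, and the `aggregate` helper are all hypothetical; this illustrates the idea and is not how Fleet stores or queries report data.

```python
import sqlite3

# Stage 1 (hypothetical): what each host returned for the per-host query.
per_host_results = {
    "host-a": [{"version": "14.6.1"}],
    "host-b": [{"version": "14.6.1"}],
    "host-c": [{"version": "12.7"}],
}


def aggregate(results, second_query):
    """Load stage-1 rows into a table named `results`, then run stage 2 on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (host TEXT, version TEXT)")
    conn.executemany(
        "INSERT INTO results VALUES (?, ?)",
        [(host, row["version"]) for host, rows in results.items() for row in rows],
    )
    return conn.execute(second_query).fetchall()


# Stage 2: the GROUP BY now runs across all hosts, not on each host.
totals = aggregate(
    per_host_results,
    "SELECT version, COUNT(*) FROM results GROUP BY version ORDER BY version",
)
print(totals)
```

Keeping the two queries separate removes the ambiguity raised above: the first query is unambiguously per host, and the second is unambiguously over the combined results.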
Benjamin Edwards just messing around with the query_results table (this is where query report data is stored).
SELECT jt.protected, count(*) as count
FROM query_results,
JSON_TABLE(
query_results.data,
'$' COLUMNS (
scheme VARCHAR(50) PATH '$.scheme',
enabled VARCHAR(1) PATH '$.enabled',
handler VARCHAR(255) PATH '$.handler',
external VARCHAR(1) PATH '$.external',
protected VARCHAR(1) PATH '$.protected'
)
) AS jt
WHERE query_results.query_id = 6623 GROUP BY jt.protected;
JSON_TABLE is pretty slick (but you need to know the schema to build the resulting table).
+---------+-----+
|protected|count|
+---------+-----+
|0 |32 |
|1 |11 |
+---------+-----+
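On the "you need to know the schema" point, one possible way around it is to discover the columns from the stored JSON itself before grouping. The sketch below does that in plain Python; the `stored` rows are made up (reusing two column names from the query above), and `discover_columns` and `group_count` are hypothetical helpers, not part of Fleet.

```python
import json
from collections import Counter


def discover_columns(json_rows):
    """Return the sorted union of keys seen across the stored JSON rows."""
    keys = set()
    for raw in json_rows:
        keys.update(json.loads(raw))  # updating with a dict adds its keys
    return sorted(keys)


def group_count(json_rows, column):
    """Count stored rows by the value of one column (missing values count as None)."""
    return Counter(json.loads(raw).get(column) for raw in json_rows)


# Hypothetical stored result rows, one JSON document per row.
stored = [
    '{"scheme": "http", "protected": "0"}',
    '{"scheme": "https", "protected": "0"}',
    '{"scheme": "ssh", "protected": "1"}',
]
columns = discover_columns(stored)
by_protected = group_count(stored, "protected")
print(columns)
print(dict(by_protected))
```

The tradeoff is that the grouping happens in application code rather than in the database, so this would only suit result sets small enough to load into memory.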
Hey @dherder I pulled this request off of feature fest b/c it doesn't meet the criteria for prioritization: https://github.com/fleetdm/fleet/pull/23184/files#diff-c99d12c3af50c0c2aca2b9ef7597c02ccfe87678291956ff0b2e83d63978ea38R370
@noahtalerman this is a request from prospect-salix
@nonpunctual is the request in an order form? See the criteria here for prioritization: https://github.com/fleetdm/fleet/pull/23184/files#diff-c99d12c3af50c0c2aca2b9ef7597c02ccfe87678291956ff0b2e83d63978ea38R370
@alexmitchelliii @phtardif1 @allenhouchins @harrisonravazzolo @dherder can you answer Noah's question above? Thanks.
prospect-salix: Users attempting to replace Tanium will expect some way to report on certain values (TODO: what are they specifically?), with the ability to average, sum, or count up a value across multiple hosts. Users in this situation, imagining this feature in Fleet, can also imagine ways that a generic reporting engine might be helpful for other IT use cases, such as digital employee experience (DEX) reporting.
Problem
Today Fleet is great at returning data via live queries and scheduled queries. Scheduled queries have improved such that we are able to collect and store results in Fleet rather than just export to a logging pipeline. These are called cached query reports.
As a Security or IT admin, I would like to be able to query across these reports. For example, I want to build a query report that gets information about the kernel version of all of my devices. With the resulting data in the cached report, I want to be able to derive further insights from that data by transforming with SQL. I don't want to have to build separate queries that run on devices to obtain these insights. Using the above example,
SELECT COUNT(*) FROM "kernel_version_query_data_table" WHERE version > "23.6.0"
would allow the admin to easily count the distribution of all the hosts at a particular kernel version.
What have you tried?
Today, customers leverage products like Wazuh that have dashboards that sit on top of the osquery data. Additionally, customers export this data out to a logging pipeline and transform it with automation and data management (BI) layers.