Open collinmcneese opened 3 years ago
This also happens when trying to use the top level name of the profile too (as documented in the Automate API) rather than using the filter of profile_name. Instead of only the ssh-baseline results coming back (the name from inspec.yml), all profile results are returned:
curl -sk -X POST -H "api-token: ${AUTOMATE_API_TOKEN}" -d '{"name": "ssh-baseline", "type": "json"}' "${AUTOMATE_API_BASE_URL}/compliance/reporting/export" | jq '.[].profiles' | jq '.[]' | jq .full
"DevSec SSH Baseline, v2.3.2"
"CIS Amazon Linux 2 Benchmark Level 1, v1.0.0-6"
"DevSec SSH Baseline, v2.3.2"
"CIS Amazon Linux 2 Benchmark Level 1, v1.0.0-6"
"DevSec SSH Baseline, v2.3.2"
"CIS Amazon Linux 2 Benchmark Level 1, v1.0.0-6"
"DevSec SSH Baseline, v2.3.2"
"CIS Amazon Linux 2 Benchmark Level 1, v1.0.0-6"
The filter is working but it is not properly documented: To filter by profile name we need to pass the filter as - {"type": "profile_name", "values":["
Please try out by passing a query like this:
{ "filters": [{"type":"profile_name", "values": ["devsec linux security baseline"]},{"type": "control", "values":["os-01"]}],
"type": "json"
}
Hi @kalroy -- Yep, if you add the control as a filter then it only reports back that control. However, if you do not add a control filter and only add a profile filter, it returns all controls from all profiles rather than only the controls from the selected profile.
example:
{"filters": [{"type":"profile_name", "values": ["devsec linux security baseline"]}], "type": "json"}
If you have multiple profiles with data, running that query will return all controls for all profiles, ignoring the profile_name filter value.
Additionally -- if you pass a profile_id then that works as well. Just no combination of text within profile_name seems to work.
The below is passing the profile_id for linux-baseline and it returns only the controls for this profile across all nodes, as expected.
{"filters": [{"type":"profile_id", "values": ["477f53f8f6867a0f1abe7c199a569d00c9d38d8f2a8b85dbb9cc361ca435a2b6"]}], "type": "json"}
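For anyone scripting this, the profile_id request above can be wrapped in a small shell helper. This is only a sketch: export_payload_for_profile_id and export_by_profile_id are hypothetical names, not part of Automate. The endpoint and headers are the ones already used in this thread, and the SHA has to be supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper: print the export request body for a single profile_id.
# Assumes the argument is a plain hex digest, so no JSON escaping is needed.
export_payload_for_profile_id() {
  printf '{"type": "json", "filters": [{"type": "profile_id", "values": ["%s"]}]}' "$1"
}

# Hypothetical wrapper: POST that body to the export endpoint used in this thread.
# Relies on AUTOMATE_API_TOKEN and AUTOMATE_API_BASE_URL being exported beforehand.
export_by_profile_id() {
  curl -sk -X POST \
    -H "api-token: ${AUTOMATE_API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(export_payload_for_profile_id "$1")" \
    "${AUTOMATE_API_BASE_URL}/compliance/reporting/export"
}
```

Usage would look like: export_by_profile_id 477f53f8f6867a0f1abe7c199a569d00c9d38d8f2a8b85dbb9cc361ca435a2b6 | jq '.[].profiles[].full'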
But the profiles are from the same report? So @collinmcneese: There are two levels of filters here:
- profile_name can filter the reports. So if you have 10 reports where one of them has this profile_name, we will first select that report.
- profile_id and control are depth filters, which can filter the content to be shown in that report.
I have to load a good amount of data to test this out, but this is what the code tells me.
Having said that, I agree we can improve the API to filter the content by name too. But I need to check the historical reason behind that and the possible performance repercussions.
Yeah I get what you mean here: "I want to grab all of my reports that include this profile" -- results are the full reports that contain the named profile_name.
Then there is the other use case of "I only want the data for this profile_name", and for that it looks like we have to use the profile_id, which involves an extra call to fetch the profile_id first -- perhaps that is the answer though, so that it does not break the existing usage of getting all report data that contains a profile_name. Hmm.
Yeah. Trouble is that profile_id is the SHA of the profile, which I cannot see in the inspec.yml file.
In order to narrow the data to only one profile, we must use profile_id. The reason that profile_name doesn't suffice here is that we may have different versions of profile_name. Filtering by profile_id guarantees that we are querying with exactly one profile. We require this singularity of profile because we don't do the various stats aggregations at a granular level when computing stats, as doing so would be very slow, due to the number of reports in the system.
Similarly, for control level narrowing of stats, we require that exactly one control be included in the filter and that control must be a child of one single profile (again, included as profile_id and not profile_name).
In summary, deep filtering works as follows:
Rules for profile depth:
profile_id
Rules for control depth:
profile_id
*N.B. profile and control depth rules differ only by the second rule.
Anything not conforming to the rules above will yield report depth.
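As a rough illustration of the control depth case described above, a request body would carry exactly one profile_id filter and exactly one control filter. The helper below is a sketch under that reading; control_depth_payload is a hypothetical name, and the filter types are the ones already shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print an export body with exactly one profile_id
# and exactly one control, matching the control-depth description above.
# Assumes both values are plain strings that need no JSON escaping.
control_depth_payload() {
  if [ "$#" -ne 2 ]; then
    echo "usage: control_depth_payload PROFILE_ID CONTROL_ID" >&2
    return 1
  fi
  printf '{"type": "json", "filters": [{"type": "profile_id", "values": ["%s"]}, {"type": "control", "values": ["%s"]}]}' "$1" "$2"
}
```

The output can be passed to curl with -d in the same way as the other export requests in this thread. Supplying anything other than two arguments is rejected, since more than one profile or control would fall back to report depth according to the comment above.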
@collinmcneese We have merged a documentation PR to make the deep filtering concept easier to understand. https://github.com/chef/automate/pull/5266
I guess it resolves this issue.
That updated document shows this:
Chef Automate saves computational time and storage space by calculating compliance reporting statistics at the aggregate level. Deep filtering uses the profile_id attribute to drill down to the granular level of your compliance status. In contrast, filtering with the profile_name attribute instead of profile_id creates a report for every version of profile_name in your infrastructure.
This does not seem to be accurate unless a change was made -- currently specifying a profile_name returns all controls from all profiles in the system, not just all versions of the profile_name specified, which was the reason for this issue creation.
Describe the bug
When attempting to use the API to perform a compliance report export by a named profile_name, results of ALL profiles are returned, not only the results of the specific profile which was requested.
The API documentation indicates that profile_name is a valid export filter so it is expected to work to filter results of the export to a single named profile: https://docs.chef.io/automate/api/#operation/Export
To Reproduce
Steps to reproduce the behavior:
export AUTOMATE_API_TOKEN='my-api-token-from-automate'
export AUTOMATE_API_BASE_URL='https://automate-fqdn/api/v0'
curl -sk -X POST -H "api-token: ${AUTOMATE_API_TOKEN}" -d '{"type": "profile", "text": "*"}' "${AUTOMATE_API_BASE_URL}/compliance/reporting/suggestions" | jq
Export with a profile_name of DevSec SSH Baseline, as returned from suggestions. Results of the query include all available profiles on the system instead of only data for the filtered profile_name (this example output has 4 nodes with two profiles each having data):
curl -sk -X POST -H "api-token: ${AUTOMATE_API_TOKEN}" -H "Content-Type: application/json" -d '{"type": "json", "filters": [{"type": "profile_name", "values": ["DevSec SSH Baseline"]}]}' "${AUTOMATE_API_BASE_URL}/compliance/reporting/export" | jq '.[].profiles' | jq '.[]' | jq .full
If using the name from inspec.yml of the profile, ssh-baseline, no data will be returned:
curl -sk -X POST -H "api-token: ${AUTOMATE_API_TOKEN}" -H "Content-Type: application/json" -d '{"type": "json", "filters": [{"type": "profile_name", "values": ["ssh-baseline"]}]}' "${AUTOMATE_API_BASE_URL}/compliance/reporting/export" | jq '.[].profiles' | jq '.[]' | jq .full
Expected behavior
When hitting the compliance/reporting/export endpoint using a filter of type profile_name, it is expected that the result stream will only include results for the chosen profile.
Versions (please complete the following information):
Aha! Link: https://chef.aha.io/epics/SH-E-526