chef / automate

Chef Automate provides a full suite of enterprise capabilities for maintaining continuous visibility into application, infrastructure, and security automation.
https://automate.chef.io/
Apache License 2.0

Compliance Report Export API not providing expected results with profile_name filter #5090

Open collinmcneese opened 3 years ago

collinmcneese commented 3 years ago

Describe the bug

When attempting to use the API to perform a compliance report export by a named profile_name, results of ALL profiles are returned, not only the results of the specific profile which was requested.

The API documentation lists profile_name as a valid export filter, so it should restrict the export to a single named profile: https://docs.chef.io/automate/api/#operation/Export

To Reproduce

Steps to reproduce the behavior:

export AUTOMATE_API_TOKEN='my-api-token-from-automate'
export AUTOMATE_API_BASE_URL='https://automate-fqdn/api/v0'

curl -sk -X POST -H "api-token: ${AUTOMATE_API_TOKEN}" -d '{"type": "profile", "text": "*"}' "${AUTOMATE_API_BASE_URL}/compliance/reporting/suggestions" | jq

{
  "suggestions": [
    {
      "text": "CIS Amazon Linux 2 Benchmark Level 1",
      "id": "46eb1c31e44c60fcaab65c18a7ec3c3c723b85c99ff33b2ecfe62c0b6903b176",
      "score": 1,
      "version": "1.0.0-6"
    },
    {
      "text": "DevSec SSH Baseline",
      "id": "e70e96f35dbfb21d5ea77ba4746c63e8e0ce925d9789f75019b3662f6b2d8507",
      "score": 1,
      "version": "2.3.2"
    }
  ]
}
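The `id` values in this response look like the profile SHAs that the `profile_id` filter discussed later in this thread expects, though the thread does not confirm that explicitly. Under that assumption, a minimal Python sketch of looking up an id by the profile's display title follows. `find_profile_id` is a hypothetical helper name, not part of any Automate client.

```python
from typing import Optional


def find_profile_id(suggestions_response: dict, profile_title: str) -> Optional[str]:
    """Return the `id` of the first suggestion whose `text` equals profile_title.

    Hypothetical helper: it only inspects the `suggestions` list shape shown
    in the response above and returns None when no title matches.
    """
    for suggestion in suggestions_response.get("suggestions", []):
        if suggestion.get("text") == profile_title:
            return suggestion.get("id")
    return None


# Sample data copied from the suggestions response above.
response = {
    "suggestions": [
        {
            "text": "CIS Amazon Linux 2 Benchmark Level 1",
            "id": "46eb1c31e44c60fcaab65c18a7ec3c3c723b85c99ff33b2ecfe62c0b6903b176",
            "score": 1,
            "version": "1.0.0-6",
        },
        {
            "text": "DevSec SSH Baseline",
            "id": "e70e96f35dbfb21d5ea77ba4746c63e8e0ce925d9789f75019b3662f6b2d8507",
            "score": 1,
            "version": "2.3.2",
        },
    ]
}

ssh_baseline_id = find_profile_id(response, "DevSec SSH Baseline")
```

If several versions of a profile share the same title, this returns only the first match, so a real script would probably also compare the `version` field.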

curl -sk -X POST -H "api-token: ${AUTOMATE_API_TOKEN}" -H "Content-Type: application/json" -d '{"type": "json", "filters": [{"type": "profile_name", "values": ["DevSec SSH Baseline"]}]}' "${AUTOMATE_API_BASE_URL}/compliance/reporting/export" | jq '.[].profiles' | jq '.[]' | jq .full

"DevSec SSH Baseline, v2.3.2"
"CIS Amazon Linux 2 Benchmark Level 1, v1.0.0-6"
"DevSec SSH Baseline, v2.3.2"
"CIS Amazon Linux 2 Benchmark Level 1, v1.0.0-6"
"DevSec SSH Baseline, v2.3.2"
"CIS Amazon Linux 2 Benchmark Level 1, v1.0.0-6"
"DevSec SSH Baseline, v2.3.2"
"CIS Amazon Linux 2 Benchmark Level 1, v1.0.0-6"

curl -sk -X POST -H "api-token: ${AUTOMATE_API_TOKEN}" -H "Content-Type: application/json" -d '{"type": "json", "filters": [{"type": "profile_name", "values": ["ssh-baseline"]}]}' "${AUTOMATE_API_BASE_URL}/compliance/reporting/export" | jq '.[].profiles' | jq '.[]' | jq .full

(no return data)

Expected behavior

When hitting the compliance/reporting/export endpoint using a filter of type profile_name, it is expected that the result stream will only include results for the chosen profile.
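To make that expectation checkable in a script, here is a rough Python equivalent of the `jq '.[].profiles' | jq '.[]' | jq .full` pipeline used above. It assumes only the shape that pipeline implies: a top-level array of reports, each with a `profiles` array whose items carry a `full` label. The function names are hypothetical and the sample data is trimmed to those fields.

```python
import json
from typing import Dict, List


def export_request_body(profile_name: str) -> str:
    """Build the JSON body used in the reproduction: one profile_name filter."""
    return json.dumps(
        {"type": "json", "filters": [{"type": "profile_name", "values": [profile_name]}]}
    )


def unexpected_profiles(reports: List[Dict], expected_title: str) -> List[str]:
    """List distinct `full` labels that do not start with the expected profile title."""
    unexpected: List[str] = []
    for report in reports:
        for profile in report.get("profiles", []):
            full = profile.get("full", "")
            if not full.startswith(expected_title) and full not in unexpected:
                unexpected.append(full)
    return unexpected


# Trimmed sample modeled on the export output shown in the reproduction.
sample_reports = [
    {"profiles": [{"full": "DevSec SSH Baseline, v2.3.2"},
                  {"full": "CIS Amazon Linux 2 Benchmark Level 1, v1.0.0-6"}]},
    {"profiles": [{"full": "DevSec SSH Baseline, v2.3.2"},
                  {"full": "CIS Amazon Linux 2 Benchmark Level 1, v1.0.0-6"}]},
]

extra = unexpected_profiles(sample_reports, "DevSec SSH Baseline")
```

With the behavior described in this issue, `extra` contains the CIS profile label; with the expected behavior it would be an empty list.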

Versions (please complete the following information):

Aha! Link: https://chef.aha.io/epics/SH-E-526

collinmcneese commented 3 years ago

This also happens when using the top-level name of the profile (as documented in the Automate API) rather than the profile_name filter. Instead of only the ssh-baseline results coming back (the name from inspec.yml), all profile results are returned:

curl -sk -X POST -H "api-token: ${AUTOMATE_API_TOKEN}" -d '{"name": "ssh-baseline", "type": "json"}' "${AUTOMATE_API_BASE_URL}/compliance/reporting/export" | jq '.[].profiles' | jq '.[]' | jq .full

"DevSec SSH Baseline, v2.3.2"
"CIS Amazon Linux 2 Benchmark Level 1, v1.0.0-6"
"DevSec SSH Baseline, v2.3.2"
"CIS Amazon Linux 2 Benchmark Level 1, v1.0.0-6"
"DevSec SSH Baseline, v2.3.2"
"CIS Amazon Linux 2 Benchmark Level 1, v1.0.0-6"
"DevSec SSH Baseline, v2.3.2"
"CIS Amazon Linux 2 Benchmark Level 1, v1.0.0-6"

kalroy commented 3 years ago

The filter is working but it is not properly documented. To filter by profile name we need to pass the filter as {"type": "profile_name", "values": [""]}. Further optimization can be done if you pass the SHA of the profile, which I do not believe is available to the user as it's computed internally.

Please try it out by passing a query like this:

{ "filters": [{"type":"profile_name", "values": ["devsec linux security baseline"]},{"type": "control", "values":["os-01"]}], "type": "json" }

collinmcneese commented 3 years ago

Hi @kalroy -- Yep, if you add the control as a filter then it only reports back that control. However, if you do not add a control filter and only add a profile filter, it returns all controls from all profiles rather than only the controls from the selected profile.

Example:

{"filters": [{"type":"profile_name", "values": ["devsec linux security baseline"]}], "type": "json"}

If you have multiple profiles with data, running that query will return all controls for all profiles, ignoring the profile_name filter value.

collinmcneese commented 3 years ago

Additionally -- if you pass a profile_id then that works as well. Just no combination of text within profile_name seems to work. The query below passes the profile_id for linux-baseline and it returns only the controls for this profile across all nodes, as expected.

{"filters": [{"type":"profile_id", "values": ["477f53f8f6867a0f1abe7c199a569d00c9d38d8f2a8b85dbb9cc361ca435a2b6"]}], "type": "json"}

kalroy commented 3 years ago

But the profiles are from the same report? So @collinmcneese: there are two levels of filters here:

1. profile_name can filter the reports. So if you have 10 reports where one of them has this profile_name, we first narrow the result to that report.
2. profile_id and control are depth filters, which narrow the content shown in that report.

I have to load a good amount of data to test this out, but this is what the code tells me.

kalroy commented 3 years ago

Having said that, I agree we can improve the API to filter the content by name too. But I need to check the historical reason behind that and the possible performance repercussions.

collinmcneese commented 3 years ago

> But the profiles are from the same report? So @collinmcneese: there are two levels of filters here:
>
> 1. profile_name can filter the reports. So if you have 10 reports where one of them has this profile_name, we first narrow the result to that report.
> 2. profile_id and control are depth filters, which narrow the content shown in that report.
>
> I have to load a good amount of data to test this out, but this is what the code tells me.

Yeah, I get what you mean here: "I want to grab all of my reports that include this profile" -- the results are the full reports that contain the named profile_name.

Then there is the other use case of "I only want the data for this profile_name", and for that it looks like we have to use the profile_id, which involves an extra call to fetch the profile_id first -- perhaps that is the answer though, so that it does not break the existing usage of getting all report data that contains a profile_name. Hmm.

kalroy commented 3 years ago

Yeah. Trouble is that profile_id is the SHA of the profile, which I cannot see in the inspec.yml file.

rickmarry commented 3 years ago

In order to narrow the data to only one profile, we must use profile_id. The reason that profile_name doesn't suffice here is that we may have different versions of profile_name. Filtering by profile_id guarantees that we are querying with exactly one profile. We require this singularity of profile because we don't do the various stats aggregations at a granular level when computing stats, as doing so would be very slow, due to the number of reports in the system.

Similarly, for control-level narrowing of stats, we require that exactly one control be included in the filter and that control must be a child of one single profile (again, included as profile_id and not profile_name).

In summary, deep filtering works as follows:

Rules for profile depth:

1. exactly one profile in the filter, specified as profile_id
2. no controls allowed in the filter
3. any of the other supported reporting filters may also be included without any constraints on their respective quantities

Rules for control depth:

1. exactly one profile in the filter, specified as profile_id
2. exactly one control in the filter that must be a child of the specified profile in rule 1
3. any of the other supported reporting filters may also be included without any constraints on their respective quantities

N.B. profile and control depth rules differ only by the second rule.

Anything not conforming to the rules above will yield report depth.

kalroy commented 3 years ago

@collinmcneese We have merged a documentation PR to make the deep filtering concept easy to understand. https://github.com/chef/automate/pull/5266

I guess it resolves this issue.

collinmcneese commented 3 years ago

That updated document shows this:

> Chef Automate saves computational time and storage space by calculating compliance reporting statistics at the aggregate level. Deep filtering uses the profile_id attribute to drill down to the granular level of your compliance status. In contrast, filtering with the profile_name attribute instead of profile_id creates a report for every version of profile_name in your infrastructure.

This does not seem to be accurate unless a change was made -- currently specifying a profile_name returns all controls from all profiles in the system, not just all versions of the profile_name specified, which was the reason for creating this issue.
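The profile depth and control depth rules rickmarry lists earlier in the thread can be sketched as a small client-side check. This is only an illustration of those rules as stated, not Automate's implementation: `classify_depth` is a hypothetical name, the filter type strings `profile_id` and `control` are taken from the queries quoted in the thread, and the sketch cannot verify that a control is actually a child of the given profile.

```python
from typing import Dict, List


def classify_depth(filters: List[Dict]) -> str:
    """Return 'profile', 'control', or 'report' for a list of export filters.

    Assumption: each filter is a dict with a `type` string and a `values` list,
    as in the request bodies quoted in this thread. Other filter types are
    ignored because the rules place no constraints on them.
    """
    profile_ids: List[str] = []
    controls: List[str] = []
    for item in filters:
        if item.get("type") == "profile_id":
            profile_ids.extend(item.get("values", []))
        elif item.get("type") == "control":
            controls.extend(item.get("values", []))

    # Both deep levels require exactly one profile, given as profile_id.
    if len(profile_ids) != 1:
        return "report"
    if not controls:
        return "profile"
    # Control depth needs exactly one control; its parent profile is NOT checked here.
    if len(controls) == 1:
        return "control"
    return "report"


linux_baseline = "477f53f8f6867a0f1abe7c199a569d00c9d38d8f2a8b85dbb9cc361ca435a2b6"

by_id = classify_depth([{"type": "profile_id", "values": [linux_baseline]}])
by_id_and_control = classify_depth([
    {"type": "profile_id", "values": [linux_baseline]},
    {"type": "control", "values": ["os-01"]},
])
by_name = classify_depth([{"type": "profile_name", "values": ["DevSec SSH Baseline"]}])
```

Under the stated rules, a filter list with only `profile_name` falls through to report depth, which matches the behavior that prompted this issue.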