chef / automate

Chef Automate provides a full suite of enterprise capabilities for maintaining continuous visibility into application, infrastructure, and security automation.
https://automate.chef.io/
Apache License 2.0

DISCOVERY: controls list #669

Closed: vjeffrey closed this issue 5 years ago

vjeffrey commented 5 years ago

User Story

In Automate today, we provide users with a list of all nodes that were scanned over the day and a list of all profiles that were part of those node scans. Now it's time to go one step further and provide them with a list of all controls that were executed as part of those node scans. This is really just an accumulation of all the controls in all the profiles that were part of the node scans.

This issue covers the discovery work for the new endpoint/list view.

POC Criteria

Related Resources

https://chef.invisionapp.com/share/MHS3Z6C35RA#/367706842_NIST-Compliance_-_Controls

Definition of Done

We have experimented and have a clear, proven, workable, and performant path to providing users with a controls list.

rickmarry commented 5 years ago

I've looked into our options for pagination with this (controllist) api. It turns out that we cannot do it. Here's why:

Facts:
1 - We store our control data in "nested" objects in our Elasticsearch time-series indices.
2 - We would need to use aggregation in order to get unique controls out of these nested objects.
3 - The only way we may support pagination with aggregation is to use either "composite aggregation" or "partition" on terms aggregation.
4 - We cannot use composite aggregation on nested aggregations (see fact #1).
5 - We cannot sort on partitions.
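To make facts 2, 3, and 5 concrete, here is a minimal sketch of what a partitioned terms aggregation inside a nested aggregation might look like as a request body. The nested path `profiles.controls` and the field `profiles.controls.id` are assumptions for illustration, not the actual Automate mapping, and `build_controls_query` is a hypothetical helper.

```python
def build_controls_query(partition: int, num_partitions: int, size: int = 100) -> dict:
    """Build a request body that asks for one partition of unique control IDs.

    The nested path and field name below are hypothetical placeholders.
    """
    return {
        "size": 0,  # skip document hits; only aggregation buckets are wanted
        "aggs": {
            "controls": {
                # step into the nested control objects (fact 1)
                "nested": {"path": "profiles.controls"},
                "aggs": {
                    "control_ids": {
                        # unique controls come from a terms aggregation (fact 2)
                        "terms": {
                            "field": "profiles.controls.id",
                            # ask for a single partition of the terms (fact 3)
                            "include": {
                                "partition": partition,
                                "num_partitions": num_partitions,
                            },
                            "size": size,
                        }
                    }
                },
            }
        },
    }
```

Each request returns only the terms assigned to the requested partition, so walking partitions 0 through `num_partitions - 1` covers every control but does not yield one globally sorted list, which is the limitation described in fact 5.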

Even if we could use composite aggs, we would not be able to have true pagination. True pagination allows us to specify page_num and size, which allows us to jump to an arbitrary spot in a large list. Composite aggs allow us to use the "after" field, which gives us only a (very) limited version of pagination: we can step forward from the last item we saw, but we cannot jump to a given page. The fact remains, however, that we cannot use even this limited pagination; I just figured I'd call sour grapes on it here.
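The difference between the two styles can be sketched over a plain sorted list. This is only an in-memory illustration of the idea, not how Elasticsearch implements either one; `page_by_offset` and `page_after` are hypothetical names.

```python
import bisect
from typing import List, Optional


def page_by_offset(items: List[str], page_num: int, size: int) -> List[str]:
    """True pagination: jump straight to any page (page_num is 1-based)."""
    start = (page_num - 1) * size
    return items[start:start + size]


def page_after(items: List[str], after: Optional[str], size: int) -> List[str]:
    """Cursor-style pagination: return the next `size` items following `after`.

    `items` must be sorted. Passing after=None returns the first page.
    """
    start = 0 if after is None else bisect.bisect_right(items, after)
    return items[start:start + size]


controls = [f"control-{i:03d}" for i in range(10)]

# Offset style can jump directly to page 3.
third_page = page_by_offset(controls, page_num=3, size=3)

# Cursor style has to walk forward, feeding the last seen item back in.
first = page_after(controls, after=None, size=3)
second = page_after(controls, after=first[-1], size=3)
```

With the cursor style, reaching the third page requires fetching the first two, which is why it is only a limited substitute for page_num and size.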

Since we cannot sort on partitions, they are not a viable choice either, so there is no need to look further.

Where does that leave us? After thinking about this more, it occurred to me that because we use such a large number of controls, even a properly sorted, paginated list would be overwhelming and not very delightful. It makes sense that we should be doing some significant narrowing of our control list by filtering on both non-control fields and the control names themselves. This narrowing will yield a much smaller list of controls. I suspect that this is what users would naturally do anyway, just to avoid wading through seemingly endless lists in search of the one specific item that they want to click on.

thoughts?

susanev commented 5 years ago

i def might not have enough context here, but could we require them to select one or more profiles (or similar interaction) before displaying a controls list?

vjeffrey commented 5 years ago

i think the problem with requiring them to select one or more profiles is that we also want to implement search and filter by control tags (https://github.com/chef/automate/issues/668), which is a way of grouping controls across multiple profiles.

we could possibly not show it until they filter on a control tag, but that feels weird somehow? not sure. also i don't have any other ideas. :/

rickmarry commented 5 years ago

I think selecting one or more profiles to narrow down the choices is a good example of the narrowing mentioned above. More surface-oriented filtering, for example on environment or platform, will also serve (less directly) to narrow it, as it will reduce the number of nodes being searched across. Any filtering will reduce the numbers but, short of using the controls themselves as filters, or at least wildcarding the control names, the number of controls, even when narrowed down to a profile, can get into the thousands, which will still be overwhelming for the UI user to wade through.

The options may come down to:
1 - Returning the first 100 (or whatever number) search result items and, with that response, also including the number of controls they are not seeing due to response size constraints (this will allow the user to at least be aware that there are more to be had). We would also want to make it known that, by using filters to narrow the choices, they will be able to see the complete list.
2 - Remember, the limitation here is caused by Elasticsearch's inability to properly paginate on aggregations. This is not the only place in the app where this limitation has been a major shortcoming. We may want to consider putting some of this data into postgres, if postgres can handle these requirements. ES will still have its place but, when it comes to this sort of thing, good old SQL would be the go-to.
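Both options can be sketched briefly. The first function caps the list and reports how many controls were left out; the second shows the kind of page_num/size query that SQL makes straightforward. Everything here is an assumption for illustration: the response keys, the `node_controls` table, and its `control_id` column are hypothetical, and the standard-library `sqlite3` module stands in for postgres.

```python
import sqlite3
from typing import Dict, List


def truncated_response(controls: List[str], limit: int = 100) -> Dict[str, object]:
    """Option 1: return at most `limit` controls plus a count of those omitted."""
    shown = controls[:limit]
    return {
        "controls": shown,
        "total": len(controls),
        "omitted": len(controls) - len(shown),
    }


def sql_page(conn: sqlite3.Connection, page_num: int, size: int) -> List[str]:
    """Option 2: sorted, de-duplicated, truly paginated list via SQL.

    page_num is 1-based. Table and column names are hypothetical.
    """
    offset = (page_num - 1) * size
    rows = conn.execute(
        "SELECT DISTINCT control_id FROM node_controls "
        "ORDER BY control_id LIMIT ? OFFSET ?",
        (size, offset),
    ).fetchall()
    return [row[0] for row in rows]
```

The `omitted` count is what lets the UI tell users that more controls exist and that filtering will reveal them.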

vjeffrey commented 5 years ago

we've decided to move forward with the non-paginated list -- closing this issue in favor of the one that covers the actual implementation work