Mock Infra Client service for Automate/Chef Desktop demo and acceptance data

vjeffrey commented 4 years ago

Create a service that can be used in Automate 2 to send scheduled data to Automate, mimicking the data that would be sent from a client.

When accepting features in Automate 2 (https://a2-acceptance.cd.chef.co/), it is difficult to appreciate a feature without realistic data. Sometimes we have generated data, but it is typically repetitive and low value (e.g. 100 nodes named node1...node100).
When demoing Automate 2 + Chef Desktop to customers, the same challenge applies: the customer needs to see realistic data in the demo.

There are 4 types of reports currently sent to Automate. We're interested in only the first one for now.

1. infra client data sent to the data collector
2. actions sent from the infra server
   - data collector action schema
3. inspec reports sent by the audit cookbook
4. inspec reports from automate scan jobs

The run start and run end schema messages generated by the Infra Client can be saved to disk as part of an Infra Client run using the data_collector_output_locations Chef::Config setting.

The service should take a collection of run start and run end schema messages as files on disk and send them to a configurable Automate data collector URL with a configurable data collector token/secret. The goal is to make it simple for someone with a data collector message to add it to the collection of sample data, so we can collect samples from different real-life scenarios, such as a Windows desktop, an Ubuntu server, an Apple laptop, etc.
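A minimal sketch of that flow in Python, only to illustrate the idea (the implementation language and details are open). The function names `load_messages`, `build_request`, and `send_all` are hypothetical. It assumes each file in the sample directory holds one JSON message per line, and that the token is passed in an `x-data-collector-token` header; the header name should be checked against what the target Automate instance expects.

```python
import json
import urllib.request
from pathlib import Path


def load_messages(sample_dir):
    """Return every JSON message found in the files directly under sample_dir.

    Assumption: each file holds one JSON object per line; blank lines are skipped.
    """
    messages = []
    for path in sorted(Path(sample_dir).iterdir()):
        if not path.is_file():
            continue
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.strip():
                messages.append(json.loads(line))
    return messages


def build_request(message, url, token):
    """Build (but do not send) a POST request carrying one message."""
    return urllib.request.Request(
        url,
        data=json.dumps(message).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed header name for the data collector token.
            "x-data-collector-token": token,
        },
        method="POST",
    )


def send_all(sample_dir, url, token):
    """Send every sample message in order and return the HTTP status codes."""
    statuses = []
    for message in load_messages(sample_dir):
        with urllib.request.urlopen(build_request(message, url, token)) as resp:
            statuses.append(resp.status)
    return statuses
```

With this shape, adding a new sample is just dropping another file into the directory; the URL and token stay outside the sample data.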

It should allow for sending the data:

on startup all at once, and/or
on an interval, e.g. every hour
- with support for a 'splay'
  Splay is a random number between zero and splay that is added to interval. It is used with the Infra Client to help balance the load on the Chef Infra Server by ensuring that many Chef Infra Client runs are not occurring at the same interval.
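As an illustration of interval plus splay, here is a small Python sketch. The names `next_delay` and `schedule` are hypothetical, and it assumes both values are given in seconds and that a fresh random splay is drawn for every send.

```python
import random


def next_delay(interval, splay, rng=random):
    """Seconds to wait before the next send: interval plus a random 0..splay."""
    if interval < 0 or splay < 0:
        raise ValueError("interval and splay must be non-negative")
    return interval + rng.uniform(0, splay)


def schedule(interval, splay, count, rng=random):
    """Return the first `count` send times as seconds elapsed since startup."""
    offsets, elapsed = [], 0.0
    for _ in range(count):
        elapsed += next_delay(interval, splay, rng)
        offsets.append(elapsed)
    return offsets
```

For example, `next_delay(3600, 300)` gives a wait somewhere between 60 and 65 minutes, so a set of mock nodes started together drifts apart instead of reporting in lockstep.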

Perhaps this is a 'sample-data' component that is part of its own 'product' (in the products.meta file), thus allowing this to be deployed easily through the --product functionality?

The implementation details are all negotiable.

Aha! Link: https://chef.aha.io/features/SH-123

kvivek1115 commented 4 years ago

Went through the chef-client data collector output_locations flow. Found some possible improvements and created PR https://github.com/chef/chef/pull/10393.

The file output of data collector client runs is appended incrementally at the locations provided in output_locations.

At this point, I am not considering an option to store client run output at a shared location, which would let us collect the client runs of multiple nodes in a single shared place (like AWS S3).

Sample of data_collector.out file:

{"chef_server_fqdn":"localhost","entity_uuid":"4c2f4e79-7303-4358-b690-89f3722ee44f","id":"34552a20-6100-4218-bfa2-4ba0999dcfb8","message_version":"1.0.0","message_type":"run_start","node_name":"PUN-LAP-VIVEKSI","organization_name":"chef_solo","run_id":"34552a20-6100-4218-bfa2-4ba0999dcfb8","source":"chef_solo","start_time":"2020-09-08T06:20:58Z"}
{"chef_server_fqdn":"localhost","entity_uuid":"4c2f4e79-7303-4358-b690-89f3722ee44f","expanded_run_list":{"id":"_default","run_list":[]},"id":"34552a20-6100-4218-bfa2-4ba0999dcfb8","message_version":"1.1.0","message_type":"run_converge","node":{"name":"PUN-LAP-VIVEKSI","chef_environment":"_default","json_class":"Chef::Node",....."chef_guid":"4c2f4e79-7303-4358-b690-89f3722ee44f","name":"PUN-LAP-VIVEKSI","chef_environment":"_default","recipes":[],"expanded_run_list":[],"roles":[]},"normal":{"tags":[]},"chef_type":"node","default":{},"override":{},"run_list":[]},"node_name":"PUN-LAP-VIVEKSI","organization_name":"chef_solo","resources":[],"run_id":"34552a20-6100-4218-bfa2-4ba0999dcfb8","run_list":[],"cookbooks":{},"policy_name":null,"policy_group":null,"start_time":"2020-09-08T06:20:58Z","end_time":"2020-09-08T06:20:58Z","source":"chef_solo","status":"success","total_resource_count":0,"updated_resource_count":0,"deprecations":[]}

entity_uuid identifies the node, so it groups all records belonging to one node. id and run_id identify a single client run of that node (in the sample above they carry the same value).
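A short Python sketch of that grouping, assuming every message carries both `entity_uuid` and `run_id` as in the sample above. The function name `group_by_node_and_run` is hypothetical.

```python
from collections import defaultdict


def group_by_node_and_run(messages):
    """Return {entity_uuid: {run_id: [messages...]}}, keeping input order."""
    grouped = defaultdict(lambda: defaultdict(list))
    for msg in messages:
        grouped[msg["entity_uuid"]][msg["run_id"]].append(msg)
    # Convert to plain dicts so missing keys raise instead of being created.
    return {node: dict(runs) for node, runs in grouped.items()}
```

Applied to the two sample lines, this would yield one node (`4c2f4e79-...`) with one run (`34552a20-...`) containing the `run_start` and `run_converge` messages.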

kvivek1115 commented 4 years ago

chef-load is used to generate dummy client-run data, but on its own it has no provision for loading realistic data. As per the doc https://github.com/chef/chef-load#using-sample-json-data-files, however, sample data files can be provided to capture real node data.
I have verified that loading realistic sample data with chef-load is not working as of now. To minimize dev time, I would prefer to fix the chef-load issue so that it can capture realistic node data.

kvivek1115 commented 4 years ago

Further proceedings:

Planning to add a new service, sample-data-service, that would be responsible for loading data from the data collector output locations of chef-client runs.
The service can be enabled using the --product sample-data option.
The data collector folder must be located on the workstation; remote files are not supported as of now.

kvivek1115 commented 4 years ago

More insights:

After enabling the sample data service with --product sample-data, a chef-automate dev sample-data subcommand would be added so that mock infra client node data can be loaded into the system.
The chef-automate dev sample-data namespace can also take a config file with the required config options.

chef / automate

Mock Infra Client service for Automate/Chef Desktop demo and acceptance data #4141