vjeffrey opened 4 years ago
I have gone through the chef-client data collector output_locations flow, found some possible improvements, and created PR https://github.com/chef/chef/pull/10393. With it, the data collector output of each client run is added incrementally to the locations provided in output_locations.
For now, I am not considering an option to store client run output at a shared location, which would let us collect the client runs of multiple nodes in a single shared place (such as AWS S3).
Sample data_collector.out file:
{"chef_server_fqdn":"localhost","entity_uuid":"4c2f4e79-7303-4358-b690-89f3722ee44f","id":"34552a20-6100-4218-bfa2-4ba0999dcfb8","message_version":"1.0.0","message_type":"run_start","node_name":"PUN-LAP-VIVEKSI","organization_name":"chef_solo","run_id":"34552a20-6100-4218-bfa2-4ba0999dcfb8","source":"chef_solo","start_time":"2020-09-08T06:20:58Z"}
{"chef_server_fqdn":"localhost","entity_uuid":"4c2f4e79-7303-4358-b690-89f3722ee44f","expanded_run_list":{"id":"_default","run_list":[]},"id":"34552a20-6100-4218-bfa2-4ba0999dcfb8","message_version":"1.1.0","message_type":"run_converge","node":{"name":"PUN-LAP-VIVEKSI","chef_environment":"_default","json_class":"Chef::Node",....."chef_guid":"4c2f4e79-7303-4358-b690-89f3722ee44f","name":"PUN-LAP-VIVEKSI","chef_environment":"_default","recipes":[],"expanded_run_list":[],"roles":[]},"normal":{"tags":[]},"chef_type":"node","default":{},"override":{},"run_list":[]},"node_name":"PUN-LAP-VIVEKSI","organization_name":"chef_solo","resources":[],"run_id":"34552a20-6100-4218-bfa2-4ba0999dcfb8","run_list":[],"cookbooks":{},"policy_name":null,"policy_group":null,"start_time":"2020-09-08T06:20:58Z","end_time":"2020-09-08T06:20:58Z","source":"chef_solo","status":"success","total_resource_count":0,"updated_resource_count":0,"deprecations":[]}
entity_uuid and id are used to group the records: entity_uuid identifies the node, and id (equal to run_id) identifies each client run of that node.
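Since each client run adds one JSON object per line, a loader could group the file's records by node and then by run. A minimal Python sketch, assuming the newline-delimited layout of the sample above; group_runs and the heavily shortened sample records are hypothetical:

```python
import json
from collections import defaultdict


def group_runs(lines):
    """Group newline-delimited data collector messages by node, then by run."""
    runs = defaultdict(lambda: defaultdict(list))
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between appended runs
        msg = json.loads(line)
        # entity_uuid identifies the node; id (equal to run_id) identifies one run.
        runs[msg["entity_uuid"]][msg["id"]].append(msg)
    # Convert to plain dicts so callers get ordinary mappings back.
    return {node: dict(by_run) for node, by_run in runs.items()}


# Hypothetical, heavily shortened records in the same shape as data_collector.out.
sample = [
    '{"entity_uuid": "node-a", "id": "run-1", "message_type": "run_start"}',
    '{"entity_uuid": "node-a", "id": "run-1", "message_type": "run_converge"}',
    '{"entity_uuid": "node-a", "id": "run-2", "message_type": "run_start"}',
]
grouped = group_runs(sample)
```

An open file object can be passed in place of the list, since iterating a file yields its lines.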
chef-load is used to generate dummy client-run data, but it has no provision for loading realistic data. However, according to its documentation (https://github.com/chef/chef-load#using-sample-json-data-files), sample data files can be provided to capture real node data.
I have verified that loading realistic sample data with chef-load does not currently work. To minimize development time, I would prefer to fix this chef-load issue so that it can capture realistic node data.
Next steps:
I plan to add a new service, sample-data-service, that would be responsible for loading data from the chef-client data collector output locations. The service can be enabled using the --product sample-data option.
The data collector output folder must be on the workstation; remote files are not supported for now.
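Because only folders on the workstation are supported for now, finding the input files can be a plain local directory walk. A Python sketch; find_output_files is hypothetical, and it assumes output files end in .out like data_collector.out:

```python
import tempfile
from pathlib import Path


def find_output_files(folder):
    """Return data collector output files under a local folder (no remote support)."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{folder} is not a local directory")
    # Assumption: output files use the .out suffix, like data_collector.out.
    return sorted(p for p in root.rglob("*.out") if p.is_file())


# Usage: one subfolder per scenario; non-.out files are ignored.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "ubuntu").mkdir()
    (Path(tmp) / "ubuntu" / "data_collector.out").write_text("{}\n")
    (Path(tmp) / "notes.txt").write_text("ignored")
    found = [p.relative_to(tmp).as_posix() for p in find_output_files(tmp)]
```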
More insights:
The --product sample-data option would add a chef-automate dev sample-data subcommand so that Infra Client node mock data can be loaded into the system. chef-automate dev sample-data can also take a config file with the required config options.
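The required config options are not listed yet, so every field below is a hypothetical placeholder, and JSON is used only to keep the sketch within the standard library. A minimal loader that rejects a config file missing required options might look like:

```python
import json
from dataclasses import dataclass


@dataclass
class SampleDataConfig:
    # All field names are hypothetical placeholders, not existing Automate options.
    data_collector_url: str
    data_collector_token: str
    output_folder: str
    interval_seconds: int = 3600
    splay_seconds: int = 0


REQUIRED = ("data_collector_url", "data_collector_token", "output_folder")


def load_config(text):
    """Parse a JSON config; unknown keys raise TypeError from the dataclass."""
    raw = json.loads(text)
    missing = [key for key in REQUIRED if not raw.get(key)]
    if missing:
        raise ValueError(f"missing required config options: {', '.join(missing)}")
    return SampleDataConfig(**raw)


cfg = load_config(json.dumps({
    "data_collector_url": "https://automate.example.com/data-collector/v0/",
    "data_collector_token": "TOKEN",
    "output_folder": "/tmp/sample-data",
}))
```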
Create a service that can be used in Automate 2 to send scheduled data to Automate, mimicking the data that would be sent from a client.
There are four types of reports currently sent to Automate. We're interested in only the first one for now.
Infra Client data sent to the data collector
actions sent from the Infra Server
The run start and run end schema messages generated by the Infra Client can be saved to disk as part of an Infra Client run using the data_collector_output_locations Chef::Config setting. The service should take a collection of run start and run end schema messages as files on disk and send them to a configurable Automate data collector URL with a configurable data collector token/secret. The goal is to make it simple for someone with a data collector message to add it to the collection of sample data, so we can collect samples from different real-life scenarios, such as a Windows desktop, an Ubuntu server, an Apple laptop, etc.
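Sending a saved message amounts to POSTing its JSON to the configured data collector URL with the token in a request header. A sketch with Python's urllib, assuming the x-data-collector-token header and a /data-collector/v0/ style URL that Infra Client uses when reporting to Automate; build_request and send are hypothetical helpers, and the host and token are placeholders:

```python
import json
import urllib.request


def build_request(url, token, message):
    """Prepare (but do not send) a POST of one data collector message."""
    body = json.dumps(message).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed header name for data collector token authentication.
            "x-data-collector-token": token,
        },
    )


def send(request, timeout=10):
    """Send a prepared request; urlopen raises HTTPError on non-2xx responses."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status


req = build_request(
    "https://automate.example.com/data-collector/v0/",
    "TOKEN",
    {"message_type": "run_start"},
)
```

Keeping request construction separate from sending makes it easy to check what would be posted without a running Automate instance.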
It should allow for sending the data:
on an interval, e.g. every hour
with a splay: a random number between zero and splay that is added to interval. splay is used with the Infra Client to help balance the load on the Chef Infra Server by ensuring that many Chef Infra Client runs do not occur at the same time.
Perhaps this is a 'sample-data' component that is part of its own 'product' (in the products.meta file), thus allowing it to be deployed easily through the --product functionality? The implementation details are all negotiable.
Aha! Link: https://chef.aha.io/features/SH-123