duplicate node records in postgres (nodes) table

vjeffrey commented 4 years ago

User Story

There are a few different ways a node can get added to the nodes postgres table (this is the data that serves the api/v0/nodes/search api).

a client run report is ingested, so we send information to the nodemanager about the node
a compliance report is ingested, so we send information to the nodemanager about the node
a user "manually" adds the node by accessing compliance/scan-jobs/nodes/add and entering information about some nodes
a user creates an aws, azure, or gcp node integration by visiting settings/node-integrations/add

A user recently reported that when they created an aws-ec2 integration, they ended up with duplicate node records under the nodes/search api. Since chef-client was running on those nodes, we already had a record of them, but failed to recognize those nodes as the same object when ingesting information.

The nodes table has two different unique constraints:

the traditional id one, which correlates to the node uuid that is on the ingested report or created at time of object creation
a three-field unique match on source_id, source_region, and account_id. those fields, when referencing a node in aws, correlate to the instance-id of the node, the region in which the node exists (e.g. us-east-1), and the id of the aws account in which it exists. for azure, this is the node id, region, and tenant id.

This specific user problem can be traced to a missing data field on the message sent from the client run ingestion path to the nodemanager. We are only sending the instance id and region, no account id. The work to add the account id (and ensure we're storing all the correct information when ingesting the node data) is in progress.
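To make the failure mode concrete, here is a minimal in-memory sketch of the two uniqueness rules described above. It is not the nodemanager's actual code: `Node`, `NodeTable`, `upsert`, and `demo` are hypothetical names, and the instance id, region, and account id values are made up. It only illustrates why a missing account id stops the three-field match from firing.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

CloudKey = Tuple[str, str, str]


@dataclass
class Node:
    id: str                  # node uuid from the report, or generated at creation
    name: str
    source_id: str = ""      # e.g. aws instance-id
    source_region: str = ""  # e.g. us-east-1
    account_id: str = ""     # aws account id (tenant id for azure)

    def cloud_key(self) -> Optional[CloudKey]:
        # Only a complete triple can identify a cloud node.
        if self.source_id and self.source_region and self.account_id:
            return (self.source_id, self.source_region, self.account_id)
        return None


class NodeTable:
    """Hypothetical stand-in for the nodes table and its two unique constraints."""

    def __init__(self) -> None:
        self.by_id: Dict[str, Node] = {}
        self.by_cloud_key: Dict[CloudKey, str] = {}

    def upsert(self, node: Node) -> str:
        key = node.cloud_key()
        existing_id = self.by_cloud_key.get(key) if key else None
        if existing_id is None and node.id in self.by_id:
            existing_id = node.id
        if existing_id is not None:
            # Recognized as an existing node: merge instead of inserting.
            stored = self.by_id[existing_id]
            stored.name = node.name
            stored.source_id = node.source_id or stored.source_id
            stored.source_region = node.source_region or stored.source_region
            stored.account_id = node.account_id or stored.account_id
            merged_key = stored.cloud_key()
            if merged_key:
                self.by_cloud_key[merged_key] = existing_id
            return existing_id
        self.by_id[node.id] = node
        if key:
            self.by_cloud_key[key] = node.id
        return node.id


def demo(send_account_id: bool) -> int:
    """Return how many node records exist after both paths have reported."""
    table = NodeTable()
    # 1. Client run ingestion reports the node (made-up values).
    table.upsert(Node(
        id="client-run-uuid", name="web-1",
        source_id="i-0abc", source_region="us-east-1",
        account_id="123456789012" if send_account_id else "",
    ))
    # 2. The aws-ec2 integration discovers the same instance under a new uuid.
    table.upsert(Node(
        id="integration-uuid", name="web-1",
        source_id="i-0abc", source_region="us-east-1",
        account_id="123456789012",
    ))
    return len(table.by_id)
```

With `demo(False)` the two paths produce two records, which mirrors the reported bug. With `demo(True)` the second upsert matches on the full triple and merges into the first record.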

But there are other cross-points where the duplicate node object problem still exists. @kmacgugan wrote a thing about this

It's important that we address these duplicate node situations as much as possible for the one node view epic. As part of that epic, we'll be exposing all those node records in the ui, which will make any duplicate node object issues more apparent.

Definition of Done

read kyleen's doc
figure out what node uuid work has already been done and what problems still exist
see if there are other problems (if a node is running chef-client via hab, what is the uuid for that node? is there a node uuid attached to each app in the ingestion process for apps? do those ids match?)
have the applications-service ingestion process let the nodemanager know about the node the app is running on
create issues for the problems etc so we can address them

danielsdeleo commented 4 years ago

Currently, the chef-client UUID can be set to the habitat ring member ID, but this is opt-in only.

The client.rb template is here: https://github.com/chef/chef/blob/cd444f5dfd39e4494e0a8495b20b39259dab923a/habitat/config/client.rb#L17-L19 and the default config is https://github.com/chef/chef/blob/master/habitat/default.toml#L36

I am not sure why this setting is disabled by default, but if we can change it, we would be able to correlate services from the applications tab database with infra nodes.
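As a rough illustration of that correlation, assuming the chef-client UUID really is set to the habitat member ID: the sketch below joins service records to infra nodes on that shared value. The function name and the field names (`member_id`, `name`) are hypothetical and do not reflect the real applications-service or ingest schemas.

```python
from typing import Dict, List, Optional


def correlate_services(
    services: List[dict],
    nodes_by_uuid: Dict[str, dict],
) -> Dict[str, Optional[str]]:
    """Map each service name to the infra node it runs on, or None.

    Assumes every service dict carries a hypothetical "member_id" field
    and that infra nodes are keyed by their chef-client UUID.
    """
    result: Dict[str, Optional[str]] = {}
    for svc in services:
        node = nodes_by_uuid.get(svc["member_id"])
        result[svc["name"]] = node["name"] if node else None
    return result
```

A service whose member ID has no matching node UUID maps to `None`, which is the case you would expect on nodes where the opt-in setting is still disabled.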

danielsdeleo commented 4 years ago

Seems that effortless will use its own client.rb template that uses the hab UUID if automate is enabled at all: https://github.com/chef/effortless/blob/e181a3dd3473fd9233f34d6b36ed2dfd06cc3d3c/scaffolding-chef-infra/lib/linux/client-chunk.rb#L13-L17

kmacgugan commented 4 years ago

For inspec runs, we used to use the chef_guid file, but I think that was pulled out. I don't know if that is something we have added back in yet. So, the inspec-infra UUID link exists when the audit cookbook is used, but not for inspec runs.

For the infra schema, we have the source_id, region_id, account_id or tenant_id buried in the ohai data currently. Does it make sense to pull these to a top level field (or object) for querying in the future? This would make querying the infra data set easier in the future.
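A sketch of what pulling those values to a top level object could look like. The ohai key names used here (`ec2.instance_id`, `ec2.region`, `ec2.account_id`, `azure.metadata.compute.vmId`, `azure.metadata.compute.location`, `azure.tenant_id`) are assumptions for illustration and should be checked against the real ohai payloads. `cloud_identity` is a hypothetical helper, not an existing function in the ingest service.

```python
from typing import Any, Dict, Optional


def cloud_identity(ohai: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Lift cloud identifiers out of nested ohai data into one flat object.

    All key names below are assumed, not confirmed against real ohai output.
    Returns None when no known cloud section is present.
    """
    ec2 = ohai.get("ec2")
    if ec2:
        return {
            "provider": "aws",
            "source_id": ec2.get("instance_id", ""),
            "source_region": ec2.get("region", ""),
            "account_id": ec2.get("account_id", ""),
        }
    azure = ohai.get("azure")
    if azure:
        compute = azure.get("metadata", {}).get("compute", {})
        return {
            "provider": "azure",
            "source_id": compute.get("vmId", ""),
            "source_region": compute.get("location", ""),
            # For azure the third field would hold the tenant id.
            "account_id": azure.get("tenant_id", ""),
        }
    return None
```

Returning the same three field names for both providers keeps the result aligned with the source_id, source_region, and account_id match that the nodes table already uses.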

vjeffrey commented 4 years ago

> For the infra schema, we have the source_id, region_id, account_id or tenant_id buried in the ohai data currently. Does it make sense to pull these to a top level field (or object) for querying in the future? This would make querying the infra data set easier in the future.

The ingest service already stores the instance id and region on the node object, so once we have the account id on there, that should take care of things (that's done over here).

~Or are you talking pulling it into something else/at a diff level?~

I see what you mean now, b/c we have ohai data for the azure nodes too -- YES!

vjeffrey commented 4 years ago

follow up issues to create:

work to have applications service tell nodemanager about the node running the application and service status
get azure metadata from ohai data in ingest service and save the values to the cloud values for the node
find out if you end up with dup node records if you run inspec scan on a node with chef-client running on it reporting in to automate

chef / automate

duplicate node records in postgres (nodes) table #2511
