ManageIQ / manageiq-design

Design documents and UX mockups for ManageIQ
https://manageiq.github.io/manageiq-design

[RFC] Use Update Driven Refresh for Pods #33

Closed agrare closed 4 years ago

agrare commented 6 years ago

Problem

Currently only full refresh is supported for container providers (Kubernetes/OpenShift). With sufficiently large environments this refresh can take over 2 hours, which is long enough that pods/containers can be created and deleted while a refresh is running, causing them to be completely missed by ManageIQ.

Without a record of all pods which were created, policy actions cannot be run and metrics cannot be collected for chargeback.

Proposed Solution

Kubernetes supports a streaming update mechanism, /watch, which delivers changes to a registered client. There is an example in the kubeclient repo: https://github.com/abonas/kubeclient#receive-entity-updates

We propose adding a new worker (InventoryCollectorWorker) which registers for these WatchStreams, specifically for pods, and sends ManagerRefresh::Target targets with the payload to the RefreshWorker for parsing and saving. Since all updates are persisted in the queue and will be handled by the refresh worker, no pod will be missed.
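A rough sketch of the collector side, to make the flow concrete. Everything named here is an assumption for illustration: PodTarget is a hypothetical stand-in for ManagerRefresh::Target (whose real constructor is not reproduced), the :container_groups association and ems_ref key are guesses at how pods would be identified, and the watch stream is treated as any object that yields Kubernetes-style notices from #each (with kubeclient this would presumably be the object returned by watch_pods).

```ruby
require "json"

# Hypothetical stand-in for the target handed to the refresh worker.
# The real ManagerRefresh::Target API is NOT reproduced here.
PodTarget = Struct.new(:association, :manager_ref, :payload, keyword_init: true)

# Convert one watch notice into a target carrying the raw payload.
# A notice is assumed to have the Kubernetes watch event shape:
#   {"type" => "ADDED" | "MODIFIED" | "DELETED", "object" => {...pod...}}
def notice_to_target(notice)
  pod = notice.fetch("object")
  PodTarget.new(
    association: :container_groups, # assumed association name for pods
    manager_ref: { ems_ref: pod.dig("metadata", "uid") },
    payload:     JSON.generate("type" => notice.fetch("type"), "object" => pod)
  )
end

# Consume a watch stream (anything responding to #each) and yield one
# target per notice, so the caller decides how to queue it.
def collect_pod_targets(watch_stream)
  watch_stream.each { |notice| yield notice_to_target(notice) }
end
```

Because every notice becomes a target, a pod that is created and deleted between two full refreshes still produces an ADDED and a DELETED target for the refresh worker to process.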

In addition to maintaining a record of all pods which were created and deleted, we can collect metrics on recently disconnected pods, ensuring we have metrics for these short-lived containers.

PRs

cc @Fryguy @Ladas @kbrock @simon3z

Moved from: https://github.com/ManageIQ/manageiq/issues/16240

agrare commented 6 years ago

Issues still to be worked out:

Fryguy commented 6 years ago

@agrare As discussed offline, we should treat any realtime watcher the way we do events or the VMware watcher: there should probably be 2 threads, one that watches and puts the raw data on an internal in-memory queue, and a second that reads from that queue and writes to the database (probably to MiqQueue). With an internal queue you also have the advantage of batching things up, so you could write to the MiqQueue every 5 seconds instead and send the entirety of what was seen in that 5 second period. This would prevent the harsh one-by-one slamming of the MiqQueue.
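The two-thread arrangement described above could look roughly like the following. This is only a sketch using Ruby's built-in Queue: BatchingCollector is a hypothetical name, the watch stream is assumed to be any object that yields notices from #each, and the sink block stands in for whatever would actually write a batch to MiqQueue (that call is deliberately not shown).

```ruby
# Hypothetical sketch: one thread watches, one thread flushes batches.
class BatchingCollector
  # flush_interval: seconds between batch writes (5 in the comment above)
  # sink: block that receives an Array of notices, e.g. to enqueue them
  def initialize(flush_interval: 5, &sink)
    @queue          = Queue.new # thread-safe in-memory queue
    @flush_interval = flush_interval
    @sink           = sink
  end

  def start(watch_stream)
    # Watcher thread: push raw notices as fast as they arrive.
    @watcher = Thread.new do
      watch_stream.each { |notice| @queue << notice }
      @queue.close # signal the writer that the stream has ended
    end

    # Writer thread: wake up periodically and hand over everything
    # seen since the last flush as a single batch.
    @writer = Thread.new do
      until @queue.closed? && @queue.empty?
        sleep(@flush_interval)
        flush
      end
    end
    self
  end

  def join
    [@watcher, @writer].each(&:join)
  end

  private

  # Drain the queue without blocking; only the writer thread pops,
  # so the empty? check followed by a non-blocking pop is safe.
  def flush
    batch = []
    batch << @queue.pop(true) until @queue.empty?
    @sink.call(batch) unless batch.empty?
  end
end
```

With a 5 second interval, a burst of a few hundred pod notices would result in one queue write rather than a few hundred, at the cost of up to 5 seconds of added latency.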

agrare commented 6 years ago

MVP for this has been merged.

agrare commented 6 years ago

Still needs to be completed: