This PR chain resolves the remaining known issues in certmgr. Roughly:
Issues #48 and #50 are fully addressed. Specifically, certmgr can now regenerate keys/certs when the algo/size has changed, without requiring operator intervention.
Validation that a cert is part of the current CA has been added.
The cfssl persistence mode was gutted; this in turn gives us atomic rename of PKI material on disk. Specifically, the race where an update is in place but its permissions are not yet correct has been fully removed.
certmgr now explicitly caches as little as possible; as a general rule of thumb, it loads from disk. This in turn means that it is much more thorough about validation. Essentially, it is no longer optimistic; it's pessimistic and does full verification of on-disk PKI.
The cert/key is opportunistically regenerated if the CA has changed; if we have to notify the consuming service to reload PKI anyway, we might as well maximize the PKI's lifespan so we don't restart/reload the service any more than is necessary.
If the CA changes, a spec's target will no longer have duplicate actions run against it. The internal codepaths/abstractions were un-broken, which makes this sort of "fire an action only once" behavior viable.
At this point, the core bugs we've known about and suffered from in certmgr for a year+ are resolved.
Follow-up work would center on:
The logging is still fairly atrocious. Better/saner patterns need to be put in place so the info log level isn't unduly noisy.
Metrics need revisiting. The recent metrics refactoring essentially integrated into the code as it was. With a peer to work with (and the initial pointer that the disk PKI work provided), I've been able to rip out most of the horked-ass internals that have been an ongoing problem. The core is now significantly different than what we had in certmgr 1.6.x, so the recent metrics work needs a quick revisit.
Explicit metric wiping upon spec reload, to prevent stale metrics.
Deciding whether or not to implement a proper fsnotify watch of the certmgr.d directory.
Did I mention unit tests? Because yeah, this codebase lacking any core unit tests is beyond insane, and it directly made this 2-week code rewrite a pain in my ass.