RFC: OCF profiles

See the commit messages and the commits themselves.

Is this a good way to close the gap between theory (OCF) and practice (pacemaker) without hurting anyone else who may be using the standard?

And a way to allow sets of functionality to be added gradually, without relying on serializing a multi-dimensional space into a single version number?

Isn't this akin to how, e.g., the Java ecosystem works (atomic units of APIs that are supported either in full or not at all)?

If/when this framework gets adoption, we can talk about other profiles.

I have one very specific profile in mind, stackable-1, which would help along the lines I mentioned when the problematic configuration pattern was recently discovered:

https://github.com/ClusterLabs/resource-agents/issues/1304#issuecomment-473525495

and which could get us what was good about rgmanager (more controlled and less error-prone combinability of particular resources within hierarchical arrangements) back to pacemaker, for instance.

Some quite interesting conclusions might also arise.

For one, an agent that supports whichever OCF version introduces profiles (with some normative high-level semantics, yet to be added) but does not support a clonable-X-like profile should be actively prevented from being configured as a clone.

Observe how currently nothing prevents this, yet it is bound to cause suffering in some cases: oftentimes the actual role of the RM is not only to guarantee a living instance within the cluster (aligned with HA) but, moreover, at most a single living instance within the cluster (a.k.a. mutual exclusion), and some agents so far silently require the latter. Take, for example, ocf:heartbeat:IPaddr2 and mentally drop its active support for pacemaker's cloned-resource style of running the agent (support that could be rather passive if the RM supported these optional profiles); you'll get something that cannot be cloned/node-parallelized by definition (the default of the respective OCF standard introducing profiles). Conversely, only agents of the given OCF version that explicitly declare support for a clonable-X-like profile would be allowed in such configurations (provided the particular RM supports the multi-node notion at all, which is not a strict requirement: simplistic local/non-distributed runners of OCF agents cannot be excluded, and these would evidently not support a clonable-X-like profile).
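A minimal sketch of that default-deny rule, in Python. Everything profile-related here is hypothetical: no `<profiles>` element exists in OCF meta-data today, and treating any name prefixed with `clonable-` as clone-capable is just one possible reading of "clonable-X-like".

```python
import xml.etree.ElementTree as ET

# Hypothetical meta-data: the <profiles> element is an assumption, not part of OCF.
IPADDR_LIKE = """<resource-agent name="IPaddr-like" version="1.0">
  <profiles/>
</resource-agent>"""

CLONABLE = """<resource-agent name="controld-like" version="1.0">
  <profiles>
    <profile name="clonable-0"/>
  </profiles>
</resource-agent>"""


def declared_profiles(metadata_xml: str) -> set[str]:
    """Return the profile names an agent declares in its (hypothetical) meta-data."""
    root = ET.fromstring(metadata_xml)
    return {p.get("name") for p in root.findall("./profiles/profile") if p.get("name")}


def may_be_cloned(metadata_xml: str) -> bool:
    """Default-deny: only agents declaring some clonable-X profile may be cloned."""
    return any(name.startswith("clonable-") for name in declared_profiles(metadata_xml))


def validate_clone(resource_id: str, metadata_xml: str) -> None:
    """Reject, rather than silently accept, a clone of a mutually exclusive agent."""
    if not may_be_cloned(metadata_xml):
        raise ValueError(f"{resource_id}: agent does not declare a clonable-X profile")


print(may_be_cloned(IPADDR_LIKE))  # False
print(may_be_cloned(CLONABLE))     # True
```

Note the direction of the default: an agent that says nothing about profiles (like the IPaddr2-like stand-in) is refused, which is what protects agents that silently rely on mutual exclusion.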

The other conclusion could be that the unique attribute can be kept for annotating parameters, as it was spot-on naming that would be a shame to ditch just because pacemaker is the elephant in the room (pacemaker could instead learn to treat 1.1-conformant agents the right, intended way); for reloadable stuff, another profile, reloadable-0, would be devised and used by agents that implement online reconfiguration. That's a counter-proposal to this very part of #21.

I also realized I wasn't entirely correct so far: repurposable-0 is not a specialization of clonable-0 but rather a fully orthogonal profile. Agents currently in line with pacemaker's use of promotable clones would require both the clonable-0 and the promotable-0 profiles at once (promotable-0 being a specialization of repurposable-0 with only two clearly distinguished roles).
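To illustrate the orthogonality, here is a sketch that expands declared profiles along a hypothetical specialization relation (promotable-0 refines repurposable-0; clonable-0 stands on its own) and checks what a pacemaker-style promotable clone would need. The profile names come from this thread; the relation table and helper names are assumptions for illustration only.

```python
# Hypothetical specialization relation: child profile -> the profile it refines.
# clonable-0 and repurposable-0 are orthogonal roots.
SPECIALIZES = {"promotable-0": "repurposable-0"}

# What a pacemaker-style promotable clone would need, expressed as profiles.
PROMOTABLE_CLONE_REQUIRES = {"clonable-0", "promotable-0"}


def effective_profiles(declared):
    """Expand the declared profiles with everything they (transitively) specialize."""
    result = set()
    for name in declared:
        # Walk up the specialization chain until a root or an already-seen profile.
        while name is not None and name not in result:
            result.add(name)
            name = SPECIALIZES.get(name)
    return result


def supports_promotable_clone(declared):
    return PROMOTABLE_CLONE_REQUIRES <= effective_profiles(declared)


print(sorted(effective_profiles({"clonable-0", "promotable-0"})))
# ['clonable-0', 'promotable-0', 'repurposable-0']
print(supports_promotable_clone({"promotable-0"}))  # False: clonable-0 is orthogonal
```

Declaring clonable-0 plus the generic repurposable-0 would not suffice either, since the generic profile does not promise the two clearly distinguished roles.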

Another thing: it may be quite common for a clonable-0 resource agent to support only a single instance per node at most, and if there is a whole class of such resources (e.g., controld amongst them), it might make sense to devise an actual clonable-0 specialization, clonable-singleton-0, which would take some configurables away from the profile because they are constant (namely the maximum number of instances per node, otherwise governed by the agent's profile configuration under the plain clonable-0 profile). At this point I am still referring to non-existent specifications currently living only in my head, but there would be a mechanism to provide additional profile-specific variables along with expressing support for the profile in the meta-data.
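A sketch of how the singleton specialization could remove a configurable: under plain clonable-0 the agent would state its per-node limit through a profile-specific variable, whereas clonable-singleton-0 pins it to 1. The variable name `max-per-node`, its default of 1, and the `ProfileDecl` structure are all hypothetical, since the mechanism is not specified yet.

```python
from dataclasses import dataclass, field


@dataclass
class ProfileDecl:
    """A declared profile plus its profile-specific variables (hypothetical mechanism)."""
    name: str
    variables: dict = field(default_factory=dict)


def max_instances_per_node(decl: ProfileDecl) -> int:
    """Resolve the per-node instance limit implied by a clonable-style profile."""
    if decl.name == "clonable-singleton-0":
        # The specialization fixes the value, so it is no longer configurable.
        if "max-per-node" in decl.variables:
            raise ValueError("clonable-singleton-0 does not accept max-per-node")
        return 1
    if decl.name == "clonable-0":
        # Plain clonable-0: the agent states its own limit (assumed default: 1).
        return int(decl.variables.get("max-per-node", 1))
    raise ValueError(f"{decl.name} is not clonable-0 or one of its specializations")


print(max_instances_per_node(ProfileDecl("clonable-0", {"max-per-node": "4"})))  # 4
print(max_instances_per_node(ProfileDecl("clonable-singleton-0")))  # 1
```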

Overall, I think we could restore the soundness of OCF (dubious these days due to various isolated, proprietary extensions) as a materialized agreement between the providers (agents) and the consumers (resource managers).

Note that there appear to be so many synergies available with this modular approach that it would be very ill-advised not to go that route. Kicking it off may carry a one-off cost, but reality shows us that a strict monolith is prohibitively expensive all the time, even when the changes to be made are really opt-in, self-contained extensions.

For instance, the Linux kernel has finally received something reassuring regarding the PID reuse problem:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3eb39f47934f9d5a3027fe00d906a45fe3a15fad

It was brought up multiple times on lists for the past few years, e.g.:

https://oss.clusterlabs.org/pipermail/developers/2017-July/001098.html
https://lists.clusterlabs.org/pipermail/users/2017-April/021957.html

Now, we can finally do something reasonable with the said provision in the kernel!

Possibly like this:

pacemaker gets another daemon, pacemaker-dscstored, meant to serve as a local, volatile DB over a Unix socket for other (privileged) users (presumably agents, or pacemaker's own daemons should they gain new resilience features). These users send (unique user ID, label, file descriptor) triples to be taken care of, either passively or, for instance, by polling and replacing an entry with a tombstone when its descriptor dies (even some event subscription system is thinkable). The triple members would presumably be ints, for the simplicity of applying a hashtable, where the accessor could be a trivial hash function (id ^ label) % ht_size, with id being the result of a digest function taking two inputs: the resource id and a per-run random seed.
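The DB part can be sketched with stock Unix-socket descriptor passing (SCM_RIGHTS, via Python's `socket.send_fds`/`socket.recv_fds`, 3.9+) plus the trivial `(id ^ label) % ht_size` accessor. This is a toy under several assumptions: the wire format (two 32-bit ints plus one descriptor per message), BLAKE2b keyed with the per-run seed as the digest, and separate chaining in the table are illustrative choices, not anything pacemaker provides.

```python
import hashlib
import os
import socket
import struct
from typing import Optional

HT_SIZE = 1024
SEED = os.urandom(16)  # per-run random seed


def user_id(resource_id: str, seed: bytes = SEED) -> int:
    """Digest of (resource id, per-run seed), truncated to a 32-bit int."""
    digest = hashlib.blake2b(resource_id.encode(), key=seed, digest_size=4).digest()
    return int.from_bytes(digest, "big")


def bucket(uid: int, label: int) -> int:
    """The trivial accessor: (id ^ label) % ht_size."""
    return (uid ^ label) % HT_SIZE


class DescriptorStore:
    """Toy stand-in for the table a hypothetical pacemaker-dscstored would keep."""

    def __init__(self) -> None:
        self.table = [[] for _ in range(HT_SIZE)]  # separate chaining per bucket

    def put(self, uid: int, label: int, fd: int) -> None:
        chain = self.table[bucket(uid, label)]
        for i, (u, l, old_fd) in enumerate(chain):
            if (u, l) == (uid, label):
                os.close(old_fd)  # replacing an entry drops the store's old reference
                chain[i] = (uid, label, fd)
                return
        chain.append((uid, label, fd))

    def get(self, uid: int, label: int) -> Optional[int]:
        for u, l, fd in self.table[bucket(uid, label)]:
            if (u, l) == (uid, label):
                return fd
        return None


def send_triple(sock: socket.socket, uid: int, label: int, fd: int) -> None:
    # The ints travel as payload; the descriptor travels as SCM_RIGHTS ancillary data.
    socket.send_fds(sock, [struct.pack("!II", uid, label)], [fd])


def recv_triple(sock: socket.socket) -> tuple:
    msg, fds, _flags, _addr = socket.recv_fds(sock, 8, 1)
    uid, label = struct.unpack("!II", msg)
    return uid, label, fds[0]


if __name__ == "__main__":
    agent_end, daemon_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    store = DescriptorStore()
    read_end, write_end = os.pipe()
    uid = user_id("my-resource")
    send_triple(agent_end, uid, 1, read_end)
    os.close(read_end)  # the agent's own copy is gone, as if the agent had exited
    store.put(*recv_triple(daemon_end))
    os.write(write_end, b"hi")
    print(os.read(store.get(uid, 1), 2))  # b'hi'
```

The point of the demo is the last step: the agent closes its own copy of the read end (as a terminating agent would), yet the descriptor held by the store still works, which is the OS refcounting the proposal relies on.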
Agents that want to use this feature:
- declare the ext-dscstore-1 profile (for instance)
- a resource manager (pacemaker) supporting this profile would, in response, export the endpoint for the agent to connect to (where pacemaker-dscstored listens) as a profile-standardized environment variable
- when the agent, on start, spawns the respective service executable directly, it can immediately obtain open("/proc/<PID>", O_DIRECTORY) for it, send that descriptor to the DB, and then terminate as usual (the opened descriptor persists thanks to the OS-inherent reference counting); note that actual private/hidden volatile data (state tracking!) could be passed among the agent's incarnations throughout the life cycle in the same way (see memfd_create(2)), making the pacemaker-dscstored concept even more appealing
- when the agent is invoked again, e.g., to monitor, it can fetch the descriptor back from the DB and use it to mimic the effect of kill(<PID>, 0), all reasonably (and finally!) free of PID reuse (for instance, for the most trivial OCF_CHECK_LEVEL=0 as suggested by the standard; yes, defining multiple levels of checks should de facto be a sign of a well-behaved agent, and people should customarily configure multiple monitor actions)
- possible extension: such fds can evidently be polled on behalf of the agents, say when an agent requested this the last time it stored the descriptor; when the poll fires, pacemaker-dscstored can notify pacemaker-controld that a special case of monitor (say, wakeup) shall be performed so as to let the agent respond ASAP (the underlying implementation behind the descriptor can come from the portable pipe(2) or, on Linux, also from special ones like eventfd(2)); notice that this would effectively allow for evolutionary progress away from horrible, horrible, horrible polling schemes and bring us substantially closer to the "nines of availability" that HA is all about (in case anybody is not aware :-)
- possible extension: resource watchdogs (a timeout, not explicitly defused [perhaps on stop or unmanage], elapsing without anyone coming to grab the respective fd or, additionally, without any poll event being registered either; see above); this could actually also be used for pacemaker's internal watchdogging of all the other daemons, assuming pacemaker-controld would reciprocally guard pacemaker-dscstored as well using the above wakeup mechanism (along with systemd-facilitated watchdogging where available?)
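The agent side of the start and monitor steps can be sketched like this, assuming Linux 5.1+ (where pidfd_send_signal(2) accepts an open /proc/<PID> directory descriptor) and Python 3.9+. Handing the descriptor to and from the hypothetical pacemaker-dscstored is left out; the memfd helpers only illustrate the state-tracking remark and are not part of any existing agent API.

```python
import os
import signal
import subprocess
import sys


def open_process_handle(pid: int) -> int:
    """Pin a process as suggested above: open("/proc/<PID>", O_DIRECTORY)."""
    return os.open(f"/proc/{pid}", os.O_RDONLY | os.O_DIRECTORY)


def process_alive(handle: int) -> bool:
    """PID-reuse-free analogue of kill(<PID>, 0): signal 0 only performs the checks."""
    try:
        signal.pidfd_send_signal(handle, 0)
    except ProcessLookupError:  # ESRCH: the pinned process has exited and been reaped
        return False
    return True


def stash_state(text: str) -> int:
    """Keep private volatile agent state in an anonymous memfd (see memfd_create(2))."""
    fd = os.memfd_create("agent-state")
    os.write(fd, text.encode())
    return fd


def read_state(fd: int) -> str:
    """Read the whole memfd back, e.g. in a later agent incarnation."""
    return os.pread(fd, os.fstat(fd).st_size, 0).decode()


def demo() -> None:
    # Stand-in for the service executable the agent spawns on start.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    handle = open_process_handle(child.pid)  # would be sent to the descriptor store
    print(process_alive(handle))  # True
    child.terminate()
    child.wait()  # reaped: the numeric PID may now be reused by anything
    print(process_alive(handle))  # False, regardless of PID reuse
    os.close(handle)
```

Calling `demo()` prints `True` and then `False`: once the child is reaped, the pinned descriptor reports it gone even if the numeric PID gets recycled, which `kill(<PID>, 0)` alone cannot guarantee.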

Now, IIUIC, there's a lot of related work, so we can benefit even more from the pidfd abstraction in Linux, e.g., polling:

https://lwn.net/ml/linux-kernel/20190425190010.46489-1-joel@joelfernandes.org/
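With polling support (available since Linux 5.3, alongside pidfd_open(2)), the wakeup extension needs nothing more than a plain poll(2) over the stored descriptors. A sketch in Python 3.9+, assuming descriptors obtained via `os.pidfd_open` (a /proc/<PID> directory descriptor does not provide this readiness notification); the service names and the wakeup hand-off are hypothetical.

```python
import os
import select
import subprocess
import sys


def wait_for_exits(pidfds: dict[str, int], timeout_ms: int) -> set[str]:
    """Return the names of watched services whose processes exited within the timeout.

    A pidfd polls readable once its process terminates, so one poll() call can
    watch many services instead of probing each with kill(<PID>, 0) periodically.
    """
    poller = select.poll()
    names_by_fd = {}
    for name, fd in pidfds.items():
        poller.register(fd, select.POLLIN)
        names_by_fd[fd] = name
    return {names_by_fd[fd] for fd, _events in poller.poll(timeout_ms)}


def demo() -> None:
    quick = subprocess.Popen([sys.executable, "-c", "pass"])
    slow = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    fds = {"quick": os.pidfd_open(quick.pid), "slow": os.pidfd_open(slow.pid)}
    # Here a hypothetical pacemaker-dscstored would request a "wakeup" monitor.
    print(wait_for_exits(fds, 5000))  # {'quick'}
    slow.terminate()
    for proc in (quick, slow):
        proc.wait()
    for fd in fds.values():
        os.close(fd)
```

Calling `demo()` prints `{'quick'}`; in the proposed design, that is the moment pacemaker-dscstored would ask pacemaker-controld for a wakeup monitor rather than waiting for the next polling interval.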

but we also need a modular, extensible way to formalize support for such extensions to the standard, which are mostly very optional (here because this one is strictly limited to a single system), with the amount of administration proportional to how self-contained these "modules" are. A full-blown approach where every decision has equal scope doesn't really strike an agile tone: observe how we were practically unable to move forward for how long? Two decades?

A core + modules framework will hopefully make it easy to address all those long-term deficiencies effectively.

Also, think about how the high-level management tools could gradually adapt to new profiles (all of a sudden, they would refer to "Add support for clonable-0 profile" instead of "Add support for pacemaker's \<clone>"; the shift also means that whenever any other resource manager supports that profile, it could be fairly easy to adopt some abstract aspects of the profile support for it in these tools as well).

EDIT: renamed pacemaker-fdstored to pacemaker-dscstored, so that we have the a-f range fully covered :-)

This is an interesting idea that may be useful one day, but I would rather wait until we have a strong need for such modularity before deciding the details, to ensure it's suitable to what becomes needed.

ClusterLabs / OCF-spec

RFC: OCF profiles #23