OCF's tunnel vision regarding fully isolated runs of the agents (not the case in practice)

jnpkrn commented 5 years ago

Recently observing a configuration error that is, as of now, unpreventable:

https://github.com/ClusterLabs/pcs/issues/197

made me think about how we can fix these undesirable shortcomings in our cluster stack.

Borrowing from the conclusion at https://github.com/ClusterLabs/pcs/issues/197#issuecomment-473529216:

It's more a wider systemic flaw of never genuinely considering consequences of:

1. running semantically-matching instances of the agent in parallel

2. not preventing some patterns of agents' usage, or conversely, not enforcing some constraints to be used unconditionally for the configuration to be allowed

To untwist it, we probably need a top-down approach, hence stating the expectations clearly in the OCF standard, as proposed for 1.

The complicated semantics of mount are exactly one such example where both aspects shall be covered expressly in the standard:

1. possible intertwining of agent instances with different parameter sets on the stop operation (and perhaps elsewhere)
2. for bind mount points, there could be a way to arrange for "the last to leave the resource triggers a full-fledged stop", i.e., a concept built on an enforced uniform ordering (stop order being the exact inverse of start order) that keeps the bind instance fully inside the lifetime of the other managed mount point it happens to delegate further (a bind mount point would always have to be stacked under the true mount, borrowing its target path as its own source path, never umounting on stop)
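To make the "last to leave" idea a bit more tangible, here is a minimal sketch of the bookkeeping it implies. It is only a toy model under my own assumptions: the `MountStack` class and the resource names are hypothetical, nothing here comes from the OCF standard or any existing agent, and no real mount or umount is performed (the actions are only logged).

```python
class MountStack:
    """Toy model of one true mount and the instances stacked on it.

    Hypothetical illustration only: actions are logged, never executed.
    """

    def __init__(self, target):
        self.target = target
        self.started = []  # instance names in start order
        self.log = []      # actions that would have been performed

    def start(self, name):
        if not self.started:
            # The first instance in performs the real mount.
            self.log.append(f"mount {self.target}")
        self.started.append(name)

    def stop(self, name):
        # Enforce stop order as the exact inverse of start order.
        if not self.started or self.started[-1] != name:
            raise RuntimeError(f"{name} cannot stop now, stack is {self.started}")
        self.started.pop()
        if not self.started:
            # The last instance to leave triggers the full-fledged stop.
            self.log.append(f"umount {self.target}")
```

With a true mount started first and a bind instance started second, stopping the true mount first is rejected, stopping the bind instance logs nothing, and only the final stop of the true mount logs the umount.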

For 1., the standard shall be clear on the precautions agents are meant to take to ensure general sanity:

see the proposal https://github.com/ClusterLabs/resource-agents/issues/1304#issuecomment-473522631 and also the requirement on the resource manager to explicitly avoid parallel executions of instances with the same parameter sets (subject to definition)

For 2., a metadata-level way of expressing "combinability" (stackability) of the agents shall be devised. Prior art in rgmanager can be a useful source of inspiration.
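To illustrate what a configuration-time check driven by such metadata might do, here is a small sketch. Everything in it is an assumption made up for the example: the `STACKABLE_ON` table, the agent type names and `check_stack` are hypothetical and are not taken from the OCF standard, from rgmanager, or from any existing tool.

```python
# Hypothetical declarations an agent's metadata could carry:
# which agent types a given type is required to be stacked on.
STACKABLE_ON = {
    "bind-mount": {"true-mount"},  # a bind mount must sit on a true mount
    "true-mount": set(),           # a true mount needs no parent
}


def check_stack(resources):
    """Validate a configuration against the declarations above.

    resources: list of (name, agent_type, parent_name_or_None) tuples.
    Returns a list of human-readable errors, empty if the config is allowed.
    """
    types = {name: agent for name, agent, _ in resources}
    errors = []
    for name, agent, parent in resources:
        allowed = STACKABLE_ON.get(agent, set())
        if allowed and types.get(parent) not in allowed:
            errors.append(
                f"{name}: {agent} must be stacked on one of {sorted(allowed)}"
            )
    return errors
```

A configuration tool could refuse any configuration for which `check_stack` returns a non-empty list, which is the "constraints enforced unconditionally" part of aspect 2.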

jnpkrn commented 5 years ago

Note that systemd-type resources have the combinability/stackability problem inherently resolved (After=, Conflicts=, etc.).

Initscripts are, from today's perspective, increasingly weak, but for them it is at least a well-known fact that they are only good for an isolated run (complex relationships are better expressed directly within a single initscript), and concern 1. doesn't apply to them, since they are inherently singletons within a system (unlike template unit files, if they are to get any sort of native support).

jnpkrn commented 5 years ago

See also the idea of a stackable-1 profile, and of profiles overall, which could accommodate such an opt-in extension very gracefully and in a unified manner. Consequently, this would stand as a main motivator for this framework of profiles on top of a non-optional, bare-bones OCF core standard.

ClusterLabs / OCF-spec

OCF's tunnel vision regarding fully isolated runs of the agents (not the case in practice) #22