jnpkrn closed this 3 years ago.
If/when this framework gets adoption, we can talk about other profiles. I have one very specific profile in mind, stackable-1, that would help along the lines I mentioned at the recent discovery of a problematic configuration pattern:
https://github.com/ClusterLabs/resource-agents/issues/1304#issuecomment-473525495
It could bring what was good about rgmanager (more controlled and less error-prone combinability of particular resources within hierarchical arrangements) back to pacemaker, for instance.
Some quite interesting conclusions might also arise.
For one, consider an agent that supports whichever OCF version introduces profiles with some normative high-level semantics (yet to be added). If such an agent does not support a clonable-X-like profile, it should be actively prevented from being configured as a clone.
Observe that currently nothing prevents this, even though it is bound to cause suffering in some cases. Oftentimes the actual role of the RM is not only to guarantee a living instance within the cluster (aligned with HA), but also at most a single living instance within the cluster (a.k.a. mutual exclusion), and some agents so far silently require the latter. Take, for example, ocf:heartbeat:IPaddr2. Mentally drop its active support for pacemaker's cloned-resource style of running the agent (that support could instead be passive, given RM support for these optional profiles), and you get something that cannot be cloned/node-parallelized by definition (which would be the default under the respective OCF standard introducing profiles).
Conversely, only agents of a given OCF version that explicitly declare support for a clonable-X-like profile would be allowed in such configurations. This applies only if the particular RM supports the multi-node notion at all, which is not a strict requirement: simplistic local/non-distributed runners of OCF agents must not be ruled out, and these would apparently not support this clonable-X-like profile.
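A resource manager could enforce this with a check along the following lines. This is a minimal Python sketch under loud assumptions. The <profiles>/<profile name="..."> meta-data elements are hypothetical, and so is the OCF version in which profiles would appear (PROFILES_SINCE), since no such specification exists yet. Only the <version> child of <resource-agent>, which states the OCF version an agent claims, is taken from today's meta-data format.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL: the OCF version that would introduce profiles; nothing is decided yet.
PROFILES_SINCE = (1, 2)


def ocf_version(root: ET.Element) -> tuple:
    """OCF standard version the agent claims in its meta-data (<version> child)."""
    text = (root.findtext("version") or "1.0").strip()
    return tuple(int(part) for part in text.split("."))


def declared_profiles(root: ET.Element) -> set:
    """Profile names from a HYPOTHETICAL <profiles><profile name=.../></profiles> block."""
    return {p.get("name") for p in root.iterfind("profiles/profile") if p.get("name")}


def may_be_cloned(meta_xml: str) -> bool:
    """Would the RM accept this agent inside a clone configuration?"""
    root = ET.fromstring(meta_xml)
    if ocf_version(root) < PROFILES_SINCE:
        # Pre-profile agent: keep today's permissive behaviour.
        return True
    # Profile-aware agent: cloning is refused unless a clonable-X-like profile is declared.
    return any(name.startswith("clonable-") for name in declared_profiles(root))
```

With such a check, a profile-aware IPaddr2 that declares no clonable-X profile would be rejected as a clone. An agent declaring, say, clonable-0 or one of its specializations would be accepted.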
The other conclusion could be that unique can be kept to annotate parameters. It is spot-on naming, and it would be sad to ditch it just because pacemaker is the elephant in the room (pacemaker could learn to treat 1.1-conformant agents the right, intended way). For reloadable stuff, another profile, reloadable-0, would be devised and used by the agents that implement online reconfiguration.
That's a counter-proposal to this very part of #21.
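To illustrate the decoupling, here is a minimal Python sketch under two assumptions. First, unique keeps its plain meaning: no two resources of the same agent may share the value of such a parameter. Second, the reload-versus-restart decision consults only the hypothetical reloadable-0 profile, together with a hypothetical list of online-reconfigurable parameters that the profile would carry. Neither the profile nor that list is specified anywhere yet.

```python
from collections import defaultdict


def unique_violations(resources: dict, unique_params: list) -> dict:
    """resources: resource id -> {parameter: value}, all instances of one agent.
    unique_params: parameters the agent's meta-data marks unique="1".
    Returns {(parameter, value): [resource ids]} for every value used more than once."""
    users = defaultdict(list)
    for rid, params in resources.items():
        for name in unique_params:
            if name in params:
                users[(name, params[name])].append(rid)
    return {key: sorted(rids) for key, rids in users.items() if len(rids) > 1}


def on_parameter_change(profiles: set, changed: list, reloadable_params: list) -> str:
    """HYPOTHETICAL policy: reload only if the agent declares reloadable-0 and every
    changed parameter is one it can reconfigure online; 'unique' plays no role here."""
    if "reloadable-0" in profiles and set(changed) <= set(reloadable_params):
        return "reload"
    return "restart"
```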
I also realized that I wasn't entirely correct so far: repurposable-0 is not a specialization of clonable-0, but rather a fully orthogonal profile. Agents currently in accordance with pacemaker's use of promotable clones would require both the clonable-0 and the promotable-0 profiles at once. Here, promotable-0 is a specialization of repurposable-0 with only two clearly distinguished roles.
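The relationships between profiles can be modelled as a small specialization graph. Declaring a specialized profile implies its more general parent, while orthogonal profiles have to be declared separately. The following is a minimal Python sketch. The profile names come from the text; the graph encoding and the function names are illustrative assumptions.

```python
# Specialization edges (child -> parent) among the proposed, not yet specified profiles.
SPECIALIZES = {
    "promotable-0": "repurposable-0",
}


def effective_profiles(declared: set) -> set:
    """Close the declared set under specialization (a child implies its ancestors)."""
    result = set()
    for name in declared:
        while name is not None and name not in result:
            result.add(name)
            name = SPECIALIZES.get(name)
    return result


def fits_promotable_clone(declared: set) -> bool:
    """pacemaker-style promotable clones need two orthogonal profiles at once."""
    return {"clonable-0", "promotable-0"} <= effective_profiles(declared)
```

Note that repurposable-0 on its own does not qualify. The more general profile does not imply its specialization, and neither of them implies clonable-0.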
Another thing: it may be quite common for a clonable-0 resource agent to support at most a single instance per node. If there is a whole class of such resources (e.g. controld amongst them), it might make sense to devise an actual specialization of clonable-0, clonable-singleton-0. It would take away some configurables from the profile by making them constant, such as the maximum number of instances per node, which the agent's profile configuration otherwise governs under the plain clonable-0 profile. At this point I am still referring to non-existing specifications that currently live only in my head. However, there would be a mechanism to provide additional profile-specific variables along with expressing support for the profile in the meta data.
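The following sketch shows how clonable-singleton-0 would pin down a configurable of plain clonable-0. The variable name max-instances-per-node is a hypothetical placeholder for the not yet existing specification. The same goes for the way profile-specific variables would accompany the declaration in the meta data.

```python
# HYPOTHETICAL name of the per-node limit that plain clonable-0 leaves configurable.
MAX_PER_NODE = "max-instances-per-node"


def per_node_limit(profile: str, profile_vars: dict) -> int:
    """Maximum number of instances per node an RM may schedule for the agent."""
    if profile == "clonable-singleton-0":
        # The specialization fixes the value; an agent may restate it as 1 but not change it.
        if int(profile_vars.get(MAX_PER_NODE, 1)) != 1:
            raise ValueError("clonable-singleton-0 fixes the per-node limit at 1")
        return 1
    if profile == "clonable-0":
        # Plain profile: the agent has to state its own limit.
        return int(profile_vars[MAX_PER_NODE])
    raise ValueError(f"not a clonable profile: {profile}")
```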
Overall, I think we could restore the soundness of the OCF as a materialized agreement between providers (agents) and consumers (resource managers). That soundness is dubious these days due to various isolated, proprietary extensions.
Note that there appear to be so many synergies available with this modular approach that it would be very ill-advised not to go that route. Kicking it off may be a one-off expense. Reality, however, proves that a strict monolith is prohibitively expensive all the time, even when the changes to be made are genuinely opt-in, self-contained extensions.
For instance, the Linux kernel has finally received something reassuring regarding the PID reuse problem:
It has been brought up multiple times on the mailing lists over the past few years, e.g.:
https://oss.clusterlabs.org/pipermail/developers/2017-July/001098.html
https://lists.clusterlabs.org/pipermail/users/2017-April/021957.html
Now, we can finally do something reasonable with the said provision in the kernel!
Possibly like this:
pacemaker gets another daemon, pacemaker-dscstored. It is meant to serve as a local volatile DB over a Unix socket for other (privileged) users, presumably agents, or pacemaker's own daemons if they gain new resilience features. These users send (unique user ID, label, file descriptor) triples to be taken care of, either passively or, e.g., by polling the descriptors and replacing each with a tombstone when it dies; even some event subscription system is thinkable. The triples would presumably consist of ints, for simplicity of applying a hashtable. The accessor could then be a trivial hash function, (id ^ label) % ht_size, where id is the result of a digest function taking two inputs: the resource id and a per-run random seed.
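The daemon's table could be modelled as follows. This is a minimal in-memory Python sketch, not a design, and several choices in it are assumptions. BLAKE2b, truncated to 32 bits, stands in for the unspecified digest. The table size, the tombstone value and the class name are hypothetical, and collisions are handled by separate chaining, which the text does not specify. The accessor is parenthesized as (id ^ label) % ht_size because % binds tighter than ^ in both C and Python. Without the parentheses, the expression would compute id ^ (label % ht_size), which is not a valid table index.

```python
import hashlib
import os

HT_SIZE = 64        # HYPOTHETICAL table size
TOMBSTONE = -1      # marks an entry whose descriptor has died


def user_id(resource_id: str, run_seed: bytes) -> int:
    """Digest of (resource id, per-run random seed), truncated to a 32-bit int."""
    digest = hashlib.blake2b(run_seed + b"\0" + resource_id.encode(), digest_size=4)
    return int.from_bytes(digest.digest(), "big")


def bucket(uid: int, label: int, ht_size: int = HT_SIZE) -> int:
    # Parenthesized on purpose: % binds tighter than ^ (in C as in Python).
    return (uid ^ label) % ht_size


class DescriptorStore:
    """In-memory model of the proposed pacemaker-dscstored table (separate chaining)."""

    def __init__(self, ht_size: int = HT_SIZE):
        self.ht_size = ht_size
        self.buckets = [[] for _ in range(ht_size)]

    def _find(self, uid, label):
        chain = self.buckets[bucket(uid, label, self.ht_size)]
        for entry in chain:
            if entry[0] == uid and entry[1] == label:
                return chain, entry
        return chain, None

    def put(self, uid: int, label: int, fd: int) -> None:
        chain, entry = self._find(uid, label)
        if entry is None:
            chain.append([uid, label, fd])
            return
        if entry[2] != TOMBSTONE:
            os.close(entry[2])  # the store owns its descriptors
        entry[2] = fd

    def get(self, uid: int, label: int):
        """The stored fd, TOMBSTONE if it died, or None if never stored."""
        _, entry = self._find(uid, label)
        return None if entry is None else entry[2]

    def bury(self, uid: int, label: int) -> None:
        """Replace a dead descriptor with a tombstone (e.g. after a poll hang-up)."""
        _, entry = self._find(uid, label)
        if entry is not None and entry[2] != TOMBSTONE:
            os.close(entry[2])
            entry[2] = TOMBSTONE
```

Seeding the digest per run means user IDs from a previous daemon incarnation cannot accidentally match entries of the current one.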
For agents that want to use this feature:
- the agent declares the ext-dscstore-1 profile (for instance);
- in response, a resource manager (such as pacemaker) supporting this profile would export the end-point for the agent to connect to (where pacemaker-dscstored listens) as a profile-standardized environment variable;
- when the agent, on start, spawns the respective service executable directly, it can immediately grab a descriptor via open("/proc/<PID>", O_DIRECTORY) and send it to the DB. The agent then terminates as usual, and the opened descriptor persists thanks to OS-inherent refcounting. Note that actual private/hidden volatile data (state tracking!) could be handled the same way across the agent's incarnations throughout the life-cycle (see memfd_create(2)), making the pacemaker-dscstored concept even more appealing;
- when the agent is invoked again, e.g., to monitor, it can fetch the descriptor back from the DB and use it to mimic the effect of kill(<PID>, 0), all reasonably (and finally!) free of PID reuse. This would serve, for instance, the most trivial OCF_CHECK_LEVEL=0 as suggested by the standard. (Yeah, defining multiple levels of checks should de facto be a sign of well-behaved agents, and people should customarily configure multiple monitor actions.)
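The agent's side of this protocol could look roughly like the following Python sketch (Linux, Python 3.9+). Descriptors travel over the Unix socket as SCM_RIGHTS ancillary data, which socket.send_fds and socket.recv_fds wrap. The kill(<PID>, 0) equivalent is pidfd_send_signal(2) with signal 0, which accepts a /proc/<PID> directory descriptor since Linux 5.1. The environment variable name OCF_EXT_DSCSTORE_SOCKET and the 8-byte (user ID, label) message layout are hypothetical.

```python
import os
import signal
import socket
import struct

HEADER = struct.Struct("!II")  # HYPOTHETICAL wire format: (user ID, label)


def connect_to_store() -> socket.socket:
    # HYPOTHETICAL variable name; the profile would standardize the real one.
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    sock.connect(os.environ["OCF_EXT_DSCSTORE_SOCKET"])
    return sock


def grab_process_fd(pid: int) -> int:
    """On start: pin the spawned service's identity, not just its PID number."""
    return os.open(f"/proc/{pid}", os.O_RDONLY | os.O_DIRECTORY)


def stash_state(data: bytes) -> int:
    """Private volatile state kept in an anonymous memory file (memfd_create(2))."""
    fd = os.memfd_create("agent-state")
    os.write(fd, data)
    return fd


def read_state(fd: int) -> bytes:
    # pread ignores the (shared) file offset, so any incarnation can re-read from 0.
    return os.pread(fd, os.fstat(fd).st_size, 0)


def send_descriptor(sock: socket.socket, uid: int, label: int, fd: int) -> None:
    # SCM_RIGHTS: the kernel installs a duplicate of fd in the receiving process.
    socket.send_fds(sock, [HEADER.pack(uid, label)], [fd])


def recv_descriptor(sock: socket.socket):
    data, fds, _flags, _addr = socket.recv_fds(sock, HEADER.size, 1)
    uid, label = HEADER.unpack(data)
    return uid, label, fds[0]


def process_alive(pidfd: int) -> bool:
    """kill(<PID>, 0) without the PID-reuse race: signal 0 only checks existence."""
    try:
        signal.pidfd_send_signal(pidfd, 0)
    except ProcessLookupError:
        return False
    return True
```

Grabbing the descriptor right after spawning is safe as long as the agent is still the parent. An unreaped child keeps its PID, so the PID cannot be reused before the open. A request message for fetching a descriptor back from the DB would need its own format, which is omitted here.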
Possible extension: such fds can apparently be polled on behalf of the agents, say, when an agent requested this the last time it stored the descriptor. When an event occurs, pacemaker-dscstored can notify pacemaker-controld that a special case of monitor (say, wakeup) shall be performed, so as to let the agent respond ASAP. The underlying implementation behind the descriptor can come from the portable pipe(2), or on Linux also from special ones like eventfd(2). Notice that this would effectively allow for evolutionary progress away from horrible, horrible, horrible polling schemes. It would also bring us substantially closer to the "availability of nines", which is what availability is all about (if anybody is not aware :-)
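The polling extension boils down to a poll(2) loop inside the store daemon. Below is a minimal Python sketch using select.poll on the descriptors that agents opted in for. The returned resource ids stand for the hypothetical wakeup monitor requests that would be forwarded to pacemaker-controld. POLLHUP is checked as well, because a pipe whose write end has been closed (for instance, one held only by a service that has died) reports a hang-up.

```python
import select


def wakeups(watched: dict, timeout_ms: int) -> list:
    """watched: fd -> resource id, for agents that asked to be woken up.
    Returns resource ids whose descriptor became readable or hung up."""
    poller = select.poll()
    for fd in watched:
        # POLLHUP need not be requested; poll(2) always reports it.
        poller.register(fd, select.POLLIN)
    return sorted(
        watched[fd]
        for fd, events in poller.poll(timeout_ms)
        if events & (select.POLLIN | select.POLLHUP)
    )
```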
Possible extension: resource watchdogs. A watchdog would fire when a timeout that was not explicitly defused (perhaps on stop or unmanage) elapses without anyone coming to grab the respective fd, or, in addition, without any poll event being registered either (see above). This could actually also be used for pacemaker's internal watchdogging of all the other daemons, assuming pacemaker-controld would reciprocally guard pacemaker-dscstored as well using the above wakeup mechanism (along with systemd-facilitated watchdogging where available?).
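The watchdog bookkeeping itself is simple. Below is a minimal Python sketch with an injectable clock, so it can be exercised deterministically. The events counted as kicks (storing or fetching the fd, a registered poll event) and those that defuse the timer (perhaps stop or unmanage) follow the text. The class and method names are hypothetical.

```python
import time


class ResourceWatchdog:
    """HYPOTHETICAL per-resource watchdog for the proposed pacemaker-dscstored."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout = timeout_s
        self.clock = clock
        self.deadline = {}  # resource id -> absolute deadline on self.clock

    def arm(self, rid: str) -> None:
        """(Re)start the countdown: fd stored, fd fetched back, or poll event seen."""
        self.deadline[rid] = self.clock() + self.timeout

    kick = arm  # any sign of life simply re-arms the timer

    def defuse(self, rid: str) -> None:
        """Explicitly stop watching, e.g. on stop or unmanage."""
        self.deadline.pop(rid, None)

    def expired(self) -> list:
        """Resources whose deadline passed without a kick or a defuse."""
        now = self.clock()
        return sorted(rid for rid, due in self.deadline.items() if due <= now)
```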
Now, IIUIC, there's a lot of related work, so we can benefit even more from the pidfd abstraction in Linux, e.g., polling:
https://lwn.net/ml/linux-kernel/20190425190010.46489-1-joel@joelfernandes.org/
But we also need a modular, extensible way to formalize support for such extensions to the standard, which are mostly very optional (this one is optional because it is strictly limited to a single system). The amount of administration should be proportional to the degree of self-containment of these "modules". A full-blown approach, in which every decision has equal scope, doesn't really play along with an agile tone. Observe how we were practically unable to move forward for how long? Two decades?
A core + modules framework will hopefully make it easy to address all those long-standing deficiencies effectively.
Also, think about how the high-level management tools could gradually adapt to new profiles. All of a sudden, they would start to say "Add support for clonable-0 profile" instead of "Add support for pacemaker's <clone>". The shift also means that whenever any other resource manager supports that profile, it could be fairly easy to reuse the abstract aspects of the profile support for that manager as well.
EDIT: renamed pacemaker-fdstored to pacemaker-dscstored, so that we have the a-f range fully covered :-)
This is an interesting idea that may be useful one day, but I would rather wait until we have a strong need for such modularity before deciding the details, to ensure it suits whatever ends up being needed.
See the commit messages + commits themselves.
A good way to close the gap between theory (OCF) and practice (pacemaker) without hurting anyone else possibly using the standard?
And to allow sets of functionality to be added gradually, without relying on serialization into a single version number (despite the feature space being multi-dimensional)?
Isn't this akin to how, e.g., the Java ecosystem works (atomic units of APIs to be supported in full or not at all)?