ClusterLabs / OCF-spec

http://standards.clusterlabs.org

[RFC] Follow idea of immutable /usr vs. mutable overrides in /etc #5

Open jnpkrn opened 6 years ago

jnpkrn commented 6 years ago

There are many practical reasons why we would want to adopt this increasingly popular scheme while still enabling users to modify the agents per their needs, for instance:

Hence my expectation is that the OCF standard will address this, presumably in resource-agent-api.md, by replacing

The Resource Agents are located in subdirectories under /usr/ocf/resource.d.

with something like

The Resource Agents are located in subdirectories under /usr/ocf/resource.d. An OCF X.Y compliant RM shall first consult the /etc/ocf/resource.d path for the existence of the requested agent; when present there, it takes precedence in the agent lookup. This makes for convenient customization of existing agents without altering them at the stated standard location, which in turn simplifies reverting to the stock configuration, coexistence with package updates, and possibly a locked-down use of the /usr mount point. The agent lookup based on file presence is definitive, any further issue, such as the file not being executable, notwithstanding.
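
To make the intended precedence concrete, here is a minimal shell sketch of the lookup such an RM could perform (the function name is made up; only the search order and the presence-based decision come from the proposed wording above):

  # Resolve "provider:agent" per the proposed /etc-over-/usr precedence.
  ocf_agent_path() {
    provider=$1; agent=$2
    for root in /etc/ocf/resource.d /usr/ocf/resource.d; do
      if [ -e "$root/$provider/$agent" ]; then
        # mere presence decides the lookup, executability notwithstanding
        echo "$root/$provider/$agent"
        return 0
      fi
    done
    return 1
  }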

krig commented 6 years ago

Sounds good to me. :+1:

kgaillot commented 6 years ago

I'm uncomfortable with putting executables in /etc, and I strongly think users shouldn't reuse the same provider+agent name when modifying an agent, as it greatly complicates troubleshooting.

The currently recommended approach for modifying resource agents is to create a new, custom provider under /usr/lib/ocf/resource.d. I could see extending the standard to allow providers in an alternate location, such as /usr/local, /opt, or /srv (followed by ocf/resource.d), or even allowing an OCF_RA_PATH environment variable. I'm not convinced it's a good idea though, as custom OCF scripts are not any more mutable than the commonly distributed ones. In production, few users are going to modify custom scripts directly; they are going to have a development environment, and then push changes to all production nodes (comparable to updating the resource-agents package).
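
For concreteness, the currently recommended approach amounts to something like the following (the provider name "acme" and the agent file are made up for illustration):

  # Install a modified agent under a distinct, custom provider, so it is
  # addressed unambiguously, e.g. as ocf:acme:Filesystem:
  mkdir -p /usr/lib/ocf/resource.d/acme
  cp Filesystem.custom /usr/lib/ocf/resource.d/acme/Filesystem
  chmod 755 /usr/lib/ocf/resource.d/acme/Filesystem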

jnpkrn commented 6 years ago

On 21/11/17 22:00 +0000, Ken Gaillot wrote:

I'm uncomfortable with putting executables in /etc,

There's already a bunch of executable glue scripts in /etc (including the /etc/rc.d/init.d ones on non-systemd systems), which is exactly what resource agents are meant to be. I see no conflict here.

and I strongly think users shouldn't reuse the same provider+agent name when modifying an agent, as it greatly complicates troubleshooting.

Resource managers would need to identify the particular file clearly, true. Perhaps, if the idea is deemed good enough, there should also be a provision in the specification for explicitly re-mapping an agent specification to a particular path at defined moments, e.g. upon a SIGHUP signal sent to the resource manager. Prior to such a trigger, it would keep the initially resolved path.
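
As a sketch of what such a provision might look like in practice (the pid-file path is purely hypothetical; no resource manager is claimed to support this today):

  # Hypothetical: ask the running RM to re-resolve agent paths now,
  # at a well-defined moment, rather than at arbitrary points in time.
  kill -HUP "$(cat /var/run/resource-manager.pid)"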

The currently recommended approach for modifying resource agents is to create a new, custom provider under /usr/lib/ocf/resource.d. I could see extending the standard to allow providers in an alternate location, such as /usr/local, /opt, or /srv (followed by ocf/resource.d), or even allowing an OCF_RA_PATH environment variable. I'm not convinced it's a good idea though, as custom OCF scripts are not any more mutable than the commonly distributed ones. In production, few users are going to modify custom scripts directly; they are going to have a development environment, and then push changes to all production nodes (comparable to updating the resource-agents package).

That's not the use case I had in mind; I wasn't considering going so deep as to also change the actual configuration.

Rather something like: http://oss.clusterlabs.org/pipermail/users/2017-August/006303.html

Anyway, having bulk synchronization of /etc across the nodes can be appealing (also for systemd unit files, which can likewise be employed with a resource manager if there is support for that).
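
For instance, something along these lines (a sketch only: the node names are made up, and plain rsync over ssh is just one possible transport):

  # Push the local /etc override tree to the other cluster nodes.
  for node in node2 node3; do
    rsync -a --delete /etc/ocf/resource.d/ "$node:/etc/ocf/resource.d/"
  done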

-- Jan (Poki)

dmuhamedagic commented 6 years ago

I'd rather not. Doesn't the provider concept offer enough flexibility? As Ken said, it would also be quite difficult to figure out which RA is being run if the resource manager is allowed to look at more than one place for the same resource configuration.

oalbrigt commented 6 years ago

And you can already make custom or similarly named directories alongside /usr/lib/ocf/resource.d/heartbeat to avoid clashing with the agents provided by the distro.

jnpkrn commented 6 years ago

On 22/11/17 14:27 +0000, Dejan Muhamedagic wrote:

I'd rather not. Doesn't the provider concept offer enough flexibility? As Ken said, it would also be quite difficult to figure out which RA is being run if the resource manager is allowed to look at more than one place for the same resource configuration.

The additional idea sketched above would make the flip happen only at defined moments (initial start, or being told to rescan the agent mapping), not at arbitrary points, which would indeed make the situation hard to follow.

On 22/11/17 14:36 +0000, Oyvind Albrigtsen wrote:

And you can already make custom or similarly named directories alongside /usr/lib/ocf/resource.d/heartbeat to avoid clashing with the agents provided by the distro.

Naturally, but I thought it was clear by now that I anticipated this for somewhat different use cases.

-- Jan (Poki)

jnpkrn commented 6 years ago

On 22/11/17 17:11 +0100, Jan Pokorný wrote:

The additional idea sketched above would make the flip happen only at defined moments (initial start, or being told to rescan the agent mapping), not at arbitrary points, which would indeed make the situation hard to follow.

On the other hand, let's not fall into the fallacy that the current situation is a breeze in the "which agent variant was run, exactly" matter, at least with pacemaker in particular:

This already makes it rather difficult to tell which variant of the agent was run at any particular moment in the past, unless you can attest that nothing has intervened (and even then it's not 100%). So I don't see any remarkable regression; with the pros and cons summed together, IMHO the result is positive here once the mentioned additional idea of explicit rescans is mixed in.

-- Jan (Poki)

krig commented 6 years ago

The main benefit as I see it would be enabling the sysadmin to add their own agents on top of a read-only /usr file system delivered by a transactional update mechanism.

jnpkrn commented 6 years ago

On 23/11/17 09:01 +0000, Kristoffer Grönlund wrote:

The main benefit as I see it would be enabling the sysadmin to add their own agents on top of a read-only /usr file system delivered by a transactional update mechanism.

The other practical value is that the administrator would (one wants to say, finally) gain the power to defuse OCF-based resources that are not in any way desired in the projected cluster, out of the set of agents that get installed unconditionally through the common distribution channels, sometimes including ocf:heartbeat:anything, which may be unsettling on its own: http://lists.clusterlabs.org/pipermail/users/2016-January/002178.html This is very similar to, and directly inspired by, systemd's masking approach.
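
For reference, the systemd mechanism alluded to is nothing more than shadowing a unit with a /dev/null symlink in the /etc override tree (the unit name below is made up):

  # Equivalent of: systemctl mask unwanted.service
  ln -s /dev/null /etc/systemd/system/unwanted.service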

So when the cluster should only ever serve a minimalistic httpd + virtual IP combo, the solution would be to run something like this upon each install/update of resource-agents on an RPM-based distro:

# mkdir -p /etc/lib/ocf/resource.d/heartbeat
# rpm -ql resource-agents \
  | grep '/usr/lib/ocf/resource.d/heartbeat/[^.].*' \
  | grep -vE 'apache|IPaddr2' \
  | sed 's|/usr|/etc|' \
  | xargs -I{} ln -sf /dev/null {}   # mask everything but apache + IPaddr2

For this to work harmonically, resource managers should further recognize the zero size of agents discovered like this and exclude them from any "try running" attempts (including at the system location, indeed).
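
In other words, the check an RM could perform might look like this sketch (the agent path is just an example; zero-size masking semantics are not part of the spec today, only proposed here):

  # A present but zero-sized agent (e.g. a /dev/null symlink) means "masked".
  agent=/etc/lib/ocf/resource.d/heartbeat/nfsserver
  if [ -e "$agent" ] && [ ! -s "$agent" ]; then
    echo "agent is masked, not attempting to run it" >&2
  fi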

For pacemaker in particular, and putting fence-agents aside (preferably there would be a convergence towards OCF in some aspects; plus those agents are separated into discrete subpackages in el7, giving the administrator at least some say over what's available), the only way to run an unrestricted command from the cluster configuration would then be the "lsb:" class of resources.
