crmsh too smart on groups and location constraints?

egroeper commented 8 years ago

When trying to improve ansible-pacemaker I found the following (unusual, seldom?) issue:

Starting state:

group vips IPaddr_rVIP IPaddr_eVIP
location loc-alice { IPaddr_rVIP IPaddr_eVIP } inf: ovpn-jessie-bp-1
location prefer-vips-loc vips role=Started 100: ovpn-jessie-bp-1

command:

crm configure delete vips
INFO: modified location:prefer-vips-loc from vips to IPaddr_rVIP

new state:

location loc-alice { IPaddr_rVIP IPaddr_eVIP } inf: ovpn-jessie-bp-1
location prefer-vips-loc IPaddr_rVIP role=Started 100: ovpn-jessie-bp-1

command:

crm configure group vips IPaddr_rVIP IPaddr_eVIP
INFO: modified location:loc-alice from IPaddr_rVIP to vips
INFO: modified location:prefer-vips-loc from IPaddr_rVIP to vips
INFO: modified location:loc-alice from IPaddr_eVIP to vips

I understand, why you are doing it, but this leads to problems.

new (incorrect?) state:

group vips IPaddr_rVIP IPaddr_eVIP
location loc-alice { vips } inf: ovpn-jessie-bp-1
location prefer-vips-loc vips role=Started 100: ovpn-jessie-bp-1

Could the problem be a group in "{ }"?

(problematic) command:

crm configure delete vips
INFO: hanging location:loc-alice deleted
INFO: modified location:prefer-vips-loc from vips to IPaddr_rVIP

Or is the problem not changing vips back to real resources when {} are used?

new (really incorrect) state:

location prefer-vips-loc IPaddr_rVIP role=Started 100: ovpn-jessie-bp-1

So you are improving a somehow unrelated location rule (not referencing the group directly) and then delete it, because it became invalid (hanging?) after your tweaks.

I'm not sure how relevant this is for real-world scenarios and unfortunately I'm not sure, if this issue still exists in current master, too. I only tested against the version in debian jessie backports (2.2.0-1~bpo8+1), which seems fairly recent and I didn't find something in your changelog of 2.2.1, that seems to address the issue.

krig commented 8 years ago

Hmm no, this issue is probably still in 2.2.1, I haven't touched this code. Yeah, the multi-location constraints with { } are quite new, it's not inconceivable that there's some bad interaction there...

krig commented 8 years ago

Personally, I think that the automatic rearrangement of constraints is problematic in itself, but it was there before I started working on crmsh and others disagree about this.

Not exactly a bug, it's not entirely clear what the correct thing to do is. Deleting loc-alice is not great, though. The preferrable thing to do would probably be to set the location constraint for all group members, not just the first.

krig commented 8 years ago

Hmm yes, I see now, there's definitely a problem with the { } syntax and the constraint modification.

dmuhamedagic commented 7 years ago

@egroeper: Does it really make sense to add location constraints for members of a group?

dmuhamedagic commented 7 years ago

@krig: How would you propose to handle possible "hanging" resource references in constraints? Or is it just some part of it that's bothering you?

dmuhamedagic commented 7 years ago

Just to add, it now occurred to me, that pacemaker wasn't very happy with constraints with resources which are members of a group. Perhaps we should keep that in mind too.

krig commented 7 years ago

Ah, the problem is with a sequence of actions that lead to a bad state. It's not the user that adds constraints for the group members, the constraints are already there and refer to the group, it's how we deal with constraints when deleting a group (and then how we deal with constraints when grouping resources) that is causing issues.

ClusterLabs / crmsh

crmsh too smart on groups and location constraints? #140