Fix race condition in configureSwarm effector

brooklyncentral / clocker

Apache Brooklyn cloud native infrastructure blueprints

Apache License 2.0

428 stars 66 forks source link

Fix race condition in configureSwarm effector #365

Closed googlielmo closed 8 years ago

googlielmo commented 8 years ago

guard critical section with flock

Fixes https://cloudsoft.jira.com/browse/CCS-106

The swarm-manager-load-balancer sometimes returns an empty response. This is due to the fact that several instances of haproxy are running, so the request may be routed to an instance with an invalid config.

There is a race condition in the configureSwarm effector that is called to re-configure haproxy every time there is a change in the swarm node list.

If the effector is invoked simultaneously, e.g. when starting up two nodes, it may result in duplicate instances, as new haproxy instances are started with an option to kill and replace the old process, but as they don't know about each other, they both live on.

(A different problem may happen if two effector invocations enter the critical section in the wrong order, but this is not resolvable within the effector itself.)

grkvlt commented 8 years ago

Can we guarantee availability of flock in CentOS 7? What about other distros?

neykov commented 8 years ago

(A different problem may happen if two effector invocations enter the critical section in the wrong order, but this is not resolvable within the effector itself.)

Solving this would remove the need for syncing in the effector. What do you think about making InvokeEffectorOnSensorChange execute the effectors sequentailly?

googlielmo commented 8 years ago

@grkvlt good point, as far as it is a "CentOS" system, it should have the util-linuxpackage (it's always been part of the minimal install); but it costs us nothing to add an install command to be really sure :-) Will fix.

googlielmo commented 8 years ago

@neykov right! it would solve this issue as well.

grkvlt commented 8 years ago

Also, do you think we should do the same for the Kubernetes use of the HAProxy entity, which has a similar effector configureKubernetes

googlielmo commented 8 years ago

@grkvlt yes indeed! good point, is it worth generalising the effector, moving it to the haproxy-load-balancer itself? Also @andreaturli raised a very similar point (that it's probably worth doing it)

googlielmo commented 8 years ago

Thanks @grkvlt for the excellent points! I've addressed your comments.

neykov commented 8 years ago

What would happen if the connection is reset while holding the flock and having the haproxy launched, but before closing the stream?

googlielmo commented 8 years ago

@neykov the lock will be released when the bash process holding it terminates. Since I'm curious anyway I'll set up a little experiment to test that ;)

googlielmo commented 7 years ago

@neykov I found an answer to your question above: The lock will NOT be released, in fact. I've addressed this issue in https://github.com/brooklyncentral/clocker/pull/382 cc: @grkvlt @aledsage