Concurrency problems with VMPools in EC2 backend #404

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
When running two or more instances of the GC3Pie "ec2+..." backend
concurrently, each instance will create its own VMPool object.  Since
the path where these VMPool objects are saved depends on the
configuration file only, the concurrent instances will happily
overwrite each other's saved data.

This can lead to situations where one or more EC2 instances are lost:

0) Two GC3Pie instances A and B are started concurrently; they both
   start with no VMs in the VMPool.
1) GC3Pie instance A starts EC2 VM i-0001.
2) GC3Pie instance B starts EC2 VM i-0002.
3) GC3Pie instance A saves its VMPool, with only one VM: i-0001.
4) GC3Pie instance A exits.
5) GC3Pie instance B saves its VMPool, with only one VM: i-0002,
   overwriting the file A has just saved.
6) GC3Pie instance B exits.
7) Now we have lost track of VM i-0001.
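
To make the race concrete, here is a minimal, self-contained sketch that reproduces the timeline above. It is not GC3Pie's actual `VMPool` implementation; the class name and the JSON-file persistence are assumptions made purely for illustration.

```python
# Hypothetical stand-in for a VM pool persisted to a single file whose path
# depends only on the configuration, so concurrent processes read and write
# the very same file without any locking.
import json
import os


class NaiveVMPool:
    def __init__(self, path):
        self.path = path
        self.vm_ids = set()
        if os.path.exists(path):
            with open(path) as fd:
                self.vm_ids = set(json.load(fd))

    def add_vm(self, vm_id):
        self.vm_ids.add(vm_id)

    def save(self):
        # Whole-file overwrite: anything other processes saved since we
        # loaded the pool is silently discarded.
        with open(self.path, 'w') as fd:
            json.dump(sorted(self.vm_ids), fd)


if __name__ == '__main__':
    path = '/tmp/vmpool.json'
    if os.path.exists(path):
        os.remove(path)

    pool_a = NaiveVMPool(path)   # step 0: both instances start ...
    pool_b = NaiveVMPool(path)   # ... with an empty pool
    pool_a.add_vm('i-0001')      # step 1: A starts VM i-0001
    pool_b.add_vm('i-0002')      # step 2: B starts VM i-0002
    pool_a.save()                # step 3: A saves {i-0001}
    pool_b.save()                # step 5: B saves {i-0002}, clobbering A's file
    print(NaiveVMPool(path).vm_ids)
```

Running this prints `{'i-0002'}`: B's whole-file save has silently discarded A's record of i-0001.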

Original issue reported on code.google.com by riccardo.murri@gmail.com on 19 Jul 2013 at 3:20

GoogleCodeExporter commented 9 years ago
This is addressed (and quite likely fixed) in SVN r3642.

There is still one item to discuss before closing the issue: independent
processes (e.g., session-based scripts) can create VMs in a pool without the
other processes using the same pool being notified. Only when those other
processes are stopped and restarted do they become aware of the new VMs and
(possibly) start using them.

I guess we need to take a decision here:

1- either independent processes/sessions use independent VM pools and never
ever share a VM;

2- or they can share VMs, in which case we should alter the code so that the
in-memory state of the VM pool is periodically sync'ed with what is on disk.
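
For option 2-, a minimal sketch of the "merge with what is on disk" idea (a hypothetical class and file format, not GC3Pie code) could look like this:

```python
# Sketch: before saving, and whenever update() is called, the pool merges
# its in-memory contents with the on-disk copy instead of overwriting it,
# so VMs created by other processes are picked up rather than lost.
import json
import os


class MergingVMPool:
    def __init__(self, path):
        self.path = path
        self.vm_ids = set()
        self.update()

    def add_vm(self, vm_id):
        self.vm_ids.add(vm_id)

    def update(self):
        # Reconcile with the on-disk copy: learn about VMs that other
        # processes have saved in the meantime.
        if os.path.exists(self.path):
            with open(self.path) as fd:
                self.vm_ids |= set(json.load(fd))

    def save(self):
        self.update()   # merge before writing, so other VMs are not dropped
        with open(self.path, 'w') as fd:
            json.dump(sorted(self.vm_ids), fd)
```

This still leaves a small window between the merge and the write, so a real implementation would additionally need file locking or a per-VM marker-file layout such as the one mentioned in the next comment.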

Original comment by riccardo.murri@gmail.com on 19 Jul 2013 at 3:45

GoogleCodeExporter commented 9 years ago
Regarding option 2-:

* This could be implemented simply by adding a constructor parameter
  `sync_every` (a non-negative integer), as sketched at the end of this
  comment:

  - each call to `add_vm` or `remove_vm` increments a counter;
  - when the counter reaches `self.sync_every`, a `self.update()` is performed;
  - `sync_every=0` can then be used to disable synchronization.

* Next we have the problem of handling removals: what happens when one
  process removes a VM?

  a- should the other processes remove it too (via `self.update(remove=True)`)?
  b- or should a VM be removed only when *all* processes have taken the decision to remove it?

  Option b- has the problem that all processes should take this
  decision together, otherwise one process will re-create the VM
  marker file that others have deleted.

  Option a- has the problem that other processes may still reference
  the VM in their internal structures, but it seems more viable.
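
A hedged sketch of how the proposed `sync_every` counter and the `update(remove=...)` semantics could fit together is given below. The per-VM marker-file layout is only assumed from the mention of marker files above, and none of this is the actual r3642 code:

```python
# Hypothetical pool backed by a directory with one marker file per VM.
# Every `sync_every` mutations the in-memory view is reconciled with the
# directory contents, so changes made by other processes become visible
# without restarting.
import os


class SyncingVMPool:
    def __init__(self, directory, sync_every=10):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)
        self.sync_every = sync_every     # 0 disables periodic syncing
        self._counter = 0
        self._vm_ids = set(os.listdir(directory))

    def _maybe_sync(self):
        self._counter += 1
        if self.sync_every and self._counter >= self.sync_every:
            self._counter = 0
            self.update()

    def add_vm(self, vm_id):
        # Create the marker file so that other processes can see the VM.
        open(os.path.join(self.directory, vm_id), 'w').close()
        self._vm_ids.add(vm_id)
        self._maybe_sync()

    def remove_vm(self, vm_id):
        try:
            os.remove(os.path.join(self.directory, vm_id))
        except OSError:
            pass
        self._vm_ids.discard(vm_id)
        self._maybe_sync()

    def update(self, remove=False):
        on_disk = set(os.listdir(self.directory))
        if remove:
            # Option a-: mirror the on-disk state, also forgetting VMs whose
            # marker file another process has deleted.
            self._vm_ids = on_disk
        else:
            # Default: only learn about VMs added by other processes.
            self._vm_ids |= on_disk
```

With `remove=True` the sketch implements option a-; option b- would need some coordination so that a marker file is deleted only once every process has agreed, otherwise a lagging process would re-create it.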

Original comment by riccardo.murri@gmail.com on 19 Jul 2013 at 3:51