Elastic Slices -- repeated VM add/deletes start to fail

hinchliff commented 7 years ago

If we alternate between adding and deleting VMs every 30 seconds, the adds stop working after a couple of minutes.

ibaldin commented 7 years ago

From Bing:

Hello,

I am working on the elasticity management of slices on ExoGENI racks. Specifically, I run a local slice manager that requests adding/deleting compute nodes (VMs) of a slice via the ahab/ndllib APIs. Recently, I moved my manager code from ndllib to ahab because of an anomaly (or a bug) I found in the add/delete operations for compute nodes with ndllib.

I document my findings in this post and hope that other ExoGENI users can benefit from them.
I ran two experiments on the UMass rack: one with a slice manager using ahab, the other with a manager using ndllib. In each experiment, I created a slice and then added and deleted VMs alternately in a row (10 VMs for the ahab experiment, 9 VMs for the ndllib experiment), with a 5-second interval between consecutive operations. Before each add/delete operation, I counted the number of VMs in the slice by querying the slice manifest. I attached the VM counts across operations as ahab_elasticity.log and ndllib.log.
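
For readers who want to reproduce this kind of test, the experiment loop described above might be sketched roughly as follows. This is only an illustration: `SliceClient`, `InMemorySlice`, and `run_elasticity_test` are hypothetical names, not part of ahab or ndllib, and the in-memory stand-in replaces a real slice so the sketch runs without a rack. It also assumes the test starts with an add.

```python
import time
from typing import Callable, List, Protocol


class SliceClient(Protocol):
    """Hypothetical interface; NOT the real ahab or ndllib API."""

    def count_vms(self) -> int: ...
    def add_vm(self) -> None: ...
    def delete_vm(self) -> None: ...


class InMemorySlice:
    """Stand-in for a real slice so the sketch runs without a rack."""

    def __init__(self, initial_vms: int = 1) -> None:
        self._vms = initial_vms

    def count_vms(self) -> int:
        # A real client would query the slice manifest here.
        return self._vms

    def add_vm(self) -> None:
        self._vms += 1

    def delete_vm(self) -> None:
        self._vms -= 1


def run_elasticity_test(
    client: SliceClient,
    operations: int,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Alternate add/delete and record the VM count seen before each operation."""
    counts: List[int] = []
    for i in range(operations):
        counts.append(client.count_vms())  # query before the operation
        if i % 2 == 0:
            client.add_vm()      # even steps add a VM (assumed starting order)
        else:
            client.delete_vm()   # odd steps delete one
        sleep(interval_s)        # wait between consecutive operations
    return counts
```

With a correctly behaving backend, the recorded counts should simply alternate between two adjacent values. A sequence that stops changing after an add would match the failure reported in this issue.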

The logs show that the add/delete operations function correctly with ahab APIs, and incorrectly with ndllib APIs.

It is worth noting that the time interval between two consecutive manifest queries grows steadily (in the logs, < 10 seconds at the start and > 1 minute at the end of the experiment). This suggests that with frequent slice manifest queries/modifications, each query or modification takes progressively longer. It might require a larger thread pool or a different consistency model on the ExoGENI controller side.
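
One way to quantify the slowdown described above is to compute the gaps between consecutive manifest-query timestamps. The sketch below is a minimal illustration, assuming the timestamps have already been extracted as seconds. The function names are hypothetical, and the sample values in the usage note are made up for illustration, not taken from the attached logs.

```python
from typing import List


def query_gaps(timestamps: List[float]) -> List[float]:
    """Return the elapsed time between each pair of consecutive queries."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def is_strictly_growing(gaps: List[float]) -> bool:
    """True if every gap is longer than the one before it."""
    return all(b > a for a, b in zip(gaps, gaps[1:]))
```

For example, `query_gaps([0.0, 8.0, 20.0, 45.0])` gives `[8.0, 12.0, 25.0]` for those illustrative timestamps, and `is_strictly_growing` returns `True` for that result.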

ibaldin commented 7 years ago

ahab_elasticity.log.txt ndl_elasticity.log.txt

ibaldin commented 7 years ago

According to Bing, things operate correctly with AHAB, so I'm closing this. She does raise an issue of ever-increasing query times that should be looked at.

RENCI-NRIG / orca5

Elastic Slices -- repeated VM add/deletes start to fail #148