GENI-NSF / geni-tools

Omni, stitcher, GCF sample aggregate manager, and other GENI tools.
Other
11 stars 15 forks source link

stitcher.py should fail if there is an existing sliver #438

Closed innnkkki closed 9 years ago

innnkkki commented 9 years ago

While trying stitcher I noticed that if it gets an error about a sliver already existing then it will keep retrying although there is no hope in ever succeeding.

If stitcher gets back an error about existing sliver (that was not created by stitcher), stitcher should exit and instruct user to delete their sliver.

To be clear in order to test this, I created a VM out of band in one of the AMs that i used for stitching.

Imported from trac ticket #438, created by nriga on 11-22-2013 at 09:52, last modified: 02-12-2014 at 13:04

ahelsing commented 9 years ago

What AM? What is the error message. Stitcher can learn to avoid retrying on certain messages, but I have to know what kind of AM and what message.

Trac comment by ahelsing on 11-22-2013 at 12:55

luisanevers commented 9 years ago

Using gcf-2.5-refactor version (2014-02-10) tried to verify that a stitcher createsliver would exit if it found an existing sliver by the same name, but I have found that the stitcher does not exit:

15:36:18 INFO     stitch.Aggregate: Writing to '/tmp/lnx-createsliver-request-11-instageni-rnet-missouri-edu.xml'
15:36:18 INFO     stitch.Aggregate: 
    Stitcher doing createsliver at https://www.instageni.rnet.missouri.edu:12369/protogeni/xmlrpc/am
15:36:20 ERROR    omni:  {'output': 'Must delete existing slice first', 'code': {'protogeni_error_log': 'urn:publicid:IDN+instageni.rnet.missouri.edu+log+028628310805288df751631b877f7709', 'am_type': 'protogeni', 'geni_code': 7, 'am_code': 7, 'protogeni_error_url': 'https://www.instageni.rnet.missouri.edu/spewlogfile.php3?logfile=028628310805288df751631b877f7709'}, 'value': 0}
15:36:20 INFO     stitch.Aggregate: Got AMAPIError doing createsliver lnx at <Aggregate urn:publicid:IDN+instageni.rnet.missouri.edu+authority+cm>: AMAPIError: Error from Aggregate: code 7. protogeni AM code: 7: Must delete existing slice first.
15:36:20 WARNING  stitcher: Stitching failed but will retry: Circuit reservation failed at <Aggregate urn:publicid:IDN+instageni.rnet.missouri.edu+authority+cm> (AMAPIError: Error from Aggregate: code 7. protogeni AM code: 7: Must delete existing slice first.). Try again from the SCS
15:36:20 INFO     stitcher: Calling SCS for the 2rd time...
15:36:21 INFO     stitcher: Pausing for 600 seconds for Aggregates to free up resources...

Are my expectations incorrect?

Trac comment by lnevers (github user: luisanevers) on 02-10-2014 at 15:39

ahelsing commented 9 years ago

Right. That fix was only made after the tarball you just tried.

Trac comment by ahelsing on 02-10-2014 at 16:20

luisanevers commented 9 years ago

The next tarball gcf-2.5-refactor2 (02-11-14) resolves the problem:

08:52:35 INFO     stitch.Aggregate: 
    Stitcher doing createsliver at https://boss.instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am
08:52:37 ERROR    omni:  {'output': 'Must delete existing slice first', 'code': {'protogeni_error_log': 'urn:publicid:IDN+instageni.gpolab.bbn.com+log+596ddc878459bd07c1f5b4cc92f3b28f', 'am_type': 'protogeni', 'geni_code': 7, 'am_code': 7, 'protogeni_error_url': 'https://boss.instageni.gpolab.bbn.com/spewlogfile.php3?logfile=596ddc878459bd07c1f5b4cc92f3b28f'}, 'value': 0}
08:52:37 INFO     stitch.Aggregate: Got AMAPIError doing createsliver ig-gpo-ig-utah at <Aggregate urn:publicid:IDN+instageni.gpolab.bbn.com+authority+cm>: AMAPIError: Error from Aggregate: code 7. protogeni AM code: 7: Must delete existing slice first.
08:52:37 ERROR    stitcher: Stitching failed with an error: Reservation request impossible at <Aggregate urn:publicid:IDN+instageni.gpolab.bbn.com+authority+cm>: You already have a reservation in slice ig-gpo-ig-utah at this aggregate - delete it first or use another aggregate. AMAPIError: Error from Aggregate: code 7. protogeni AM code: 7: Must delete existing slice first....
Reservation request impossible at <Aggregate urn:publicid:IDN+instageni.gpolab.bbn.com+authority+cm>: You already have a reservation in slice ig-gpo-ig-utah at this aggregate - delete it first or use another aggregate. AMAPIError: Error from Aggregate: code 7. protogeni AM code: 7: Must delete existing slice first....

Trac comment by lnevers (github user: luisanevers) on 02-11-2014 at 08:53

luisanevers commented 9 years ago

This is fixed for PG/IG fixed, but does not exit for MyPLC aggregates:

15:53:59 INFO     stitch.Aggregate: Writing to '/tmp/ig-miss-ig-max-createsliver-request-12-dragon-maxgigapop-net.xml'
15:53:59 INFO     stitch.Aggregate: 
    Stitcher doing createsliver at http://max-myplc.dragon.maxgigapop.net:12346
15:54:06 ERROR    omni:  {'output': ': CreateSliver: Existing record: urn:publicid:IDN+ch.geni.net:ln-prj+slice+ig-miss-ig-max, ', 'geni_api': 2, 'code': {'am_type': 'sfa', 'geni_code': 7, 'am_code': 7}, 'value': ''}
15:54:06 INFO     stitch.Aggregate: Got AMAPIError doing createsliver ig-miss-ig-max at 
<Aggregate urn:publicid:IDN+dragon.maxgigapop.net+authority+am>: AMAPIError: Error from 
Aggregate: code 7. sfa AM code: 7: : CreateSliver: Existing record: urn:publicid:IDN+ch.geni.net:ln-prj+slice+ig-miss-ig-max, .
15:54:06 WARNING  stitcher: Stitching failed but will retry: Circuit reservation failed at 
<Aggregate urn:publicid:IDN+dragon.maxgigapop.net+authority+am> (AMAPIError: Error from Aggregate: 
code 7. sfa AM code: 7: : CreateSliver: Existing record: urn:publicid:IDN+ch.geni.net:ln-prj+slice+ig-miss-ig-max, .). 
Try again from the SCS
15:54:06 WARNING  stitcher: Had reservation at https://www.instageni.rnet.missouri.edu:12369/protogeni/xmlrpc/am
15:54:06 INFO     stitch.Aggregate: Doing deletesliver at https://www.instageni.rnet.missouri.edu:12369/protogeni/xmlrpc/am
15:54:34 ERROR    omni:  {'output': 'resource is busy; try again later', 'code': {'protogeni_error_log': 'urn:publicid:IDN+instageni.rnet.missouri.edu+log+0493d456bee704974d93a657eac41eb3', 'am_type': 'protogeni', 'geni_code': 14, 'am_code': 14, 'protogeni_error_url': 'https://www.instageni.rnet.missouri.edu/spewlogfile.php3?logfile=0493d456bee704974d93a657eac41eb3'}, 'value': 0}
15:55:59 WARNING  stitcher: Deleted reservation at https://www.instageni.rnet.missouri.edu:12369/protogeni/xmlrpc/am
15:55:59 INFO     stitcher: Calling SCS for the 2rd time...
15:55:59 INFO     stitcher: Pausing for 600 seconds for Aggregates to free up resources...

Trac comment by lnevers (github user: luisanevers) on 02-11-2014 at 16:11

ahelsing commented 9 years ago

Added that one too.

Trac comment by ahelsing on 02-11-2014 at 22:00

luisanevers commented 9 years ago

Verified that existing slivers at MAX MyPLC and at I2 ION aggregate causes stitcher to exit, but found some unexpected messages in the output. Here is the output:

09:03:38 INFO     stitch.Aggregate: Writing to '/tmp/ig-miss-ig-max-createsliver-request-11-dragon-maxgigapop-net.xml'
09:03:38 INFO     stitch.Aggregate:     Stitcher doing createsliver at http://max-myplc.dragon.maxgigapop.net:12346
09:03:45 ERROR    omni:  {'output': ': CreateSliver: Existing record: urn:publicid:IDN+ch.geni.net:ln-prj+slice+ig-miss-ig-max, 
', 'geni_api': 2, 'code': {'am_type': 'sfa', 'geni_code': 7, 'am_code': 7}, 'value': ''}
09:03:45 INFO     stitch.Aggregate: Got AMAPIError doing createsliver ig-miss-ig-max at 
<Aggregate max>: AMAPIError: Error from Aggregate: code 7. sfa AM code: 7: : CreateSliver: Existing record: 
urn:publicid:IDN+ch.geni.net:ln-prj+slice+ig-miss-ig-max, .
09:03:45 WARNING  stitcher: Stitching failed but will retry: Reservation request impossible at <Aggregate max>: 
AMAPIError: Error from Aggregate: code 7. sfa AM code: 7: : CreateSliver: Existing record: urn:publicid:IDN+ch.geni.net:...
09:03:45 INFO     stitcher: Calling SCS for the 2rd time...
09:03:45 ERROR    stitcher: Root cause error: Reservation request impossible at <Aggregate max>: AMAPIError: Error 
from Aggregate: code 7. sfa AM code: 7: : CreateSliver: Existing record: urn:publicid:IDN+ch.geni.net:...
Reservation request impossible at <Aggregate max>: AMAPIError: Error from Aggregate: code 7. sfa AM code: 7: : 
CreateSliver: Existing record: urn:publicid:IDN+ch.geni.net:... which caused StitchingServiceFailedError: Error 
from Stitching Service: code 3: MxTCE ComputeWorker return error message '#### Action_ProcessRequestTopology_MP2PFinish() 
Cannot find the set of paths for the RequestTopology. '.

The stitcher does exit but the message "Action_ProcessRequestTopology_MP2P::Finish() Cannot find the set of paths for the RequestTopology." is unexpected.

Also, did not expect to see a retry message: "09:03:45 INFO stitcher: Calling SCS for the 2rd time..."

Trac comment by lnevers (github user: luisanevers) on 02-12-2014 at 09:26

ahelsing commented 9 years ago

MAX is an intermediate aggregate - one that you did not explicitly request.

So we give the SCS a chance to find another route to get to the aggregate that you did request. Theoretically, there could be one. And unless I start hard-coding into SCS knowledge of individual aggregates and their topologies, I don't think I can know that in this case there is not.

So you see "but will retry..." and "Calling SCS for the 2rd time..." but that causes the SCS to say it "Cannot find the set of paths" - basically meaning "there's no way to get there if you tell me to avoid MAX".

Trac comment by ahelsing on 02-12-2014 at 09:33

luisanevers commented 9 years ago

Could you elaborate on when/why it makes sense to look for a different path if the sliver already exists?

The Error/INFO/Warning messages are very verbose and repetitive. Can we simplify the way stitcher reports that it found the sliver exists in the path and that it will look for another path?

The "09:03:45 ERROR omni" and the "09:03:45 INFO stitch.Aggregate" messages are not providing any additional information. Maybe we should only show the "09:03:45 WARNING stitcher" message.

09:03:45 ERROR    omni:  {'output': ': CreateSliver: Existing record: 
urn:publicid:IDN+ch.geni.net:ln-prj+slice+ig-miss-ig-max, ', 'geni_api': 2, 
'code': {'am_type': 'sfa', 'geni_code': 7, 'am_code': 7}, 'value': ''}

09:03:45 INFO     stitch.Aggregate: Got AMAPIError doing createsliver ig-miss-ig-max 
at<Aggregate max>: AMAPIError: Error from Aggregate: code 7. sfa AM code: 7: : 
CreateSliver: Existing record: urn:publicid:IDN+ch.geni.net:ln-prj+slice+ig-miss-ig-max, .

09:03:45 WARNING  stitcher: Stitching failed but will retry: Reservation request impossible 
at <Aggregate max>: AMAPIError: Error from Aggregate: code 7. sfa AM code: 7: : CreateSliver: 
Existing record: urn:publicid:IDN+ch.geni.net:...

Trac comment by lnevers (github user: luisanevers) on 02-12-2014 at 10:16

ahelsing commented 9 years ago

Why look for a different path? Say you are trying to get to non-existent AM MAX2 from PG-Utah. MAX2 has a connection to MAX that connects to ION that connects to PG-Utah. But MAX2 ALSO has a direct connection to PG-Utah. The SCS gave you a path through MAX the first time around, and you got this error. If you go back to the SCS, the SCS should find the direct path from PG-Utah, skipping MAX where your reservation cannot succeed.

So if the topology of the GENI stitching network allows it, there can be a success without using MAX, and going back to the SCS is a good idea.

The ERROR message comes from underlying Omni. It's a little hard to suppress that. But I think I can suppress the INFO message.

Trac comment by ahelsing on 02-12-2014 at 10:49

luisanevers commented 9 years ago

Replying to ahelsing:

The ERROR message comes from underlying Omni. It's a little hard to suppress that. But I think I can suppress the INFO message.

That's too bad. For the 3 messages from omni, stitch.Aggregate, stitcher that occur for many scenarios in createsliver, the omni message is usually less helpful. Mostly because it always reports as an "ERROR" even for conditions where stitcher can continue and be successful.

Trac comment by lnevers (github user: luisanevers) on 02-12-2014 at 13:04