Closed innnkkki closed 9 years ago
What AM? What is the error message. Stitcher can learn to avoid retrying on certain messages, but I have to know what kind of AM and what message.
Trac comment by ahelsing on 11-22-2013 at 12:55
Using gcf-2.5-refactor version (2014-02-10) tried to verify that a stitcher createsliver would exit if it found an existing sliver by the same name, but I have found that the stitcher does not exit:
15:36:18 INFO stitch.Aggregate: Writing to '/tmp/lnx-createsliver-request-11-instageni-rnet-missouri-edu.xml'
15:36:18 INFO stitch.Aggregate:
Stitcher doing createsliver at https://www.instageni.rnet.missouri.edu:12369/protogeni/xmlrpc/am
15:36:20 ERROR omni: {'output': 'Must delete existing slice first', 'code': {'protogeni_error_log': 'urn:publicid:IDN+instageni.rnet.missouri.edu+log+028628310805288df751631b877f7709', 'am_type': 'protogeni', 'geni_code': 7, 'am_code': 7, 'protogeni_error_url': 'https://www.instageni.rnet.missouri.edu/spewlogfile.php3?logfile=028628310805288df751631b877f7709'}, 'value': 0}
15:36:20 INFO stitch.Aggregate: Got AMAPIError doing createsliver lnx at <Aggregate urn:publicid:IDN+instageni.rnet.missouri.edu+authority+cm>: AMAPIError: Error from Aggregate: code 7. protogeni AM code: 7: Must delete existing slice first.
15:36:20 WARNING stitcher: Stitching failed but will retry: Circuit reservation failed at <Aggregate urn:publicid:IDN+instageni.rnet.missouri.edu+authority+cm> (AMAPIError: Error from Aggregate: code 7. protogeni AM code: 7: Must delete existing slice first.). Try again from the SCS
15:36:20 INFO stitcher: Calling SCS for the 2rd time...
15:36:21 INFO stitcher: Pausing for 600 seconds for Aggregates to free up resources...
Are my expectations incorrect?
Trac comment by lnevers (github user: luisanevers) on 02-10-2014 at 15:39
Right. That fix was only made after the tarball you just tried.
Trac comment by ahelsing on 02-10-2014 at 16:20
The next tarball gcf-2.5-refactor2 (02-11-14) resolves the problem:
08:52:35 INFO stitch.Aggregate:
Stitcher doing createsliver at https://boss.instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am
08:52:37 ERROR omni: {'output': 'Must delete existing slice first', 'code': {'protogeni_error_log': 'urn:publicid:IDN+instageni.gpolab.bbn.com+log+596ddc878459bd07c1f5b4cc92f3b28f', 'am_type': 'protogeni', 'geni_code': 7, 'am_code': 7, 'protogeni_error_url': 'https://boss.instageni.gpolab.bbn.com/spewlogfile.php3?logfile=596ddc878459bd07c1f5b4cc92f3b28f'}, 'value': 0}
08:52:37 INFO stitch.Aggregate: Got AMAPIError doing createsliver ig-gpo-ig-utah at <Aggregate urn:publicid:IDN+instageni.gpolab.bbn.com+authority+cm>: AMAPIError: Error from Aggregate: code 7. protogeni AM code: 7: Must delete existing slice first.
08:52:37 ERROR stitcher: Stitching failed with an error: Reservation request impossible at <Aggregate urn:publicid:IDN+instageni.gpolab.bbn.com+authority+cm>: You already have a reservation in slice ig-gpo-ig-utah at this aggregate - delete it first or use another aggregate. AMAPIError: Error from Aggregate: code 7. protogeni AM code: 7: Must delete existing slice first....
Reservation request impossible at <Aggregate urn:publicid:IDN+instageni.gpolab.bbn.com+authority+cm>: You already have a reservation in slice ig-gpo-ig-utah at this aggregate - delete it first or use another aggregate. AMAPIError: Error from Aggregate: code 7. protogeni AM code: 7: Must delete existing slice first....
Trac comment by lnevers (github user: luisanevers) on 02-11-2014 at 08:53
This is fixed for PG/IG fixed, but does not exit for MyPLC aggregates:
15:53:59 INFO stitch.Aggregate: Writing to '/tmp/ig-miss-ig-max-createsliver-request-12-dragon-maxgigapop-net.xml'
15:53:59 INFO stitch.Aggregate:
Stitcher doing createsliver at http://max-myplc.dragon.maxgigapop.net:12346
15:54:06 ERROR omni: {'output': ': CreateSliver: Existing record: urn:publicid:IDN+ch.geni.net:ln-prj+slice+ig-miss-ig-max, ', 'geni_api': 2, 'code': {'am_type': 'sfa', 'geni_code': 7, 'am_code': 7}, 'value': ''}
15:54:06 INFO stitch.Aggregate: Got AMAPIError doing createsliver ig-miss-ig-max at
<Aggregate urn:publicid:IDN+dragon.maxgigapop.net+authority+am>: AMAPIError: Error from
Aggregate: code 7. sfa AM code: 7: : CreateSliver: Existing record: urn:publicid:IDN+ch.geni.net:ln-prj+slice+ig-miss-ig-max, .
15:54:06 WARNING stitcher: Stitching failed but will retry: Circuit reservation failed at
<Aggregate urn:publicid:IDN+dragon.maxgigapop.net+authority+am> (AMAPIError: Error from Aggregate:
code 7. sfa AM code: 7: : CreateSliver: Existing record: urn:publicid:IDN+ch.geni.net:ln-prj+slice+ig-miss-ig-max, .).
Try again from the SCS
15:54:06 WARNING stitcher: Had reservation at https://www.instageni.rnet.missouri.edu:12369/protogeni/xmlrpc/am
15:54:06 INFO stitch.Aggregate: Doing deletesliver at https://www.instageni.rnet.missouri.edu:12369/protogeni/xmlrpc/am
15:54:34 ERROR omni: {'output': 'resource is busy; try again later', 'code': {'protogeni_error_log': 'urn:publicid:IDN+instageni.rnet.missouri.edu+log+0493d456bee704974d93a657eac41eb3', 'am_type': 'protogeni', 'geni_code': 14, 'am_code': 14, 'protogeni_error_url': 'https://www.instageni.rnet.missouri.edu/spewlogfile.php3?logfile=0493d456bee704974d93a657eac41eb3'}, 'value': 0}
15:55:59 WARNING stitcher: Deleted reservation at https://www.instageni.rnet.missouri.edu:12369/protogeni/xmlrpc/am
15:55:59 INFO stitcher: Calling SCS for the 2rd time...
15:55:59 INFO stitcher: Pausing for 600 seconds for Aggregates to free up resources...
Trac comment by lnevers (github user: luisanevers) on 02-11-2014 at 16:11
Added that one too.
Trac comment by ahelsing on 02-11-2014 at 22:00
Verified that existing slivers at MAX MyPLC and at I2 ION aggregate causes stitcher to exit, but found some unexpected messages in the output. Here is the output:
09:03:38 INFO stitch.Aggregate: Writing to '/tmp/ig-miss-ig-max-createsliver-request-11-dragon-maxgigapop-net.xml'
09:03:38 INFO stitch.Aggregate: Stitcher doing createsliver at http://max-myplc.dragon.maxgigapop.net:12346
09:03:45 ERROR omni: {'output': ': CreateSliver: Existing record: urn:publicid:IDN+ch.geni.net:ln-prj+slice+ig-miss-ig-max,
', 'geni_api': 2, 'code': {'am_type': 'sfa', 'geni_code': 7, 'am_code': 7}, 'value': ''}
09:03:45 INFO stitch.Aggregate: Got AMAPIError doing createsliver ig-miss-ig-max at
<Aggregate max>: AMAPIError: Error from Aggregate: code 7. sfa AM code: 7: : CreateSliver: Existing record:
urn:publicid:IDN+ch.geni.net:ln-prj+slice+ig-miss-ig-max, .
09:03:45 WARNING stitcher: Stitching failed but will retry: Reservation request impossible at <Aggregate max>:
AMAPIError: Error from Aggregate: code 7. sfa AM code: 7: : CreateSliver: Existing record: urn:publicid:IDN+ch.geni.net:...
09:03:45 INFO stitcher: Calling SCS for the 2rd time...
09:03:45 ERROR stitcher: Root cause error: Reservation request impossible at <Aggregate max>: AMAPIError: Error
from Aggregate: code 7. sfa AM code: 7: : CreateSliver: Existing record: urn:publicid:IDN+ch.geni.net:...
Reservation request impossible at <Aggregate max>: AMAPIError: Error from Aggregate: code 7. sfa AM code: 7: :
CreateSliver: Existing record: urn:publicid:IDN+ch.geni.net:... which caused StitchingServiceFailedError: Error
from Stitching Service: code 3: MxTCE ComputeWorker return error message '#### Action_ProcessRequestTopology_MP2PFinish()
Cannot find the set of paths for the RequestTopology. '.
The stitcher does exit but the message "Action_ProcessRequestTopology_MP2P::Finish() Cannot find the set of paths for the RequestTopology." is unexpected.
Also, did not expect to see a retry message: "09:03:45 INFO stitcher: Calling SCS for the 2rd time..."
Trac comment by lnevers (github user: luisanevers) on 02-12-2014 at 09:26
MAX is an intermediate aggregate - one that you did not explicitly request.
So we give the SCS a chance to find another route to get to the aggregate that you did request. Theoretically, there could be one. And unless I start hard-coding into SCS knowledge of individual aggregates and their topologies, I don't think I can know that in this case there is not.
So you see "but will retry..." and "Calling SCS for the 2rd time..." but that causes the SCS to say it "Cannot find the set of paths" - basically meaning "there's no way to get there if you tell me to avoid MAX".
Trac comment by ahelsing on 02-12-2014 at 09:33
Could you elaborate on when/why it makes sense to look for a different path if the sliver already exists?
The Error/INFO/Warning messages are very verbose and repetitive. Can we simplify the way stitcher reports that it found the sliver exists in the path and that it will look for another path?
The "09:03:45 ERROR omni" and the "09:03:45 INFO stitch.Aggregate" messages are not providing any additional information. Maybe we should only show the "09:03:45 WARNING stitcher" message.
09:03:45 ERROR omni: {'output': ': CreateSliver: Existing record:
urn:publicid:IDN+ch.geni.net:ln-prj+slice+ig-miss-ig-max, ', 'geni_api': 2,
'code': {'am_type': 'sfa', 'geni_code': 7, 'am_code': 7}, 'value': ''}
09:03:45 INFO stitch.Aggregate: Got AMAPIError doing createsliver ig-miss-ig-max
at<Aggregate max>: AMAPIError: Error from Aggregate: code 7. sfa AM code: 7: :
CreateSliver: Existing record: urn:publicid:IDN+ch.geni.net:ln-prj+slice+ig-miss-ig-max, .
09:03:45 WARNING stitcher: Stitching failed but will retry: Reservation request impossible
at <Aggregate max>: AMAPIError: Error from Aggregate: code 7. sfa AM code: 7: : CreateSliver:
Existing record: urn:publicid:IDN+ch.geni.net:...
Trac comment by lnevers (github user: luisanevers) on 02-12-2014 at 10:16
Why look for a different path? Say you are trying to get to non-existent AM MAX2 from PG-Utah. MAX2 has a connection to MAX that connects to ION that connects to PG-Utah. But MAX2 ALSO has a direct connection to PG-Utah. The SCS gave you a path through MAX the first time around, and you got this error. If you go back to the SCS, the SCS should find the direct path from PG-Utah, skipping MAX where your reservation cannot succeed.
So if the topology of the GENI stitching network allows it, there can be a success without using MAX, and going back to the SCS is a good idea.
The ERROR message comes from underlying Omni. It's a little hard to suppress that. But I think I can suppress the INFO message.
Trac comment by ahelsing on 02-12-2014 at 10:49
Replying to ahelsing:
The ERROR message comes from underlying Omni. It's a little hard to suppress that. But I think I can suppress the INFO message.
That's too bad. For the 3 messages from omni, stitch.Aggregate, stitcher that occur for many scenarios in createsliver, the omni message is usually less helpful. Mostly because it always reports as an "ERROR" even for conditions where stitcher can continue and be successful.
Trac comment by lnevers (github user: luisanevers) on 02-12-2014 at 13:04
While trying stitcher I noticed that if it gets an error about a sliver already existing then it will keep retrying although there is no hope in ever succeeding.
If stitcher gets back an error about existing sliver (that was not created by stitcher), stitcher should exit and instruct user to delete their sliver.
To be clear in order to test this, I created a VM out of band in one of the AMs that i used for stitching.
Imported from trac ticket #438, created by nriga on 11-22-2013 at 09:52, last modified: 02-12-2014 at 13:04