Closed ibaldin closed 7 years ago
I can't seem to get the Modify to work for this sort of request.
Is this what the "Increase Node Group Size..." option is supposed to look like?
If I put 20 in that box, and then Submit Changes (and Poll/Query Manifest), my slice doesn't look like it has changed at all. It is still just two VMs connected by a broadcast link.
Could someone give me a Request and a Modify RDF?
You need to start with a node group
Actually I see you did already. Try starting with nodegroups of size > 1
I suspect there may be a bug related to converting a nodegroup into a single node.
I still get the same thing, starting with Node Groups of size 2. (I think Mert's original was with Node Groups of size 1).
It may be broken then
Modify might be failing on simple VM additions too.
This might be the exception:
controller.log.6:2017-05-19 15:05:41,945 [qtp1574943246-34 - /orca/xmlrpc] ERROR controller.OrcaXmlrpcHandler - getSliceManifest(): converter unable to get manifest: java.lang.IllegalArgumentException: Model is a null pointer
controller.log.6:2017-05-19 15:05:41,946 [qtp1574943246-34 - /orca/xmlrpc] ERROR controller.OrcaXmlrpcHandler - getSliceManifest(): Exception encountered: OrcaControllerException: ERROR: Failed due to exception: java.lang.IllegalArgumentException: Model is a null pointer
controller.log.6:2017-05-19 15:05:41,946 [qtp1574943246-34 - /orca/xmlrpc] ERROR controller.OrcaXmlrpcHandler - sliceStatus(): ControllerException: OrcaControllerException: ERROR: Exception encountered: orca.controllers.OrcaControllerException: OrcaControllerException: ERROR: Failed due to exception: java.lang.IllegalArgumentException: Model is a null pointer
OK, a modify request that is a simple VM addition still works. The above errors seem to be unrelated, and previous modify requests were only failing because of a problem local to UFL.
Will investigate the Increase Node Group Size bug next week.
The modify is not occurring because IP Address information seems to be missing, causing this NPE:
java.lang.NullPointerException
at orca.embed.cloudembed.MappingHandler.getIPRange(MappingHandler.java:199)
at orca.embed.cloudembed.controller.ModifyHandler.addElements(ModifyHandler.java:459)
at orca.embed.cloudembed.controller.ModifyHandler.modifySlice(ModifyHandler.java:139)
at orca.embed.workflow.RequestWorkflow.modify(RequestWorkflow.java:249)
at orca.controllers.xmlrpc.OrcaXmlrpcHandler.modifySlice(OrcaXmlrpcHandler.java:594)
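One way to make this kind of failure easier to diagnose is to check for the missing IP address information up front and fail with a descriptive message instead of a bare NPE. The following is only a rough sketch of that idea, not the ORCA code: the class `IpRangeGuard`, the method `requireIpRange`, and the `"ipRange"` property key are all hypothetical names invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names exist in the ORCA code base.
final class IpRangeGuard {

    // Returns the IP range recorded for an element, or throws a descriptive
    // exception when the information is missing (instead of a later NPE).
    static String requireIpRange(Map<String, String> elementProperties, String elementName) {
        String range = elementProperties.get("ipRange"); // assumed property key
        if (range == null || range.isEmpty()) {
            throw new IllegalStateException(
                "Modify rejected: no IP address information for element '" + elementName + "'");
        }
        return range;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        try {
            // The map is empty, so the guard throws with a readable message.
            requireIpRange(props, "NodeGroup0");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a guard like this, the modify would still fail, but the log would name the element that lacks IP address information.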
Will return to investigate this ticket, related to TicketReview, after #137 is resolved.
There seem to be a couple of things going on with this ticket.
There is one relatively easy fix, which is to make sure that the Controller adds the core constraints / requested resources (e.g. Num CPU) to the request. This allows the SM to verify the actual availability of resources, producing expected results.
A secondary issue is that it seems like the Controller doesn't verify available resources at all for this type of Modify request (NodeGroup Increases). We should probably fix that? This is possibly slightly more difficult, because I don't immediately see where to plug that in. And, in a sense, we have acknowledged that the Controller is not always going to have accurate counts of available resources, so maybe it's not that bad that the request will make it to the SM before it fails?
Thoughts?
@YufengXin thoughts on whether the Controller should be checking the available resource count for NodeGroup increases? Or should we just let the SM handle it? (See previous comment)
As reported by @mcevik0
Cause TicketReview closure on Modify request
Slice is active with 22 (2+20) XOXlarge VMs requested through rack-controller. Rack has 52 cores delegated to ndl-broker, 52 cores for wvn-broker. No more than 13 XOXlarge VMs should have been requested.
Same happens when ExoSM is used.
First identified in RENCI-NRIG/exogeni#125
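The 13-VM ceiling in the report above is consistent with dividing the 52 delegated cores by 4 cores per XOXlarge VM. A minimal sketch of the check that the Controller (or the SM) would need to perform for a NodeGroup increase might look like the following. The 4-core figure is an assumption inferred from the numbers in the report, and the class and method names are hypothetical, not taken from ORCA.

```java
// Hypothetical sketch: none of these names exist in the ORCA code base.
final class NodeGroupCapacityCheck {

    // Largest number of VMs that fit in the delegated cores (integer division).
    static int maxVms(int delegatedCores, int coresPerVm) {
        if (delegatedCores < 0) {
            throw new IllegalArgumentException("delegatedCores must not be negative");
        }
        if (coresPerVm <= 0) {
            throw new IllegalArgumentException("coresPerVm must be positive");
        }
        return delegatedCores / coresPerVm;
    }

    // True when the existing VMs plus the requested increase still fit.
    static boolean increaseFits(int delegatedCores, int coresPerVm,
                                int currentVms, int additionalVms) {
        return (long) currentVms + additionalVms <= maxVms(delegatedCores, coresPerVm);
    }

    public static void main(String[] args) {
        int delegatedCores = 52; // cores delegated to the broker, from the report
        int coresPerVm = 4;      // ASSUMPTION: XOXlarge uses 4 cores (52 / 13)

        System.out.println("max VMs: " + maxVms(delegatedCores, coresPerVm));
        // The reported slice had 2 VMs and asked for 20 more.
        System.out.println("2 + 20 fits: " + increaseFits(delegatedCores, coresPerVm, 2, 20));
    }
}
```

Under these assumptions the 2+20 request should be rejected, since 22 exceeds the 13 VMs that 52 cores can host.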