batrick / ceph-linode

Launch Ceph using the Linode VPS provider
GNU General Public License v3.0

partial deployment of large linode clusters #36

Open bengland2 opened 5 years ago

bengland2 commented 5 years ago

I've been experimenting with some larger Linode clusters, and sometimes the Linode API rejects a node creation with an error like the one shown at the bottom. I think it means there is no room at the inn: Linode just doesn't have the resources at that geographic site to create that many VMs.

My complaint is that this leaves behind a set of linodes that were created but aren't in the display group, so linode-destroy.py won't clean them up and I have to do it by hand. Sometimes this set can be pretty large. If one linode create fails, the other threads in the pool are aborted before they can add their new linodes to the display group (the string in LINODE_GROUP), since that is a separate call to the Linode API.

Is there any way to change linode-launch.py so that a linode isn't created unless it is also added to the group? Cleanup would then be simple: just run linode-destroy.py.
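
Something like this is what I have in mind for do_create() - a rough sketch, not tested. The create call is the one already in the script; the linode_update and linode_delete method names and their keyword arguments are assumed to match how the bindings expose linode.update and linode.delete (the raw parameters in the log below suggest the update call already looks like this):

    def do_create_grouped(client, datacenter, plan, label, group):
        # Create the linode, then immediately label it and put it in the
        # display group; if that second call fails, delete the linode so
        # nothing is ever left outside LINODE_GROUP.
        node = client.linode_create(DatacenterID=datacenter,
                                    PlanID=plan[u'PLANID'],
                                    PaymentTerm=1)
        linode_id = node[u'LinodeID']
        try:
            client.linode_update(LinodeID=linode_id,
                                 Label=label,
                                 lpm_displayGroup=group)
            return linode_id
        except Exception:
            # Roll back the create so linode-destroy.py has nothing to miss.
            client.linode_delete(LinodeID=linode_id, skipChecks=True)
            raise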

2018-08-30 14:48:05,067 DEBUG Raw Response: {"ACTION":"linode.create","DATA":{"LinodeID":9963999},"ERRORARRAY":[]}
2018-08-30 14:48:05,068 DEBUG Parameters {'linodeid': 9963999, 'alert_cpu_enabled': 0, 'label': u'ceph-d5d1b9-mds-001', 'api_responseformat': 'json', 'api_action': 'linode.update', 'watchdog': 1, 'api_key': 'api_key: xxxx REDACTED xxxx', 'lpm_displaygroup': u'ceph-d5d1b9'}
2018-08-30 14:48:05,097 DEBUG Raw Response: {"ACTION":"linode.create","DATA":{},"ERRORARRAY":[{"ERRORMESSAGE":"No open slots for this plan!","ERRORCODE":8}]}
2018-08-30 14:48:05,098 ERROR [{u'ERRORCODE': 8, u'ERRORMESSAGE': u'No open slots for this plan!'}]
Traceback (most recent call last):
  File "./linode-launch.py", line 133, in create
    do_create(*args, **kwargs)
  File "./linode-launch.py", line 63, in do_create
    node = client.linode_create(DatacenterID = datacenter, PlanID = plan[u'PLANID'], PaymentTerm = 1)
  File "/root/ceph-linode/linode-env/lib/python2.7/site-packages/linode/api.py", line 340, in wrapper
    return self.__send_request(request)
  File "/root/ceph-linode/linode-env/lib/python2.7/site-packages/linode/api.py", line 294, in __send_request
    raise ApiError(s['ERRORARRAY'])
ApiError: [{u'ERRORCODE': 8, u'ERRORMESSAGE': u'No open slots for this plan!'}]
bengland2 commented 5 years ago

Lowering the multiprocessing.dummy.Pool size from 50 to 10 seemed to work. I'm going to wait a while to see if it works consistently, but it was failing consistently with 80-node Linode deployments. It doesn't seem to take much longer to get them running this way.
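
For reference, the change amounts to just the worker count passed to the pool in linode-launch.py (the actual variable name in the script may differ):

    from multiprocessing.dummy import Pool

    # 50 concurrent linode.create calls was enough to trip "No open slots
    # for this plan!"; 10 workers seems to stay under whatever limit
    # Linode enforces.
    pool = Pool(10)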

bengland2 commented 5 years ago

I still sometimes get "No open slots for this plan!", but at least there were only 2 linodes not associated with the group, so cleanup was easier. Arghh. I wish there were a way to ask for a set of linodes and either get all of them or none of them. The only way I can see to do this is to do all the creates, and if they all succeed, do everything else; otherwise delete everything you created right away so you don't get billed for it. Am I missing something?
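
Roughly what I mean - untested, and assuming the bindings expose linode.delete as linode_delete the same way they expose linode.create:

    def create_all_or_nothing(client, datacenter, plan, count):
        # Create `count` linodes; if any create fails, delete the ones
        # that did succeed so the account is left exactly as it started
        # and nothing keeps billing.
        created = []
        try:
            for _ in range(count):
                node = client.linode_create(DatacenterID=datacenter,
                                            PlanID=plan[u'PLANID'],
                                            PaymentTerm=1)
                created.append(node[u'LinodeID'])
        except Exception:
            for linode_id in created:
                client.linode_delete(LinodeID=linode_id, skipChecks=True)
            raise
        return created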

batrick commented 5 years ago

I think it should be possible to catch the exception and still apply the label, so cleanup is easier. I also wonder whether retrying linode-launch.py after a minute or so would give the datacenter time to free up capacity.
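
Something along these lines, perhaps - untested, just reusing the create call that is already in do_create():

    import time

    def create_with_retry(client, datacenter, plan, attempts=3, delay=60):
        # Retry the create a few times; capacity at a datacenter sometimes
        # frees up after a minute or so.
        for attempt in range(attempts):
            try:
                return client.linode_create(DatacenterID=datacenter,
                                            PlanID=plan[u'PLANID'],
                                            PaymentTerm=1)
            except Exception:
                if attempt == attempts - 1:
                    raise
                time.sleep(delay)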

OTOH, this seems like a new problem with Linode and worth opening a ticket about. They may be doing some kind of new throttling that is unintentionally messing this up.