Closed OlegGerber closed 4 years ago
HI @OlegGerber This issue came from network connection failed leads deleting vm failed. I will push the product team to accelerate the specified request domain to avoid the network connection. In addition, I will provide a parameter to remove the left vms when deploying a new bosh director. Does it can meet you?
Hi @xiaozhu36,
Can you share more insights on the connectivity issue that you see? what is the "specified request domain" that need to be accelerated?
Thanks, Alex
When you access ecs or other services, the CPI will get an endpoint based on the request region. When region is cn-shanghai, its endpoint is ecs-cn-hangzhou.aliyuncs.com
, and this endpoint will be accelerated when your local host is not in China mainland to avoid some needless network issues.
Hi @xiaozhu36, We are using hosts located in China main land only to access the "ecs-cn-hangzhou.aliyuncs.com" endpoint. maybe the reason for the error is not network in this case?
The local host is not your computer, but the machine that sent the CPI request
yes but we run the bosh director on a VM deployed in China main land.
Can you ping the endpoint ecs-cn-hangzhou.aliyuncs.com in your director? like #76
Hi @xiaozhu36, See the details below: bosh/eff5:~$ nslookup ecs-cn-hangzhou.aliyuncs.com Server: 100.100.2.136 Address: 100.100.2.136#53
ecs-cn-hangzhou.aliyuncs.com canonical name = popunify-public.aliyuncs.com. popunify-public.aliyuncs.com canonical name = popunify-public.aliyuncs.com.gds.alibabadns.com. Name: popunify-public.aliyuncs.com.gds.alibabadns.com Address: 106.11.61.112
PING popunify-public.aliyuncs.com.gds.alibabadns.com (106.11.61.112) 56(84) bytes of data. 64 bytes from 106.11.61.112: icmp_seq=1 ttl=49 time=3.09 ms 64 bytes from 106.11.61.112: icmp_seq=2 ttl=49 time=3.04 ms
Hi OlegGerber, Could you provide a test case so that we can reproduce the issue, please? It is difficult for us to reproduce the issue, appreciate your kindly help.
Hi @lemondlut, This issue is not reproduced on every bosh deployment but we see it frequently, almost every day. Please see the error to understand the flow:
Task 1780
Task 1780 | 04:31:02 | Preparing deployment: Preparing deployment (00:00:01)
Task 1780 | 04:31:03 | Preparing deployment: Rendering templates (00:00:01)
Task 1780 | 04:31:04 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 1780 | 04:31:04 | Updating instance postgresql: postgresql/a372be1a-bf03-4b6f-958e-316d41d10ef5 (0) (canary) (00:11:44)
L Error: CPI error 'Bosh::Clouds::CloudError' with message 'input={"method": "create_vm","arguments": ["025ccff7-c09f-4258-99db-ef746bf23fc0","m-uf6216ctez1mji005iyz",{"availability_zone": "cn-shanghai-f","ephemeral_disk": {"encrypted": true,"size": 20480},"instance_type": "ecs.g5.4xlarge"},{"sf_bosh_services": {"type": "manual","ip": "10.11.104.248","netmask": "255.255.192.0","cloud_properties": {"security_group_ids": ["sg-uf62qvrx5zsy47bfccy3","sg-uf6e1i35512nblhamgvi"],"vswitch_id": "vsw-uf6rudkz8c8d03hfanm5a"},"default": ["dns","gateway"],"dns": ["100.100.2.136","100.100.2.138"],"gateway": "10.11.64.1"}},["d-uf6erxni5w5ce6wl9tsm"],{"bosh": {"blobstores": [{"options": {"endpoint": "https://10.0.3.21:25250","password": "","tls": {"cert": {"ca": "-----BEGIN CERTIFICATE----- ----END RSA PRIVATE KEY-----\n"}},"group": "bosh-sf-service-fabrik-1279-ccbfe480-9d34-4909-9c08-f8a3041d8b9d-postgresql","groups": ["bosh-sf","service-fabrik-1279-ccbfe480-9d34-4909-9c08-f8a3041d8b9d","postgresql","bosh-sf-service-fabrik-1279-ccbfe480-9d34-4909-9c08-f8a3041d8b9d","service-fabrik-1279-ccbfe480-9d34-4909-9c08-f8a3041d8b9d-postgresql","bosh-sf--service-fabrik-1279-ccbfe480-9d34-4909-9c08-f8a3041d8b9d-postgresql"]}}],"context": {"director_uuid": "6fc5522b-8dba-4684-986c-e2c2082f2b8c","request_id": "cpi-102736","vm": {"stemcell": {"api_version": 2}}},"api_version": 1}
change i-uf6e6nqcgk7ddziglw48 to Running failed: change instance i-uf6e6nqcgk7ddziglw48 to Running timeout' in 'create_vm' CPI method (CPI request ID: 'cpi-102736')
.....
Task 1780 Started Mon Jul 1 04:31:02 UTC 2019 Task 1780 Finished Mon Jul 1 04:42:48 UTC 2019 Task 1780 Duration 00:11:46 Task 1780 error
Capturing task '1780' output: Expected task '1780' to succeed but state is 'error'
Exit code 1
2. The second attempt for the same bosh deployment fails with the "ErrorCode: InvalidPrivateIpAddress.Duplicated", which means the VM from the last deployment not deleted from the landscape, despite the fact bosh task failed.
Task 1813
Task 1813 | 05:03:21 | Preparing deployment: Preparing deployment (00:00:01) Task 1813 | 05:03:22 | Preparing deployment: Rendering templates (00:00:01) Task 1813 | 05:03:23 | Preparing package compilation: Finding packages to compile (00:00:00) Task 1813 | 05:03:23 | Creating missing vms: postgresql/a372be1a-bf03-4b6f-958e-316d41d10ef5 (0) (00:03:36) L Error: CPI error 'Bosh::Clouds::CloudError' with message 'input=`{"method": "create_vm","arguments": ["16922489-4467-4fad-ab1a-dad39a49d9b0","m-uf6216ctez1mji005iyz",{"availability_zone": "cn-shanghai-f","ephemeral_disk": {"encrypted": true,"size": 20480},"instance_type": "ecs.g5.4xlarge"},{"sf_bosh_services": {"type": "manual","ip": "10.11.104.248","netmask": "255.255.192.0","cloud_properties": {"security_group_ids": ["sg-uf62qvrx5zsy47bfccy3","sg-uf6e1i35512nblhamgvi"],"vswitch_id": "vsw-uf6rudkz8c8d03hfanm5a"},"default": ["dns","gateway"],"dns": ["100.100.2.136","100.100.2.138"],"gateway": "10.11.64.1"}},["d-uf6erxni5w5ce6wl9tsm"],{"bosh": {"blobstores": [{"options": {"endpoint": "https://10.0.3.21:25250","password": ""," ..... "ResourceOwnerAccount":"","OwnerAccount":"","SystemDiskDiskName":"","RamRoleName":"","DedicatedHostId":"","ClusterId":"","CreditSpecification":"","DataDisk":[{"DiskName":"","SnapshotId":"","Size":"20","Encrypted":"true","Description":"","Category":"cloud_efficiency","KMSKeyId":"","Device":"","DeleteWithInstance":"true"}],"SystemDiskSize":"40","SystemDiskDescription":""} : over max retry: SDK.ServerError ErrorCode: InvalidPrivateIpAddress.Duplicated Recommend: RequestId: DEF2F3D9-010C-4F8F-880C-5C1BA12A20CA Message: Specified private IP address is duplicated.' in 'create_vm' CPI method (CPI request ID: 'cpi-289838') ...
Task 1813 Started Mon Jul 1 05:03:21 UTC 2019 Task 1813 Finished Mon Jul 1 05:06:59 UTC 2019 Task 1813 Duration 00:03:38 Task 1813 error
Capturing task '1813' output: Expected task '1813' to succeed but state is 'error'
Exit code 1
HI @OlegGerber Although, the bosh-alicloud- cpi has added rollback when running create vm
failed, it can not avoid connection timeout
. Now, we have configured global acceleration service for the endpoint ecs-cn-hangzhou.aliyuncs.com and it will avoid the connection timeout error when you access China mainland region service. Besides, I enhanced the rollback and it will retry 10 times when rollback failed. Now, it has been supported by the latest version 24.0.0. Please check it. Thanks.
HI @xiaozhu36 , It seems to be the issue with "InvalidIPAddress.AlreadyUsed" still reproducing on our landscapes during the component update. However, the error message now is: "Message: The specified IPAddress is already used by other resource.' in 'create_vm' CPI method (CPI request ID: 'cpi-465242')"
Task 17061 | 19:40:56 | Compiling packages: elasticsearch/6afebaf8a3637d6e2aae6db61aafaa4abd7655bb28a1c9a25ec9118cb245562e (00:01:43)
Task 17061 | 19:41:53 | Compiling packages: python-3.6/976f472c218622aca411c953bc97d9d47665fd8f (00:02:49)
Task 17061 | 19:41:53 | Compiling packages: curator/29f3fa707032321547ccd95b58846d630eb3ff4bf80de879605796827d602f21 (00:00:10)
Task 17061 | 19:42:22 | Updating instance elasticsearch_master: elasticsearch_master/5133fda1-5ee1-4f65-a852-4d5eba9bbd0a (0) (canary) (00:03:00)
Task 17061 | 19:45:22 | Updating instance elasticsearch_master: elasticsearch_master/d1f98524-bfd1-4c87-894d-b2bd9e72f836 (1) (00:03:13)
Task 17061 | 19:48:35 | Updating instance elasticsearch_master: elasticsearch_master/bfb87525-35ab-451d-a248-b9e8b054eb00 (2) (00:03:26)
Task 17061 | 19:52:01 | Updating instance elasticsearch_data: elasticsearch_data/f4d83e89-6eb2-49f0-a689-54d7166cb5a0 (0) (canary) (00:04:31)
Task 17061 | 19:56:32 | Updating instance elasticsearch_data: elasticsearch_data/05c08b43-65fb-49a7-81b8-3b73e878e992 (1) (00:05:01)
Task 17061 | 20:01:34 | Updating instance zookeeper: zookeeper/3278e6cf-464f-44a3-a553-ef982116db4b (0) (canary) (00:02:45)
Task 17061 | 20:04:19 | Updating instance zookeeper: zookeeper/29eee6d1-f263-4296-adea-e4ebda51d32d (1) (00:04:03)
L Error: CPI error 'Bosh::Clouds::CloudError' with message 'input=`{"method": "create_vm","arguments": ["648af04d-48f0-4a0c-b4a3-51464463a962","m-uf6ff6n8gn1fqfg1i2fu",{"availability_zone": "cn-shanghai-f","ephemeral_disk": {"encrypted": true,"size": 20480},"instance_type": "ecs.c5.large"},{"logsearch": {"type": "manual","ip": "10.2.4.14","netmask": "255.255.0.0","cloud_properties": {"security_group_ids": ["sg-uf6076x9fu1dy4dzka87","sg-uf637q6lqzqr22b9tfw0","sg-uf6fuqggczshqf8tx4nq","sg-uf6aixmrq1fmw1ocjdsy","sg-uf65ns0580xqvcvk21jy","sg-uf6825diaozs5g81gqzm","sg-uf67m2mj8kzatnko0brf"],"vswitch_id": "vsw-uf6wf1cqwuseos3x1dpzd"},"default": ["dns","gateway"],"dns": ["100.100.2.136","100.100.2.138"],"gateway": "10.2.0.1"}},["d-uf60jpji0rlh06c6bvw8"],{"bosh": {"blobstores": [{"options": {"endpoint": "https://10.0.3.11:25250","password": \n"}},"group": "bosh-cf-cn40-staging-logsearch-zookeeper","groups": ["bosh-cf-cn40-staging","logsearch","zookeeper","bosh-cf-cn40-staging-logsearch","logsearch-zookeeper","bosh-cf-cn40-staging-logsearch-zookeeper"]}}],"context": {"director_uuid": "113d6236-e1e5-440d-88cc-f3b9824a934d","request_id": "cpi-465242","vm": {"stemcell": {"api_version": 2}}},"api_version": 1}` create instance failed with input={"Scheme":"HTTPS","Method":"POST","Domain":"ecs-cn-hangzhou.aliyuncs.com","Port":"","RegionId":"","ReadTimeout":0,"ConnectTimeout":0,"AcceptFormat":"JSON","QueryParams":{"AccessKeyId":"","Action":"CreateInstance","ClientToken":"Bosh-Cpi-CreateInstance-1564689889-284eb0e6-12c8-4b57-9c70-7e8b2","DataDisk.1.Category":"cloud_efficiency","DataDisk.1.DeleteWithInstance":"true","DataDisk.1.Encrypted":"true","DataDisk.1.Size":"20","Format":"JSON","ImageId":"m-uf6ff6n8gn1fqfg1i2fu","InstanceChargeType":"PostPaid","InstanceType":"ecs.c5.large","IoOptimized":"optimized","PrivateIpAddress":"10.2.4.14","RegionId":"cn-shanghai","SecurityGroupId":"sg-uf6076x9fu1dy4dzka87","SecurityToken":"","Signature":"=","SignatureMethod":"HMAC-SHA1","SignatureNonce":"945d320b331d47c19b68ff14a365c219","SignatureType":"","SignatureVersion":"1.0","SpotPriceLimit":"0.000000","SystemDisk.Category":"cloud_efficiency","SystemDisk.Size":"40","Timestamp":"2019-08-01T20:08:21Z","UserData":"=","VSwitchId":"vsw-uf6wf1cqwuseos3x1dpzd","Version":"2014-05-26","ZoneId":"cn-shanghai-f"},"Headers":{"Accept-Encoding":"identity","Content-Type":"application/x-www-form-urlencoded","x-sdk-client":"golang/1.0.0","x-sdk-core-version":"0.0.1","x-sdk-invoke-type":"normal"},"FormParams":{},"Content":"","ResourceOwnerId":"","HpcClusterId":"","SecurityEnhancementStrategy":"","KeyPairName":"","SpotPriceLimit":"0.000000","DeletionProtection":"","ResourceGroupId":"","HostName":"","Password":"","Tag":null,"AutoRenewPeriod":"","NodeControllerId":"","Period":"","DryRun":"","OwnerId":"","CapacityReservationPreference":"","VSwitchId":"vsw-uf6wf1cqwuseos3x1dpzd","PrivateIpAddress":"10.2.4.14","SpotStrategy":"","PeriodUnit":"","InstanceName":"","AutoRenew":"","InternetChargeType":"","ZoneId":"cn-shanghai-f","InternetMaxBandwidthIn":"","UseAdditionalService":"","ImageId":"m-uf6ff6n8gn1fqfg1i2fu","ClientToken":"Bosh-Cpi-CreateInstance-1564689889-284eb0e6-12c8-4b57-9c70-7e8b2","VlanId":"","SpotInterruptionBehavior":"","IoOptimized":"optimized","SecurityGroupId":"sg-uf6076x9fu1dy4dzka87","InternetMaxBandwidthOut":"","Description":"","SystemDiskCategory":"cloud_efficiency","CapacityReservationId":"","UserData":"=","":"","InstanceType":"ecs.c5.large","InstanceChargeType":"PostPaid","DeploymentSetId":"","InnerIpAddress":"","ResourceOwnerAccount":"","OwnerAccount":"","SystemDiskDiskName":"","RamRoleName":"","DedicatedHostId":"","ClusterId":"","CreditSpecification":"","DataDisk":[{"DiskName":"","SnapshotId":"","Size":"20","Encrypted":"true","Description":"","Category":"cloud_efficiency","KMSKeyId":"","Device":"","DeleteWithInstance":"true"}],"SystemDiskSize":"40","SystemDiskDescription":""} : over max retry: SDK.ServerError
ErrorCode: InvalidIPAddress.AlreadyUsed
Recommend:
RequestId: 9B9AA73D-EFDC-49FF-B42C-2732F6C1CA25
Message: The specified IPAddress is already used by other resource.' in 'create_vm' CPI method (CPI request ID: 'cpi-465242')
HI @OlegGerber Your private Ip 10.2.4.14 has been used by the instance i-uf64zjiazflkqpzqtvtl
which created in 2019-08-02 14:46:42
. Can you check it or your manifest?
Hi @xiaozhu36 , instance ID:i-uf64zjiazflkqpzqtvtl belong to the bosh instance zookeeper.29eee6d1-f263-4296-adea-e4ebda51d32d, which is failed on the bosh update process.
HI @OlegGerber Why did the bosh update process failed? What is your update command
? Did the failure come from bosh cpi?
Hi @xiaozhu36 , Bosh update process failed on duplicate private IP (which is in use by VM itself) The update is a regular bosh update procedure, which runs as a result of any manifest changes or stemcell update. During the update, old VM may be deleted and a new one created, it seems to be that once VM deleted, it takes time to release the private IP. Few retries fix the issue. Can you please check if it may be the reason for this kind of failures?
HI @OlegGerber You are right and I agree with you. But, actually, the CPI has added the retry strategy (times: 20, interval 10s) for this error: https://github.com/cloudfoundry-incubator/bosh-alicloud-cpi-release/blob/master/src/bosh-alicloud-cpi/alicloud/instance_manager.go#L100. Did you use the latest bosh cpi?
Hi @OlegGerber Is there any other error before the error InvalidIPAddress.AlreadyUsed
? There is one case: connection timeout or request timeout results in the specified instance has been created, but bosh return a timeout error. When you retry bosh command, the IP has been used by the last VM.
HI @OlegGerber The latest cpi release 25.0.0 has been published and it has enlarged the retry times for the error InvalidIPAddress.AlreadyUsed
. Please check it.
HI @OlegGerber The CPI release 26.0.0 has been published and it has the following improvements:
enlarge the timeout to 10 minutes when waiting for instance status, like running, deleted and so on In some extreme case, creating vm may be cost too more time, so enlarging waiting time can avoid some internal error. If there is also timeout, the cpi will rollback and delete the vm. In the rollback, enlarging waiting time can ensure the vm can be deleted successfully. It depends on backend system will kill the vm forcibly after 5 minutes and remove it.
add one method to cleanup the remaining network interface which occupies the vm's private ip
In some extreme case, if one vm is deleted in a non-normal ways, the vm's primarily network interface may be left and it will occupy the vm 's private ip. It also will throw an error InvalidIPAddress.AlreadyUsed
when recreating a new vm. So, after deleting the instance, check and cleanup the remaining network interface is necessary.
add extra error message when an error happened and it can help us to find the error reason more easily and quickly.
Please use the release 26.0.0 to have a try.
Hi @xiaozhu36 , Unfortunately, this issue reproduced again with version: "26.0.0"
stemcells:
+ - alias: default
+ os: ubuntu-xenial
+ version: '456.16'
- - alias: default
- os: ubuntu-xenial
- version: '456.12'
Task 3596
Task 3596 | 02:05:35 | Preparing deployment: Preparing deployment (00:00:16)
Task 3596 | 02:05:52 | Preparing deployment: Rendering templates (00:00:24)
Task 3596 | 02:06:21 | Preparing package compilation: Finding packages to compile (00:00:01)
Task 3596 | 02:06:24 | Updating instance adapter: adapter/5d37d36c-0e80-410d-9d03-01c81118b4cd (1) (canary) (00:03:56)
Task 3596 | 02:10:20 | Updating instance diego-api: diego-api/ea07b6fe-43d2-45a1-938b-51f00e407b9f (0) (canary) (00:03:19)
Task 3596 | 02:13:39 | Updating instance diego-api: diego-api/7215fb64-8992-4c54-a9fd-d30b95980321 (1) (00:03:17)
Task 3596 | 02:16:56 | Updating instance uaa: uaa/4906ce75-ddbf-4d7c-82cd-79dad44ddec0 (0) (canary) (00:04:10)
Task 3596 | 02:21:06 | Updating instance uaa: uaa/1605c073-986e-4434-a6ea-9e6e0128bb14 (1) (00:04:46)
Task 3596 | 02:25:52 | Updating instance bits: bits/9efbba78-bf85-47e8-bcbe-7ddaaa69e7a9 (0) (canary) (00:03:10)
Task 3596 | 02:29:02 | Updating instance bits: bits/0e294baf-b909-45d1-bdfa-4d772f446a88 (1) (00:03:04)
Task 3596 | 02:32:06 | Updating instance cc-worker: cc-worker/cc4d3f03-7918-4d05-b486-04176d1444e7 (0) (canary)
Task 3596 | 02:32:06 | Updating instance api: api/b1d2f599-f29b-4c4f-9c3f-5c85fc2c89f1 (0) (canary)
Task 3596 | 02:35:41 | Updating instance cc-worker: cc-worker/cc4d3f03-7918-4d05-b486-04176d1444e7 (0) (canary) (00:03:35)
Task 3596 | 02:35:41 | Updating instance cc-worker: cc-worker/01a626a2-0efd-4607-ac78-b148230d95ec (1)
Task 3596 | 02:36:44 | Updating instance api: api/b1d2f599-f29b-4c4f-9c3f-5c85fc2c89f1 (0) (canary) (00:04:38)
Task 3596 | 02:36:44 | Updating instance api: api/e5e8416c-6afd-45a1-80cf-2538966f2130 (2)
Task 3596 | 02:38:56 | Updating instance cc-worker: cc-worker/01a626a2-0efd-4607-ac78-b148230d95ec (1) (00:03:15)
Task 3596 | 02:40:50 | Updating instance api: api/e5e8416c-6afd-45a1-80cf-2538966f2130 (2) (00:04:06)
Task 3596 | 02:40:50 | Updating instance api: api/14f3a6b5-ebcb-47a4-b74c-07261d602ade (3) (00:04:05)
Task 3596 | 02:44:55 | Updating instance api: api/7353dd97-8e3d-4123-8791-ac3653594767 (1) (00:04:28)
Task 3596 | 02:49:23 | Updating instance scheduler: scheduler/ab98aec9-af7f-4c8d-ac2a-0e36a0d4846c (0) (canary) (00:06:02)
L Error: CPI error 'Bosh::Clouds::CloudError' with message 'input=`{"method": "create_vm","arguments": ["3f225874-fc19-456a-8e79-5c003aaaedac","m-uf639zkz5hy5p22rz231",{"availability_zone": "cn-shanghai-e","ephemeral_disk": {"encrypted": true,"size": 20480},"instance_type": "ecs.c5.xlarge"},{"cf": {"type": "manual","ip": "10.1.1.71","netmask": "255.255.0.0","cloud_properties": {"security_group_ids": ["sg-uf68jc3nwxdjnixbvrw7","sg-uf66oinxyjtague1d1ox","sg-uf65ns0580xqvcvk21jy","sg-uf66oinxyjtague1d1ot","sg-uf6fuqggczshqh7v16ng","sg-uf6825diaozs5g81gqzm","sg-uf67m2mj8kzatnko0brf"],"vswitch_id": "vsw-uf6didcjh497z4wa8bpkc"},"default": ["dns","gateway"],"dns": ["100.100.2.136","100.100.2.138"],"gateway": "10.1.0.1"}},[],{"bosh": {"blobstores": [{"options": {"endpoint": "https://10.0.3.11:25250",","group": "bosh-cf-cn40-staging-cf-scheduler","groups": ["bosh-cf-cn40-staging","cf","scheduler","bosh-cf-cn40-staging-cf","cf-scheduler","bosh-cf-cn40-staging-cf-scheduler"]}}],"context": {"director_uuid": "e9a039b7-0a89-4d52-a3fd-71a8dc125170","request_id": "cpi-198350","vm": {"stemcell": {"api_version": 2}}},"api_version": 1}` create instance failed with input={"Scheme":"HTTPS","Method":"POST","Domain":"ecs-cn-hangzhou.aliyuncs.com","Port":"","RegionId":"","ReadTimeout":0,"ConnectTimeout":0,"AcceptFormat":"JSON","QueryParams":{"AccessKeyId":"LTAIPdGJTly9Iyo3","Action":"CreateInstance","ClientToken":"Bosh-Cpi-CreateInstance-1567478989-6177f87c-0084-4128-969f-85523","DataDisk.1.Category":"cloud_efficiency","DataDisk.1.DeleteWithInstance":"true","DataDisk.1.Encrypted":"true","DataDisk.1.Size":"20","Format":"JSON","ImageId":"m-uf639zkz5hy5p22rz231","InstanceChargeType":"PostPaid","InstanceType":"ecs.c5.xlarge","IoOptimized":"optimized","PrivateIpAddress":"10.1.1.71","RegionId":"cn-shanghai","SecurityGroupId":"sg-uf68jc3nwxdjnixbvrw7","SecurityToken":"","=","SignatureMethod":"HMAC-SHA1","SignatureNonce":"","SignatureType":"","SignatureVersion":"1.0","SpotPriceLimit":"0.000000","SystemDisk.Category":"cloud_efficiency","SystemDisk.Size":"40","Timestamp":"2019-09-03T02:55:25Z","UserData":""VSwitchId":"vsw-uf6didcjh497z4wa8bpkc","Version":"2014-05-26","ZoneId":"cn-shanghai-e"},"Headers":{"Accept-Encoding":"identity","Content-Type":"application/x-www-form-urlencoded","x-sdk-client":"golang/1.0.0","x-sdk-core-version":"0.0.1","x-sdk-invoke-type":"normal"},"FormParams":{},"Content":"","ResourceOwnerId":"","HpcClusterId":"","SecurityEnhancementStrategy":"","KeyPairName":"","SpotPriceLimit":"0.000000","DeletionProtection":"","ResourceGroupId":"","HostName":"","":"","StorageSetPartitionNumber":"","Tag":null,"AutoRenewPeriod":"","NodeControllerId":"","Period":"","DryRun":"","OwnerId":"","CapacityReservationPreference":"","VSwitchId":"vsw-uf6didcjh497z4wa8bpkc","PrivateIpAddress":"10.1.1.71","SpotStrategy":"","PeriodUnit":"","InstanceName":"","AutoRenew":"","InternetChargeType":"","ZoneId":"cn-shanghai-e","InternetMaxBandwidthIn":"","UseAdditionalService":"","Affinity":"","ImageId":"m-","ClientToken":"Bosh-Cpi-CreateInstance-1567478989-6177f87c-0084-4128-969f-85523","VlanId":"","SpotInterruptionBehavior":"","IoOptimized":"optimized","SecurityGroupId":"sg-uf68jc3nwxdjnixbvrw7","InternetMaxBandwidthOut":"","Description":"","SystemDiskCategory":"cloud_efficiency","CapacityReservationId":"","SystemDiskPerformanceLevel":"","UserData":"":"","InstanceType":"ecs.c5.xlarge","Arn":null,"InstanceChargeType":"PostPaid","DeploymentSetId":"","InnerIpAddress":"","ResourceOwnerAccount":"","OwnerAccount":"","Tenancy":"","SystemDiskDiskName":"","RamRoleName":"","DedicatedHostId":"","ClusterId":"","CreditSpecification":"","DataDisk":[{"DiskName":"","SnapshotId":"","Size":"20","Encrypted":"true","PerformanceLevel":"","Description":"","Category":"cloud_efficiency","KMSKeyId":"","Device":"","DeleteWithInstance":"true"}],"StorageSetId":"","SystemDiskSize":"40","SystemDiskDescription":""} : over max retry: SDK.ServerError
ErrorCode: InvalidIPAddress.AlreadyUsed
Recommend:
As discussed in the call right now, @xiaozhu36 will have a deep-diving here to solve it. He may bother you later @OlegGerber .
Do we have any progress, we are using version 27, still has this error.
Fixed by version v32.0.0. Please check it.
Hi, It looks like bosh did not cleanup the resources on create_vm failures. create-vm failed on timeout + failed delete, and second attempt to deploy the same VM failed on "private IP address is duplicated" error. CPI Release Version: v22
Please note I removed the password and keys from the logs/ First attempt: {code} Deploying:
Creating instance 'bosh/0':
Exit code 1 {code}
Second attempt to deploy the VM: {code}
Uploading stemcell 'bosh-alicloud-kvm-ubuntu-xenial-go_agent/250.23'... Skipped [Stemcell already uploaded] (00:00:00)
Started deploying
Creating VM for instance 'bosh/0' from stemcell 'm-uf6b6vtpujfo9zq2104i'... Failed (00:03:47)
Failed deploying (00:03:47)
Stopping registry... Finished (00:00:00)
Cleaning up rendered CPI jobs... Finished (00:00:00)
Deploying:
Creating instance 'bosh/0':
Exit code 1
{code}