TritonDataCenter / sdc-docker

Docker Engine for Triton
Mozilla Public License 2.0

Some suspected failings when a CN falls over #84

Open magnayn opened 8 years ago

magnayn commented 8 years ago

I had a CN fall over, and this seemed to cause docker provisioning to fail:

docker run -ti ubuntu /bin/bash
Error response from daemon: (DockerNoComputeResourcesError) No compute resources available. (256b0080-34a0-11e6-8bb7-0d35b3585d39)

When I tried to update things just to check, that failed too:

Reprovisioning 3a628bd5-5594-4ae9-92df-5557d38a7a96 (hostvolume-LGW) inst to image cfa18754-2e06-11e6-80c0-f71606342a60
sdcadm experimental: error: sapi client error: socket hang up

I destroyed the server in adminui. Docker provisioning now works; however, updating still doesn't:

[root@headnode (Osney) ~]# sdcadm experimental update-docker --servers cns
"docker" VM already has a delegate dataset
Reprovisioning 3a628bd5-5594-4ae9-92df-5557d38a7a96 (hostvolume-LGW) inst to image cfa18754-2e06-11e6-80c0-f71606342a60
sdcadm experimental: error: sapi client error (ReprovisionFailedError): Server 44454c4c-4200-1039-8036-b1c04f345831 not found

Feels like there is some behaviour that assumes servers are always alive/up.

kusor commented 8 years ago

Hi @magnayn, which sdcadm version are you using? Hostvolumes are no longer needed and should be removed by running sdcadm experimental update-other with a recent version of sdcadm. Docker itself should be updated with just sdcadm update docker.

magnayn commented 8 years ago

[root@headnode (Osney) ~]# sdcadm --version
sdcadm 1.11.1 (release-20160428-20160428T183310Z-g04ea412)

doing a selfupdate to 1.11.2 fixes it. D'oh!

magnayn commented 8 years ago

Hm, that said, after upgrading and running sdcadm post-setup docker:

[root@headnode (Osney) ~]# sdcadm update docker

/opt/smartdc/sdcadm/lib/sdcadm.js:720
    vm.server_uuid].hostname;
    ^
TypeError: Cannot read property 'hostname' of undefined
    at Object.fillOutVmInsts as func
    at Object._onImmediate (/opt/smartdc/sdcadm/node_modules/vasync/lib/vasync.js:213:20)
    at processImmediate as _immediateCallback
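
The trace above is consistent with a VM's `server_uuid` being looked up in a table of known servers, finding nothing, and then having `.hostname` read from the missing entry. The following is only a minimal Python sketch of that failure mode and of a guarded lookup. The names `servers`, `orphan_vm`, and `hostname_for` are hypothetical, and this is not sdcadm's actual code.

```python
from typing import Optional

# Hypothetical table of servers still registered, keyed by server UUID.
servers = {
    "44454c4c-3400-1058-8038-b4c04f365831": {"hostname": "headnode"},
}


def hostname_for(vm: dict, servers: dict) -> Optional[str]:
    """Return the hostname of the server a VM lives on.

    Returns None when the server is no longer known (for example, because
    it was deleted) instead of failing on the missing entry.
    """
    server = servers.get(vm.get("server_uuid"))
    if server is None:
        return None
    return server.get("hostname")


# A VM record that still points at a server missing from the table.
# An unguarded servers[uuid]["hostname"] would raise KeyError here, which
# is the Python analogue of the TypeError in the trace.
orphan_vm = {
    "uuid": "2873f5ac-6dd1-4750-9e9f-667e3e23d41d",
    "server_uuid": "44454c4c-5700-1043-8033-c2c04f4c5631",
}

print(hostname_for(orphan_vm, servers))
```

Returning `None` lets a caller report the stale record rather than crash, which is one possible design choice among several.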

kusor commented 8 years ago

I think that's because you removed the server from AdminUI without deleting its instances from SAPI. Would you mind telling me the output of:

sdc-sapi /instances?service_uuid=$(sdc-sapi /services?name=hostvolume|json -H 0.uuid) | json -H

kusor commented 8 years ago

Also, same thing for docker service, please:

sdc-sapi /instances?service_uuid=$(sdc-sapi /services?name=docker|json -H 0.uuid) | json -H

magnayn commented 8 years ago

Ah - ok - I wasn't aware I needed to do non-adminui stuff

[root@headnode (Osney) ~]# sdc-sapi /instances?service_uuid=$(sdc-sapi /services?name=hostvolume|json -H 0.uuid) | json -H
[
  {
    "uuid": "2d011ac4-45dd-4935-836c-310762cf2e2a",
    "service_uuid": "a34730e9-ca7c-47ab-b1bc-b7f5fdd42a39",
    "params": {
      "alias": "hostvolume-headnode",
      "server_uuid": "44454c4c-3400-1058-8038-b4c04f365831"
    },
    "type": "vm"
  },
  {
    "uuid": "3a628bd5-5594-4ae9-92df-5557d38a7a96",
    "service_uuid": "a34730e9-ca7c-47ab-b1bc-b7f5fdd42a39",
    "params": {
      "alias": "hostvolume-LGW",
      "server_uuid": "44454c4c-4200-1039-8036-b1c04f345831"
    },
    "type": "vm"
  },
  {
    "uuid": "2873f5ac-6dd1-4750-9e9f-667e3e23d41d",
    "service_uuid": "a34730e9-ca7c-47ab-b1bc-b7f5fdd42a39",
    "params": {
      "alias": "hostvolume-SFO",
      "server_uuid": "44454c4c-5700-1043-8033-c2c04f4c5631"
    },
    "type": "vm"
  },
  {
    "uuid": "9b67fab4-4830-4bde-91e1-f33e6c2d2946",
    "service_uuid": "a34730e9-ca7c-47ab-b1bc-b7f5fdd42a39",
    "params": {
      "alias": "hostvolume-JFK",
      "server_uuid": "35383339-3637-435a-3231-323930303747"
    },
    "type": "vm"
  },
  {
    "uuid": "34a1a9e8-4bc0-4915-9b19-8ad1a40fba3c",
    "service_uuid": "a34730e9-ca7c-47ab-b1bc-b7f5fdd42a39",
    "params": {
      "alias": "hostvolume-LHR",
      "server_uuid": "44454c4c-4c00-104d-804e-c6c04f46354a"
    },
    "type": "vm"
  }
]

magnayn commented 8 years ago

[root@headnode (Osney) ~]# sdc-sapi /instances?service_uuid=$(sdc-sapi /services?name=docker|json -H 0.uuid) | json -H
[
  {
    "uuid": "da61b9d5-75b1-4f6f-8cd5-973c2f11d8b8",
    "service_uuid": "9b7490de-1e13-4d93-a8c6-01cb23938fb0",
    "params": {
      "alias": "docker0",
      "delegate_dataset": true,
      "server_uuid": "44454c4c-3400-1058-8038-b4c04f365831"
    },
    "type": "vm"
  }
]
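
If the suspicion is that SAPI still holds instances for a server that no longer exists, one way to check is to compare each instance's `params.server_uuid` against the set of servers that are still registered. The sketch below does that in Python over the hostvolume instances listed in this thread. The `known_servers` set is an assumption made for illustration (in practice it would come from CNAPI), and `orphaned_instances` is a hypothetical helper, not part of any Triton tool.

```python
# Hostvolume instances from the SAPI output in this thread,
# trimmed to the fields used below.
sapi_instances = [
    {"uuid": "2d011ac4-45dd-4935-836c-310762cf2e2a",
     "params": {"alias": "hostvolume-headnode",
                "server_uuid": "44454c4c-3400-1058-8038-b4c04f365831"}},
    {"uuid": "3a628bd5-5594-4ae9-92df-5557d38a7a96",
     "params": {"alias": "hostvolume-LGW",
                "server_uuid": "44454c4c-4200-1039-8036-b1c04f345831"}},
    {"uuid": "2873f5ac-6dd1-4750-9e9f-667e3e23d41d",
     "params": {"alias": "hostvolume-SFO",
                "server_uuid": "44454c4c-5700-1043-8033-c2c04f4c5631"}},
    {"uuid": "9b67fab4-4830-4bde-91e1-f33e6c2d2946",
     "params": {"alias": "hostvolume-JFK",
                "server_uuid": "35383339-3637-435a-3231-323930303747"}},
    {"uuid": "34a1a9e8-4bc0-4915-9b19-8ad1a40fba3c",
     "params": {"alias": "hostvolume-LHR",
                "server_uuid": "44454c4c-4c00-104d-804e-c6c04f46354a"}},
]

# ASSUMPTION: the servers still registered (headnode, JFK, LHR).
# This set is illustrative; it is not output taken from the thread.
known_servers = {
    "44454c4c-3400-1058-8038-b4c04f365831",
    "35383339-3637-435a-3231-323930303747",
    "44454c4c-4c00-104d-804e-c6c04f46354a",
}


def orphaned_instances(instances, known_server_uuids):
    """Return the instances whose params.server_uuid is not a known server."""
    known = set(known_server_uuids)
    return [inst for inst in instances
            if inst.get("params", {}).get("server_uuid") not in known]


for inst in orphaned_instances(sapi_instances, known_servers):
    print(inst["params"]["alias"], inst["uuid"])
```

Under that assumed server set, the instances flagged are the ones that would need cleaning up outside AdminUI.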

kusor commented 8 years ago

Ok, run: sdcadm experimental update-other in order to get rid of the hostvolume instances - or, at least, to see if anything there is failing.

Then, sdc-cnapi /servers/44454c4c-3400-1058-8038-b4c04f365831 to see what's the status of that server, please

magnayn commented 8 years ago

Hmm - there is an error recorded in the update

[root@headnode (Osney) ~]# sdcadm experimental update-other
Update "sdc" SAPI app metadata_schema
Set "docker" service "metadata.SERVICE_DOMAIN"
Adding domain keys to "sdc" SAPI app metadata: {"DOCKER_SERVICE":" docker.Osney.allocatesoftware.com","docker_domain":" docker.Osney.allocatesoftware.com"}
Running VMAPI migrations
Removing deprecated hostvolume instances

sdcadm experimental: error: socket hang up

[root@headnode (Osney) ~]# sdc-cnapi /servers/44454c4c-3400-1058-8038-b4c04f365831
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 12764
Date: Fri, 17 Jun 2016 16:19:27 GMT
Server: Compute Node API
x-request-id: e8afbbb7-7a15-4e7c-8279-77c0603e309a
x-response-time: 8
x-server-name: ae76109a-80a0-4a15-a1c4-4cd7ec84858b
Connection: keep-alive

{ "agents": [ { "name": "firewaller", "version": "1.4.0", "image_uuid": "318eb24f-e44a-4271-99fa-14ba1b397c32" }, { "name": "cainstsvc", "version": "0.0.3vrelease-20160428-20160428T180942Z-gc110a12", "image_uuid": "7625072f-8701-4cb5-b014-65d57f1e39e6" }, { "name": "hagfish-watcher", "version": "1.1.0-release-20160428-20160428T183307Z-geb1d34a", "image_uuid": "10609cb7-244c-47fd-9642-35d2ab04199d" }, { "name": "cabase", "version": "1.0.3vrelease-20160428-20160428T180942Z-gc110a12", "image_uuid": "c910dd54-d6dd-4144-bd78-5b38b70b7ba6" }, { "name": "marlin", "version": "0.0.3", "image_uuid": "6a476594-3f7c-41ed-8be9-78aa17111642" }, { "name": "smartlogin", "version": "0.3.0-release-20160428-20160428T183259Z-g381e99f", "image_uuid": "3ef411da-e6cd-46b3-b920-bc2599060af3" }, { "name": "config-agent", "version": "1.5.0", "image_uuid": "b782f289-9f98-4488-a047-060969410b16" }, { "name": "net-agent", "version": "1.3.0", "image_uuid": "cd375ca4-5e14-40e5-9264-a2ad68029021", "uuid": "6e79bed7-b019-4c3e-bab1-ca4e7470aa28" }, { "name": "agents_core", "version": "2.1.0", "image_uuid": "d4b784f1-3d7c-4ece-a3d7-4ddcf6c4c8b0" }, { "name": "cn-agent", "version": "1.5.2", "image_uuid": "a3900067-98ff-4288-9dc4-0835f1d8f767", "uuid": "377e9319-3225-470c-8731-07b982e52f9a" }, { "name": "amon-relay", "version": "1.0.1", "image_uuid": "30ded3e7-ea50-4637-8902-23e18903eba6" }, { "name": "vm-agent", "version": "1.5.0", "image_uuid": "fec9f401-5921-4025-a555-8d6f7cff8da1", "uuid": "a1dc76a3-f87a-4da1-bde4-c960bd966cab" }, { "name": "amon-agent", "version": "1.0.1", "image_uuid": "27e19fa9-d693-4d28-aaa0-d6057dfec8b1" } ], "datacenter": "Osney", "overprovision_ratio": 1, "reservation_ratio": 0.15, "reservoir": false, "traits": {}, "rack_identifier": "", "comments": "", "uuid": "44454c4c-3400-1058-8038-b4c04f365831", "reserved": false, "vms": { "91d346f7-f7c1-4688-bf3c-0c8235ec6ffd": { "uuid": "91d346f7-f7c1-4688-bf3c-0c8235ec6ffd", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", 
"quota": 25, "max_physical_memory": 128, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 100, "last_modified": "2016-06-17T15:14:19.000Z" }, "ae2ad258-1a3c-4e81-a4e8-be92a31debe6": { "uuid": "ae2ad258-1a3c-4e81-a4e8-be92a31debe6", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 512, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 200, "last_modified": "2016-06-17T15:14:19.000Z" }, "0940ef38-3a34-46de-bed7-e5ec9cf443a0": { "uuid": "0940ef38-3a34-46de-bed7-e5ec9cf443a0", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 1024, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 300, "last_modified": "2016-06-17T15:14:19.000Z" }, "f8054311-3c83-4253-8db8-86d0792c7a86": { "uuid": "f8054311-3c83-4253-8db8-86d0792c7a86", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 50, "max_physical_memory": 2048, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 400, "last_modified": "2016-06-17T15:14:19.000Z" }, "a27c54e5-d36d-40bb-84a6-9eeed610d904": { "uuid": "a27c54e5-d36d-40bb-84a6-9eeed610d904", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 8192, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 400, "last_modified": "2016-06-17T15:14:19.000Z" }, "59bc2cae-25c1-4c30-b20f-27719f5673b1": { "uuid": "59bc2cae-25c1-4c30-b20f-27719f5673b1", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 1024, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 300, "last_modified": "2016-06-17T15:14:19.000Z" }, "03687fd1-bdfa-48d7-8054-b84fbc3ce233": { "uuid": "03687fd1-bdfa-48d7-8054-b84fbc3ce233", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 1024, "zone_state": "running", "state": 
"running", "brand": "joyent-minimal", "cpu_cap": 300, "last_modified": "2016-06-17T15:14:19.000Z" }, "9bb0607c-6829-490a-8f02-81555471cb3b": { "uuid": "9bb0607c-6829-490a-8f02-81555471cb3b", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 8192, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 400, "last_modified": "2016-06-17T15:14:19.000Z" }, "f354c285-02ee-4e8a-9518-6237c8378138": { "uuid": "f354c285-02ee-4e8a-9518-6237c8378138", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 8192, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 400, "last_modified": "2016-06-17T15:14:19.000Z" }, "7d0ddd86-a441-4ffc-a30f-2d05d3866d8d": { "uuid": "7d0ddd86-a441-4ffc-a30f-2d05d3866d8d", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 1024, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 300, "last_modified": "2016-06-17T15:14:19.000Z" }, "ed869a9c-bd30-42dc-b069-b0ae56acad28": { "uuid": "ed869a9c-bd30-42dc-b069-b0ae56acad28", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 1024, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 300, "last_modified": "2016-06-17T15:14:19.000Z" }, "5bc500b4-adf7-462d-9906-3006cd34a699": { "uuid": "5bc500b4-adf7-462d-9906-3006cd34a699", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 1024, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 300, "last_modified": "2016-06-17T15:14:19.000Z" }, "56cac0d5-6f59-40f2-acff-ef861c4bb5a5": { "uuid": "56cac0d5-6f59-40f2-acff-ef861c4bb5a5", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 256, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 150, "last_modified": 
"2016-06-17T15:14:19.000Z" }, "38bbf9a3-e6d2-4844-8d59-a03b83b5cbc1": { "uuid": "38bbf9a3-e6d2-4844-8d59-a03b83b5cbc1", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 2048, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 400, "last_modified": "2016-06-17T15:14:19.000Z" }, "66d1c94a-c32d-4f9b-9fb5-075c705cdd03": { "uuid": "66d1c94a-c32d-4f9b-9fb5-075c705cdd03", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 500, "max_physical_memory": 768, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 250, "last_modified": "2016-06-17T15:14:19.000Z" }, "ae76109a-80a0-4a15-a1c4-4cd7ec84858b": { "uuid": "ae76109a-80a0-4a15-a1c4-4cd7ec84858b", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 1024, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 300, "last_modified": "2016-06-17T15:14:19.000Z" }, "fa761d28-1596-4b0c-a9a4-3807a72b25df": { "uuid": "fa761d28-1596-4b0c-a9a4-3807a72b25df", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 128, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 100, "last_modified": "2016-06-17T15:14:19.000Z" }, "97d39281-e759-41a9-8a1e-41b0b09698f1": { "uuid": "97d39281-e759-41a9-8a1e-41b0b09698f1", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 1024, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 300, "last_modified": "2016-06-17T15:14:19.000Z" }, "ac4fdf09-1206-4fa8-8d04-0f6ff878744d": { "uuid": "ac4fdf09-1206-4fa8-8d04-0f6ff878744d", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 1024, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 300, "last_modified": "2016-06-17T15:14:19.000Z" }, "85f47220-e710-4523-bf5e-6af458a47610": { 
"uuid": "85f47220-e710-4523-bf5e-6af458a47610", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 4096, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 400, "last_modified": "2016-06-17T15:14:19.000Z" }, "3263c53a-1be1-496b-ad70-95825dfaac57": { "uuid": "3263c53a-1be1-496b-ad70-95825dfaac57", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 1024, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 300, "last_modified": "2016-06-17T15:14:19.000Z" }, "54452da7-4159-48fc-ad89-4a0e58d0ea33": { "uuid": "54452da7-4159-48fc-ad89-4a0e58d0ea33", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 2048, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 400, "last_modified": "2016-06-17T15:14:19.000Z" }, "0238ac40-ec99-4c02-83b3-650559000cca": { "uuid": "0238ac40-ec99-4c02-83b3-650559000cca", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 1024, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 300, "last_modified": "2016-06-17T15:14:19.000Z" }, "da61b9d5-75b1-4f6f-8cd5-973c2f11d8b8": { "uuid": "da61b9d5-75b1-4f6f-8cd5-973c2f11d8b8", "owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853", "quota": 25, "max_physical_memory": 4096, "zone_state": "running", "state": "running", "brand": "joyent-minimal", "cpu_cap": 400, "last_modified": "2016-06-17T15:14:19.000Z" }, "10137a24-5785-6aef-c46b-e5027f956f85": { "uuid": "10137a24-5785-6aef-c46b-e5027f956f85", "owner_uuid": "00000000-0000-0000-0000-000000000000", "quota": 10, "max_physical_memory": 5120, "zone_state": "installed", "state": "stopped", "brand": "kvm", "last_modified": "2016-06-10T08:24:33.000Z" }, "2ad788ff-d3d6-4eb7-b789-c7fc66e26a83": { "uuid": "2ad788ff-d3d6-4eb7-b789-c7fc66e26a83", "owner_uuid": 
"930896af-bf8c-48d4-885c-6573a94b1853", "quota": 10, "max_physical_memory": 4352, "zone_state": "installed", "state": "stopped", "brand": "kvm", "cpu_cap": 400, "last_modified": "2016-06-10T08:23:38.000Z" }, "6356aa55-5881-e1ed-f2bf-a03ba2623cb0": { "uuid": "6356aa55-5881-e1ed-f2bf-a03ba2623cb0", "owner_uuid": "00000000-0000-0000-0000-000000000000", "quota": 10, "max_physical_memory": 5120, "zone_state": "running", "state": "running", "brand": "kvm", "last_modified": "2016-06-17T15:39:55.000Z" } }, "boot_platform": "20160505T114610Z", "boot_params": { "rabbitmq": "guest:guest:rabbitmq.Osney.allocatesoftware.com:5672" }, "kernel_flags": {}, "default_console": "serial", "serial": "ttyb", "created": "2016-05-06T14:28:40.000Z", "sysinfo": { "Live Image": "20160505T114610Z", "System Type": "SunOS", "Boot Time": "1466177784", "Datacenter Name": "Osney", "SDC Version": "7.0", "Manufacturer": "Dell Inc.", "Product": "Precision T1650", "Serial Number": "44X86X1", "SKU Number": "", "HW Version": "01", "HW Family": "", "Setup": "true", "VM Capable": true, "CPU Type": "Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40GHz", "CPU Virtualization": "vmx", "CPU Physical Cores": 1, "UUID": "44454c4c-3400-1058-8038-b4c04f365831", "Hostname": "headnode", "CPU Total Cores": 8, "MiB of Memory": "32722", "Zpool": "zones", "Zpool Disks": "c1t0d0,c1t1d0", "Zpool Profile": "striped", "Zpool Creation": 1462544920, "Zpool Size in GiB": 1798, "Disks": { "c1t0d0": { "Size in GB": 256 }, "c1t1d0": { "Size in GB": 2000 } }, "Boot Parameters": { "console": "vga", "vga_mode": "115200,8,n,1,-", "headnode": "true" }, "SDC Agents": [ { "name": "firewaller", "version": "1.4.0" }, { "name": "cainstsvc", "version": "0.0.3vrelease-20160428-20160428T180942Z-gc110a12" }, { "name": "hagfish-watcher", "version": "1.1.0-release-20160428-20160428T183307Z-geb1d34a" }, { "name": "cabase", "version": "1.0.3vrelease-20160428-20160428T180942Z-gc110a12" }, { "name": "marlin", "version": "0.0.3" }, { "name": "smartlogin", 
"version": "0.3.0-release-20160428-20160428T183259Z-g381e99f" }, { "name": "config-agent", "version": "1.5.0" }, { "name": "net-agent", "version": "1.3.0" }, { "name": "agents_core", "version": "2.1.0" }, { "name": "cn-agent", "version": "1.5.2" }, { "name": "amon-relay", "version": "1.0.1" }, { "name": "vm-agent", "version": "1.5.0" }, { "name": "amon-agent", "version": "1.0.1" } ], "Network Interfaces": { "e1000g0": { "MAC Address": "90:b1:1c:7a:cd:0e", "ip4addr": "10.20.4.1", "Link Status": "up", "NIC Names": [ "admin", "external" ] } }, "Virtual Network Interfaces": { "external0": { "MAC Address": "02:08:20:ee:7e:6f", "ip4addr": "10.20.2.1", "Link Status": "up", "Host Interface": "e1000g0", "VLAN": "0" } }, "Link Aggregations": {} }, "ram": 32722, "hostname": "headnode", "status": "running", "headnode": true, "current_platform": "20160505T114610Z", "setup": true, "last_boot": "2016-06-17T15:36:24.000Z", "last_heartbeat": "2016-06-17T16:19:26.066Z", "memory_available_bytes": 18606661632, "memory_arc_bytes": 7620202088, "memory_total_bytes": 34302623744, "memory_provisionable_bytes": -40628440269, "disk_cores_quota_bytes": 2899102924800, "disk_cores_quota_used_bytes": 19053072384, "disk_installed_images_used_bytes": 291095281664, "disk_kvm_quota_bytes": 32212254720, "disk_kvm_quota_used_bytes": 3331371008, "disk_kvm_zvol_used_bytes": 292282540032, "disk_kvm_zvol_volsize_bytes": 283451064320, "disk_pool_alloc_bytes": 503308369920, "disk_pool_size_bytes": 1992864825344, "disk_system_used_bytes": -118947184640, "disk_zone_quota_bytes": 1181116006400, "disk_zone_quota_used_bytes": 16493289472, "transitional_status": "", "score": 0, "overprovision_ratios": { "ram": 1, "disk": 1, "cpu": 4 }, "unreserved_cpu": 0, "unreserved_ram": -38754, "unreserved_disk": 176727 }

kusor commented 8 years ago

Can you get the output of sdcadm health please?

magnayn commented 8 years ago

hmm

[root@headnode (Osney) ~]# sdcadm health

/opt/smartdc/sdcadm/lib/sdcadm.js:720
    vm.server_uuid].hostname;
    ^
TypeError: Cannot read property 'hostname' of undefined
    at Object.fillOutVmInsts as func
    at Object._onImmediate (/opt/smartdc/sdcadm/node_modules/vasync/lib/vasync.js:213:20)
    at processImmediate as _immediateCallback
[root@headnode (Osney) ~]#

kusor commented 8 years ago

Try sdc-role list as an alternate approach, please.

magnayn commented 8 years ago

From this list, the server for hostvolume-SFO is the one that was destroyed in adminui:

[root@headnode (Osney) ~]# sdc-role list
ALIAS           SERVER    UUID                                  RAM   STATE    ROLE        ADMIN_IP
adminui0        headnode  54452da7-4159-48fc-ad89-4a0e58d0ea33  2048  running  adminui     10.20.4.25
amon0           headnode  7d0ddd86-a441-4ffc-a30f-2d05d3866d8d  1024  running  amon        10.20.4.19
amonredis0      headnode  59bc2cae-25c1-4c30-b20f-27719f5673b1  1024  running  amonredis   10.20.4.17
assets0         headnode  91d346f7-f7c1-4688-bf3c-0c8235ec6ffd  128   running  assets      10.20.4.2
binder0         headnode  0940ef38-3a34-46de-bed7-e5ec9cf443a0  1024  running  binder      10.20.4.5
ca0             headnode  85f47220-e710-4523-bf5e-6af458a47610  4096  running  ca          10.20.4.24
cloudapi0       headnode  0238ac40-ec99-4c02-83b3-650559000cca  1024  running  cloudapi    10.20.4.32
cnapi0          headnode  ae76109a-80a0-4a15-a1c4-4cd7ec84858b  1024  running  cnapi       10.20.4.16
dhcpd0          headnode  fa761d28-1596-4b0c-a9a4-3807a72b25df  128   running  dhcpd       10.20.4.3
docker0         headnode  da61b9d5-75b1-4f6f-8cd5-973c2f11d8b8  4096  running  docker      10.20.4.33
fwapi0          headnode  97d39281-e759-41a9-8a1e-41b0b09698f1  1024  running  fwapi       10.20.4.20
hostvolume-JFK  JFK       9b67fab4-4830-4bde-91e1-f33e6c2d2946  4096  running  hostvolume  -
hostvolume-LHR  LHR       34a1a9e8-4bc0-4915-9b19-8ad1a40fba3c  4096  running  hostvolume  -
hostvolume-SFO  -         2873f5ac-6dd1-4750-9e9f-667e3e23d41d  4096  running  hostvolume  -
imgapi0         headnode  66d1c94a-c32d-4f9b-9fb5-075c705cdd03  768   running  imgapi      10.20.4.15
mahi0           headnode  3263c53a-1be1-496b-ad70-95825dfaac57  1024  running  mahi        10.20.4.27
manatee0        headnode  f8054311-3c83-4253-8db8-86d0792c7a86  2048  running  manatee     10.20.4.10
moray0          headnode  a27c54e5-d36d-40bb-84a6-9eeed610d904  8192  running  moray       10.20.4.11
napi0           headnode  56cac0d5-6f59-40f2-acff-ef861c4bb5a5  256   running  napi        10.20.4.4
papi0           headnode  5bc500b4-adf7-462d-9906-3006cd34a699  1024  running  papi        10.20.4.23
rabbitmq0       headnode  38bbf9a3-e6d2-4844-8d59-a03b83b5cbc1  2048  running  rabbitmq    10.20.4.14
redis0          headnode  03687fd1-bdfa-48d7-8054-b84fbc3ce233  1024  running  redis       10.20.4.18
sapi0           headnode  ae2ad258-1a3c-4e81-a4e8-be92a31debe6  512   running  sapi        10.20.4.26
sdc0            headnode  ed869a9c-bd30-42dc-b069-b0ae56acad28  1024  running  sdc         10.20.4.22
ufds0           headnode  9bb0607c-6829-490a-8f02-81555471cb3b  8192  running  ufds        10.20.4.12
vmapi0          headnode  ac4fdf09-1206-4fa8-8d04-0f6ff878744d  1024  running  vmapi       10.20.4.21
win             headnode  2ad788ff-d3d6-4eb7-b789-c7fc66e26a83  4096  stopped  -           -
workflow0       headnode  f354c285-02ee-4e8a-9518-6237c8378138  8192  running  workflow    10.20.4.13
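
In a listing like this, a VM whose server has gone away shows `-` in the SERVER column. The Python sketch below picks such rows out of whitespace-separated text. It assumes every column is always filled, with `-` as the placeholder, as in the rows above. `SAMPLE` is a hand-copied excerpt of this listing and `vms_without_server` is a hypothetical helper, not a Triton command.

```python
from typing import List

# Excerpt of the listing above: header plus a few rows.
SAMPLE = """\
ALIAS SERVER UUID RAM STATE ROLE ADMIN_IP
docker0 headnode da61b9d5-75b1-4f6f-8cd5-973c2f11d8b8 4096 running docker 10.20.4.33
hostvolume-JFK JFK 9b67fab4-4830-4bde-91e1-f33e6c2d2946 4096 running hostvolume -
hostvolume-SFO - 2873f5ac-6dd1-4750-9e9f-667e3e23d41d 4096 running hostvolume -
win headnode 2ad788ff-d3d6-4eb7-b789-c7fc66e26a83 4096 stopped - -
"""


def vms_without_server(role_list_text: str) -> List[str]:
    """Return the ALIAS of every row whose SERVER column is '-'."""
    lines = [ln for ln in role_list_text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    alias_idx = header.index("ALIAS")
    server_idx = header.index("SERVER")
    aliases = []
    for row in lines[1:]:
        fields = row.split()
        # Skip rows too short to contain both columns.
        if len(fields) <= max(alias_idx, server_idx):
            continue
        if fields[server_idx] == "-":
            aliases.append(fields[alias_idx])
    return aliases


print(vms_without_server(SAMPLE))
```

Splitting on whitespace only works because no field in this listing contains spaces; a listing with empty cells would need fixed-width parsing instead.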

kusor commented 8 years ago

Would you mind trying sdc-vmapi /vms/2873f5ac-6dd1-4750-9e9f-667e3e23d41d -X DELETE and then trying again?

magnayn commented 8 years ago

health-check and 'sdcadm update docker' are now happy

[root@headnode (Osney) ~]# sdcadm update docker
Finding candidate update images for the "docker" service.
Using channel dev
Up-to-date.

experimental update-other isn't, but I don't know if that is significant

[root@headnode (Osney) ~]# sdcadm experimental update-other
Running VMAPI migrations
Removing deprecated hostvolume instances
sdcadm experimental: error: socket hang up
[root@headnode (Osney) ~]#

FWIW

[root@headnode (Osney) ~]# sdcadm check-health
INSTANCE                              SERVICE     HOSTNAME  ALIAS           HEALTHY
54452da7-4159-48fc-ad89-4a0e58d0ea33  adminui     headnode  adminui0        true
7d0ddd86-a441-4ffc-a30f-2d05d3866d8d  amon        headnode  amon0           true
59bc2cae-25c1-4c30-b20f-27719f5673b1  amonredis   headnode  amonredis0      true
91d346f7-f7c1-4688-bf3c-0c8235ec6ffd  assets      headnode  assets0         true
0940ef38-3a34-46de-bed7-e5ec9cf443a0  binder      headnode  binder0         true
85f47220-e710-4523-bf5e-6af458a47610  ca          headnode  ca0             true
0238ac40-ec99-4c02-83b3-650559000cca  cloudapi    headnode  cloudapi0       true
ae76109a-80a0-4a15-a1c4-4cd7ec84858b  cnapi       headnode  cnapi0          true
fa761d28-1596-4b0c-a9a4-3807a72b25df  dhcpd       headnode  dhcpd0          true
da61b9d5-75b1-4f6f-8cd5-973c2f11d8b8  docker      headnode  docker0         true
97d39281-e759-41a9-8a1e-41b0b09698f1  fwapi       headnode  fwapi0          true
9b67fab4-4830-4bde-91e1-f33e6c2d2946  hostvolume  JFK       hostvolume-JFK  true
34a1a9e8-4bc0-4915-9b19-8ad1a40fba3c  hostvolume  LHR       hostvolume-LHR  true
66d1c94a-c32d-4f9b-9fb5-075c705cdd03  imgapi      headnode  imgapi0         true
3263c53a-1be1-496b-ad70-95825dfaac57  mahi        headnode  mahi0           true
f8054311-3c83-4253-8db8-86d0792c7a86  manatee     headnode  manatee0        true
a27c54e5-d36d-40bb-84a6-9eeed610d904  moray       headnode  moray0          true
56cac0d5-6f59-40f2-acff-ef861c4bb5a5  napi        headnode  napi0           true
5bc500b4-adf7-462d-9906-3006cd34a699  papi        headnode  papi0           true
38bbf9a3-e6d2-4844-8d59-a03b83b5cbc1  rabbitmq    headnode  rabbitmq0       true
03687fd1-bdfa-48d7-8054-b84fbc3ce233  redis       headnode  redis0          true
ae2ad258-1a3c-4e81-a4e8-be92a31debe6  sapi        headnode  sapi0           true
ed869a9c-bd30-42dc-b069-b0ae56acad28  sdc         headnode  sdc0            true
9bb0607c-6829-490a-8f02-81555471cb3b  ufds        headnode  ufds0           true
ac4fdf09-1206-4fa8-8d04-0f6ff878744d  vmapi       headnode  vmapi0          true
f354c285-02ee-4e8a-9518-6237c8378138  workflow    headnode  workflow0       true
44454c4c-3400-1058-8038-b4c04f365831  global      headnode  global          true

kusor commented 8 years ago

It looks like something isn't OK with either VMAPI or SAPI there. Could you take a look at those service logs, inside the VMs, and see if there are any errors?

magnayn commented 8 years ago

I had to leave for the weekend; coming back I found

svc:/manta/application/binder:default (Joyent DNS-ZooKeeper Service)
State: maintenance since Sun Jun 19 13:37:59 2016
Reason: Restarting too quickly.

(log was empty) so I've restarted it
