Failing to load cloud-config

jmccoy555 commented 6 years ago

Hi,

Just trying this out. Once I worked out that I needed to remove the -s flag in the create-cluster script, everything appears to deploy nicely; however, user-configdrive is failing:


Last login: Sat Aug 18 17:45:27 UTC 2018 from 10.10.1.105 on pts/0
Container Linux by CoreOS stable (1800.7.0)
Failed Units: 1
  user-configdrive.service
gclaybur@queenbee69 ~ $ systemctl status user-configdrive
● user-configdrive.service - Load cloud-config from /media/configdrive
   Loaded: loaded (/usr/lib/systemd/system/user-configdrive.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Sat 2018-08-18 17:47:53 UTC; 3min 21s ago
  Process: 719 ExecStart=/usr/bin/coreos-cloudinit --from-configdrive=/media/configdrive (code=exited, status=1/FAILURE)
 Main PID: 719 (code=exited, status=1/FAILURE)

The config appears to be loaded (I removed a few of the SSH keys to shorten the text):

cat /media/configdrive/openstack/latest/user_data
#cloud-config

hostname: queenbee69

ssh_authorized_keys:
    - ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAy6AWpFloZj8Z12Yx8tJHL2P2HCF9E/JRN0ZXcXKy/WBi8vPRDNcSP3WfXHxpxv6GRBTgiv/NJYkTcOHgoL5kIcA8foEaCQ5KjVd9kK/JXE3t9LwfLbBGMKjpbreCL8Y2bQGoxSsvg/sFr+OLAfiqMztEa/vBG8ucf6AEl4sLxLU= gclaybur@GaryClayburgXP

coreos:
  etcd2:
    advertise-client-urls: "http://10.10.1.140:2379"
    initial-advertise-peer-urls: "http://10.10.1.140:2380"
    listen-client-urls: "http://0.0.0.0:2379,http://0.0.0.0:4001"
    listen-peer-urls: "http://10.10.1.140:2380,http://10.10.1.140:7001"
    name: "queenbee69"
    initial-cluster: "queenbee69=http://10.10.1.140:2380"
  update:
    reboot-strategy: "reboot"
  units:
    - name: etcd2.service
      command: start
    - name: 00-ens.network
      runtime: true
      content: |
        [Match]
        Name=ens192

        [Network]
        Address=10.10.1.140/24
        Gateway=10.10.1.100
        DNS=10.10.1.101
        Domains=mcy.lan
    - name: settimezone.service
      command: start
      content: |
        [Unit]
        Description=Set the timezone

        [Service]
        ExecStart=/usr/bin/timedatectl set-timezone America/Denver
        RemainAfterExit=yes
        Type=oneshot

write_files:
  - path: /etc/systemd/system/docker.service.d/50-insecure-registry.conf
    content: |
        #allow docker to use private registry over http using the dns registered name, "registry"
        [Service]
        Environment=DOCKER_OPTS='--insecure-registry="registry:5000"'
users:
  - name: gclaybur
    ssh-authorized-keys:
    - ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAy6AWpFloZj8Z12Yx8tJHL2P2HCF9E/JRN0ZXcXKy/WBi8vPRDNcSP3WfXHxpxv6GRBTgiv/NJYkTcOHgoL5kIcA8foEaCQ5KjVd9kK/JXE3t9LwfLbBGMKjpbreCL8Y2bQGoxSsvg/sFr+OLAfiqMztEa/vBG8ucf6AEl4sLxLU= gclaybur@GaryClayburgXP
    passwd: $1$jmB5gDWK$PuhW2bvjEwIlW5Wx.2G4A.
    primary-group: wheel
    groups:
      - sudo
      - docker
      - systemd-journal

Net interface checked;

ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:3d:14:51:3d  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens192: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.10.1.140  netmask 255.255.255.0  broadcast 10.10.1.255
        inet6 fe80::20c:29ff:fe1b:8b8b  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:1b:8b:8b  txqueuelen 1000  (Ethernet)
        RX packets 1403  bytes 109730 (107.1 KiB)
        RX errors 0  dropped 35  overruns 0  frame 0
        TX packets 310  bytes 44954 (43.9 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 4  bytes 200 (200.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4  bytes 200 (200.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

EDIT:

Failed to apply cloud-config: Unit etcd2.service not found.

sudo systemctl restart user-configdrive
Job for user-configdrive.service failed because the control process exited with error code.
See "systemctl status user-configdrive.service" and "journalctl -xe" for details.
gclaybur@queenbee69 ~ $ sudo journalctl -xe
Aug 18 18:11:55 queenbee69 /usr/lib64/systemd/system-generators/torcx-generator[837]: time="2018-08-18T18:11:55Z" level=info msg="torcx already run"
Aug 18 18:11:55 queenbee69 coreos-cloudinit[809]: 2018/08/18 18:11:55 Restarting systemd-networkd
Aug 18 18:11:55 queenbee69 systemd[1]: Starting Garbage Collection for rkt...
-- Subject: Unit rkt-gc.service has begun start-up
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit rkt-gc.service has begun starting up.
Aug 18 18:11:55 queenbee69 systemd[1]: Stopping Network Service...
-- Subject: Unit systemd-networkd.service has begun shutting down
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit systemd-networkd.service has begun shutting down.
Aug 18 18:11:55 queenbee69 systemd-timesyncd[628]: Network configuration changed, trying to establish connection.
Aug 18 18:11:55 queenbee69 systemd[1]: Stopped Network Service.
-- Subject: Unit systemd-networkd.service has finished shutting down
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit systemd-networkd.service has finished shutting down.
Aug 18 18:11:55 queenbee69 systemd[1]: Starting Network Service...
-- Subject: Unit systemd-networkd.service has begun start-up
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit systemd-networkd.service has begun starting up.
Aug 18 18:11:55 queenbee69 systemd-timesyncd[628]: Synchronized to time server 194.80.204.184:123 (0.coreos.pool.ntp.org).
Aug 18 18:11:55 queenbee69 systemd[1]: Started Garbage Collection for rkt.
-- Subject: Unit rkt-gc.service has finished start-up
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit rkt-gc.service has finished starting up.
--
-- The start-up result is RESULT.
Aug 18 18:11:55 queenbee69 systemd-networkd[845]: ens192: Gained IPv6LL
Aug 18 18:11:55 queenbee69 systemd-timesyncd[628]: Network configuration changed, trying to establish connection.
Aug 18 18:11:55 queenbee69 systemd-networkd[845]: Enumeration completed
Aug 18 18:11:55 queenbee69 systemd[1]: Started Network Service.
-- Subject: Unit systemd-networkd.service has finished start-up
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit systemd-networkd.service has finished starting up.
--
-- The start-up result is RESULT.
Aug 18 18:11:55 queenbee69 coreos-cloudinit[809]: 2018/08/18 18:11:55 Restarted systemd-networkd (done)
Aug 18 18:11:55 queenbee69 coreos-cloudinit[809]: 2018/08/18 18:11:55 Calling unit command "start" on "etcd2.service"
Aug 18 18:11:55 queenbee69 coreos-cloudinit[809]: 2018/08/18 18:11:55 Failed to apply cloud-config: Unit etcd2.service not found.
Aug 18 18:11:55 queenbee69 systemd[1]: user-configdrive.service: Main process exited, code=exited, status=1/FAILURE
Aug 18 18:11:55 queenbee69 sudo[806]: pam_unix(sudo:session): session closed for user root
Aug 18 18:11:55 queenbee69 systemd[1]: user-configdrive.service: Failed with result 'exit-code'.
Aug 18 18:11:55 queenbee69 systemd[1]: Failed to start Load cloud-config from /media/configdrive.
-- Subject: Unit user-configdrive.service has failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit user-configdrive.service has failed.
--
-- The result is RESULT.
Aug 18 18:11:55 queenbee69 systemd-networkd[845]: lo: Link is not managed by us
Aug 18 18:11:55 queenbee69 systemd-networkd[845]: lo: Configured
Aug 18 18:11:58 queenbee69 update_engine[666]: I0818 18:11:58.221846   666 update_attempter.cc:493] Updating boot flags...
Aug 18 18:11:58 queenbee69 sudo[865]: gclaybur : TTY=pts/0 ; PWD=/home/gclaybur ; USER=root ; COMMAND=/bin/journalctl -xe
Aug 18 18:11:58 queenbee69 sudo[865]: pam_unix(sudo:session): session opened for user root by gclaybur(uid=0)
Aug 18 18:11:58 queenbee69 sudo[865]: pam_systemd(sudo:session): Cannot create session: Already running in a session
lines 2297-2363/2363 (END)
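
To isolate the cloud-config failure from the surrounding systemd noise in output like the above, a small filter can help. This is only a minimal sketch: the function name `cloudinit_failures` is hypothetical, and it assumes the failure line contains the word "Failed", as it does in this log.

```shell
# Hypothetical helper: read journal text on stdin and print only the
# coreos-cloudinit lines that report a failure.
cloudinit_failures() {
  grep -F 'coreos-cloudinit[' | grep -F 'Failed'
}

# Possible usage on the affected host (not run here):
#   journalctl -u user-configdrive.service --no-pager | cloudinit_failures
```

Against the log above, this should leave just the "Failed to apply cloud-config: Unit etcd2.service not found." line.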

Running on a standalone ESXi 6.7 host.

Any ideas or pointers would be appreciated as this really looks like a cool way to deploy.

Thanks.

jmccoy555 commented 6 years ago

Progress. Updating user-data to the following:

coreos:
  etcd:
    advertise-client-urls: "http://${ETCD_IP_ADDRESS}:2379"
    initial-advertise-peer-urls: "http://${ETCD_IP_ADDRESS}:2380"
    listen-client-urls: "http://0.0.0.0:2379,http://0.0.0.0:4001"
    listen-peer-urls: "http://${ETCD_IP_ADDRESS}:2380,http://${ETCD_IP_ADDRESS}:7001"
    name: "${ETCD_HOSTNAME}"
    initial-cluster: "${ETCD_HOSTNAME}=http://${ETCD_IP_ADDRESS}:2380"
  update:
    reboot-strategy: "reboot"
  units:
    - name: etcd-member.service
      command: start
    ................

Allows the service to start.
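
Since the unit name differs between images (etcd2.service was not found here, while etcd-member.service starts), it may be worth checking which etcd unit the image actually ships before editing user-data. The sketch below is an illustration only: `pick_etcd_unit` is a hypothetical helper name, and the candidate order is an assumption.

```shell
# Hypothetical helper: read `systemctl list-unit-files` output on stdin and
# print the first etcd unit name that is present. Returns 1 if none is found.
pick_etcd_unit() {
  peu_units=$(awk '{print $1}')
  for peu_candidate in etcd-member.service etcd2.service etcd.service; do
    if printf '%s\n' "$peu_units" | grep -qxF "$peu_candidate"; then
      printf '%s\n' "$peu_candidate"
      return 0
    fi
  done
  return 1
}

# Possible usage on the host (not run here):
#   systemctl list-unit-files --no-legend 'etcd*' | pick_etcd_unit
```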

Now I'm just getting;

etcdctl member list
Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 10.10.1.140:4001: connect: connection refused

from the worker.
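
One possible reading of the "connection refused" error, offered as an assumption rather than anything confirmed in this thread: coreos-cloudinit may apply the `etcd:` section as a drop-in for etcd.service rather than etcd-member.service, in which case etcd-member would start with its defaults and listen only on localhost. A minimal sketch of generating a drop-in for etcd-member.service follows. The function name `etcd_member_dropin` and the drop-in file name in the usage comment are hypothetical, and the sketch assumes the unit passes `ETCD_*` environment variables through to etcd.

```shell
# Hypothetical helper: print a systemd drop-in for etcd-member.service that
# mirrors the etcd settings from the user-data above.
#   $1 = node IP address, $2 = etcd member name
etcd_member_dropin() {
  emd_ip="$1"
  emd_name="$2"
  cat <<EOF
[Service]
Environment="ETCD_NAME=${emd_name}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=http://${emd_ip}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=http://${emd_ip}:2380"
Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379,http://0.0.0.0:4001"
Environment="ETCD_LISTEN_PEER_URLS=http://${emd_ip}:2380,http://${emd_ip}:7001"
Environment="ETCD_INITIAL_CLUSTER=${emd_name}=http://${emd_ip}:2380"
EOF
}

# Possible usage on the host (not run here; the file name is an assumption):
#   sudo mkdir -p /etc/systemd/system/etcd-member.service.d
#   etcd_member_dropin 10.10.1.140 queenbee69 |
#     sudo tee /etc/systemd/system/etcd-member.service.d/20-listen.conf
#   sudo systemctl daemon-reload && sudo systemctl restart etcd-member
```

Running `ss -tln` on the etcd host afterwards would show whether anything is actually listening on ports 2379 and 4001.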

gclayburg commented 6 years ago

Well, if I remember right, the last time I looked at this, CoreOS had changed the way it configures tools like etcd on startup. These scripts haven't been updated to use the new format. I'm sure it could be done, but I no longer use etcd this way, so I haven't bothered to do it myself.

If you come up with something that works, it would sure be interesting to see.

jmccoy555 commented 6 years ago

It looks like you are quite right - clustering.

Also, fleet is no longer developed or included with CoreOS, so there's a second problem, but it doesn't look like it would be too difficult if using static IPs.
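
Assuming the static-IP remark refers to bootstrapping etcd with a fixed member list, the `initial-cluster` value can be built from name and address pairs. This is a rough sketch: `initial_cluster` is a hypothetical helper, `worker1` and `10.10.1.141` are made-up example values, and port 2380 is taken from the user-data above.

```shell
# Hypothetical helper: build an etcd initial-cluster string from NAME=IP
# arguments, using peer port 2380 as in the user-data above.
initial_cluster() {
  ic_result=""
  for ic_pair in "$@"; do
    ic_name="${ic_pair%%=*}"   # text before the first "="
    ic_ip="${ic_pair#*=}"      # text after the first "="
    ic_entry="${ic_name}=http://${ic_ip}:2380"
    if [ -z "$ic_result" ]; then
      ic_result="$ic_entry"
    else
      ic_result="${ic_result},${ic_entry}"
    fi
  done
  printf '%s\n' "$ic_result"
}

# Example call (worker1 and its address are invented for illustration):
#   initial_cluster queenbee69=10.10.1.140 worker1=10.10.1.141
```

Every member would need the same string, so generating it once and substituting it into each node's user-data keeps the nodes consistent.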

I think I'm going to deploy with the script but then use Docker Swarm (for now anyway).

gclayburg commented 6 years ago

Right, fleet is basically dead, and rightfully so, with the likes of Kubernetes and Swarm using a better, more supported model. But I still use the script to deploy new standalone CoreOS servers.
