coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

no cloud agents: vmware #70

Open dustymabe opened 5 years ago

dustymabe commented 5 years ago

In #12 we decided that we'd like to try to not ship cloud agents. This ticket will document investigation and strategy for shipping without a cloud agent on the vmware virtualization cloud platform.

See also #41 for a discussion of how to ship cloud specific bits using ignition.

dustymabe commented 5 years ago

This one is mostly unfamiliar territory, and we'd need a working environment to experiment in and determine what is and isn't needed. We might be able to use Packet's ESX servers for this. If that doesn't work, we'd need access to an existing environment.

bgilbert commented 5 years ago

In Container Linux:

It appears that the primary benefits of vmtoolsd are:

I think we'll need to address the first point at least, since it seems likely to surprise users if VMware power controls default to hard shutdown/reboot on Fedora CoreOS but not on other distros.

On Fedora CoreOS:

redbaron commented 5 years ago

We use CoreOS on VMware in a quite strict environment, and I can say that without vmtoolsd we wouldn't get an exception from the infra team to run our own image.

bgilbert commented 5 years ago

@redbaron Which vmtoolsd functionality do you depend on? As noted, we'd probably disable unnecessary modules.

dcode commented 5 years ago

One of the vmtoolsd features not listed that I depend on is reporting the IP addresses associated with a given VM. This lets me use the vSphere and/or ESX APIs to get the associated IPs for follow-on scripted actions. I specifically do this for CI/CD of system deployment scripts.

cgwalters commented 5 years ago

As of 4.1 RHCOS ships open-vm-tools by default; we're not entirely happy about this but it's where we are.

kpettijohn commented 5 years ago

Previously when running CoreOS on VMware it was very nice to have vmtoolsd expose the IP address of the VM, which could then be used as a Terraform output or passed to other resources.

Basic Terraform example

resource "vsphere_virtual_machine" "linux" {
  name             = "${var.vm_name}"
  resource_pool_id = "${data.vsphere_resource_pool.pool.id}"
  datastore_id     = "${data.vsphere_datastore.datastore.id}"
  folder           = "${var.vm_folder}"

  num_cpus = "${var.vcpu}"
  memory   = "${var.memory}"

  guest_id = "${data.vsphere_virtual_machine.template.guest_id}"

  network_interface {
    network_id = "${data.vsphere_network.network.id}"
  }

  clone {
    template_uuid = "${data.vsphere_virtual_machine.template.id}"
  }

  scsi_type = "lsilogic"

  disk {
    label            = "disk00"
    size             = "${data.vsphere_virtual_machine.template.disks.0.size}"
    thin_provisioned = true
  }

  extra_config {
    guestinfo.hostname                      = "${var.hostname}"
    guestinfo.ignition.config.data.encoding = "base64"
    guestinfo.ignition.config.data          = "${base64encode(file("./ignition.json"))}"
  }
}

output "ip" {
  value = "${vsphere_virtual_machine.linux.guest_ip_addresses}"
}

output "vmware_tools_status" {
  value = "${vsphere_virtual_machine.linux.vmware_tools_status}"
}
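
The base64-encoded `guestinfo.ignition.config.data` value used above can also be produced outside Terraform. A minimal shell sketch, assuming an `ignition.json` file and a configured `govc` CLI; the function names `encode_ignition` and `set_ignition` are hypothetical and not part of any tool:

```shell
#!/bin/sh
# Hypothetical helpers; the function names are illustrative only.

# Base64-encode an Ignition config as a single line, matching
# guestinfo.ignition.config.data.encoding = "base64".
encode_ignition() {
    base64 < "$1" | tr -d '\n'
}

# Attach the encoded config to an existing VM.
# Assumes govc is installed and configured (GOVC_URL and friends).
set_ignition() {
    vm_name="$1"
    config_file="$2"
    govc vm.change -vm "$vm_name" \
        -e "guestinfo.ignition.config.data=$(encode_ignition "$config_file")" \
        -e "guestinfo.ignition.config.data.encoding=base64"
}
```

For example, `set_ignition my-vm ./ignition.json` would set both keys before the first boot.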

dcode commented 5 years ago

@kpettijohn That and Ansible dynamic inventory are what I'm looking for specifically.

lucab commented 5 years ago

Network introspection belongs to the "collection of guest metrics on behalf of the host" bucket in the list above.

While I understand its usefulness, it isn't (IMHO) a very pressing requirement for the following reasons:

kpettijohn commented 5 years ago

Thanks for the feedback @lucab. After looking into things further I think I should be able to get by using the toolbox provided by vmware/govmomi for my use case.

Here is a basic usage example that, when built and run on a VM hosted by ESX, will register itself as a Guest Managed instance of VMware Tools and report the default IP address.

fedora-coreos-vmtools

Reamer commented 5 years ago

It would be nice if FCOS also shipped with open-vm-tools. open-vm-tools is needed for the K8s vSphere storage provider.

kai-uwe-rommel commented 4 years ago

Also, open-vm-tools would allow a graceful shutdown or reboot of an FCOS VM from the vSphere client or CLI/API.

alexlllll commented 4 years ago

I just tried the OVA deployment and I don't see any open-vm-tools. So what happened?

LorbusChris commented 4 years ago

Including it in FCOS by default is not really what we want, because, and I'm quoting from the first comment:

In #12 we decided that we'd like to try to not ship cloud agents.

You should be able to install it with:

rpm-ostree install open-vm-tools

kai-uwe-rommel commented 4 years ago

@LorbusChris Great, thanks for the information ... BTW, is there documentation about this?

Is there a way to get this "rpm-ostree install open-vm-tools" executed automatically during installation? I did not see anything in the description of the Ignition files that would allow this. Please correct me if I'm wrong. Thanks!

zotrix commented 4 years ago

This hack is more useful than "rpm-ostree install open-vm-tools": no reboot is needed and it can be applied at provisioning time.

systemd:
  units:
    - name: open-vm-tools.service
      enabled: true
      contents: |
        [Unit]
        Description=Open VM Tools
        After=network-online.target
        Wants=network-online.target

        [Service]
        TimeoutStartSec=0
        ExecStartPre=-/bin/podman kill open-vm-tools
        ExecStartPre=-/bin/podman rm open-vm-tools
        ExecStartPre=/bin/podman pull open-vm-tools:fc31
        ExecStart=/bin/podman run -e SYSTEMD_IGNORE_CHROOT=1 -v  /proc/:/hostproc/ -v /sys/fs/cgroup:/sys/fs/cgroup -v /run/systemd:/run/systemd --pid=host --net=host --ipc=host --uts=host --rm  --privileged --name open-vm-tools open-vm-tools:fc31

        [Install]
        WantedBy=multi-user.target

varesa commented 4 years ago

@zotrix Is that container image available somewhere or was that just a proposal?

Bonehead5338 commented 4 years ago

I have been using Terraform's 'wait for ip' functionality, which relies on open-vm-tools, when provisioning RHCOS. It doesn't work now. It would be nice if it did.

zotrix commented 4 years ago

@zotrix Is that container image available somewhere or was that just a proposal?

@varesa It is in a private registry, but the Dockerfile is like the one in this repo: https://github.com/projectatomic/atomic-system-containers/tree/master/open-vm-tools-centos

kai-uwe-rommel commented 4 years ago

I finally came up with this:

systemd:
  units:
    - name: postinstall.service
      enabled: true
      contents: |
        [Unit]
        Description=Post Installation
        After=network-online.target
        Wants=network-online.target

        [Service]
        TimeoutStartSec=0
        ExecStart=/bin/bash -c "/bin/rpm-ostree install open-vm-tools nrpe && reboot || /bin/true"

        [Install]
        WantedBy=multi-user.target

This also makes it easy to install more packages in one step.
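
The unit above re-runs `rpm-ostree install` on every boot and relies on `|| /bin/true` to swallow the failure once the packages are already layered. A hedged alternative is to record success in a stamp file and skip the install afterwards. The sketch below is only an illustration: the function name `layer_once`, the stamp path, and the `INSTALL_CMD` override are assumptions, not part of rpm-ostree.

```shell
#!/bin/sh
# Hypothetical helper; the names and the stamp path are illustrative only.
STAMP="${STAMP:-/var/lib/layer-packages.stamp}"
INSTALL_CMD="${INSTALL_CMD:-rpm-ostree install}"

layer_once() {
    # A stamp file left by an earlier boot means the packages are layered.
    if [ -e "$STAMP" ]; then
        echo "already layered"
        return 0
    fi
    # INSTALL_CMD is left unquoted on purpose so it splits into
    # the command and its arguments.
    $INSTALL_CMD "$@" || return 1
    touch "$STAMP" || return 1
    echo "layered, reboot required"
}
```

A wrapper script could call `layer_once open-vm-tools nrpe` and reboot only when the install actually ran. The same effect can be had inside the unit itself with systemd's `ConditionPathExists=!` pointing at the stamp file.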

straffalli commented 4 years ago

Hi,

We were using CoreOS as the underlying OS for Kubernetes clusters and are now trying to move to FCOS. We have run into this issue with open-vm-tools: the workaround requires adding a unit just for installation plus a reboot step, which is not very handy.

As already mentioned, this is required for graceful VM shutdown, the vSphere storage provider for K8s, reporting guest metrics, etc.

In #12 we decided that we'd like to try to not ship cloud agents.

If this is a strict decision, is it possible to re-apply an Ignition file after first boot on FCOS (as with CoreOS), so that a tool like Packer can add open-vm-tools as an extra package?

Thanks

varesa commented 4 years ago

@straffalli Why not run it in a container (using the Dockerfile linked above)? It works fine for us. Just add a unit for that; there is no need to reboot or run Ignition twice.

remoe commented 4 years ago

@varesa does shutdown work on your setup?

govc vm.power -s -force <name>

varesa commented 4 years ago

@remoe

It does work.

esa@desktop $ govc vm.power -k -s k8s-test-master-1
Shutdown guest VirtualMachine:vm-38027... OK

The VM starts shutting down and stops after a minute or so.

Amos-85 commented 3 years ago

After installing open-vm-tools with rpm-ostree and rebooting the machine, I'm getting these messages in /var/log/vmware-vmtoolsd-root.log:

[2020-11-01T17:49:53.450Z] [ message] [vmsvc] Log caching is enabled with maxCacheEntries=4096.
[2020-11-01T17:49:53.450Z] [ message] [vmsvc] Core dump limit set to -1
[2020-11-01T17:49:53.450Z] [ message] [vmtoolsd] Tools Version: 11.1.5.22735 (build-16724464)
[2020-11-01T17:49:53.699Z] [ message] [vmsvc] Cannot load message catalog for domain 'hgfsServer', language 'C', catalog dir '/usr/share/open-vm-tools'.
[2020-11-01T17:49:53.699Z] [ message] [vmtoolsd] Plugin 'hgfsServer' initialized.
[2020-11-01T17:49:53.700Z] [ message] [vix] QueryVGAuthConfig: vgauth usage is: 1
[2020-11-01T17:49:53.700Z] [ message] [vmsvc] Cannot load message catalog for domain 'vix', language 'C', catalog dir '/usr/share/open-vm-tools'.
[2020-11-01T17:49:53.700Z] [ message] [vmtoolsd] Plugin 'vix' initialized.
[2020-11-01T17:49:53.700Z] [ message] [vmsvc] Cannot load message catalog for domain 'appInfo', language 'C', catalog dir '/usr/share/open-vm-tools'.
[2020-11-01T17:49:53.700Z] [ message] [vmtoolsd] Plugin 'appInfo' initialized.
[2020-11-01T17:49:53.700Z] [ message] [vmsvc] Cannot load message catalog for domain 'deployPkg', language 'C', catalog dir '/usr/share/open-vm-tools'.
[2020-11-01T17:49:53.700Z] [ message] [vmtoolsd] Plugin 'deployPkg' initialized.
[2020-11-01T17:49:53.700Z] [ message] [vmsvc] Cannot load message catalog for domain 'guestInfo', language 'C', catalog dir '/usr/share/open-vm-tools'.
[2020-11-01T17:49:53.700Z] [ message] [vmtoolsd] Plugin 'guestInfo' initialized.
[2020-11-01T17:49:53.700Z] [ message] [vmsvc] Cannot load message catalog for domain 'powerops', language 'C', catalog dir '/usr/share/open-vm-tools'.
[2020-11-01T17:49:53.700Z] [ message] [vmtoolsd] Plugin 'powerops' initialized.
[2020-11-01T17:49:53.700Z] [ message] [vmsvc] Cannot load message catalog for domain 'timeSync', language 'C', catalog dir '/usr/share/open-vm-tools'.
[2020-11-01T17:49:53.700Z] [ message] [vmtoolsd] Plugin 'timeSync' initialized.
[2020-11-01T17:49:53.700Z] [ message] [vmsvc] Cannot load message catalog for domain 'vmbackup', language 'C', catalog dir '/usr/share/open-vm-tools'.
[2020-11-01T17:49:53.700Z] [ message] [vmtoolsd] Plugin 'vmbackup' initialized.
[2020-11-01T17:49:53.704Z] [ message] [vix] VixTools_ProcessVixCommand: command 62

The ESXi version is 6.7.

Has anyone run into this before?

remoe commented 3 years ago

@Amos-85, have you tried https://github.com/coreos/fedora-coreos-tracker/issues/503#issuecomment-637716268 ?

Amos-85 commented 3 years ago

@remoe Not yet. Should it work within a container, or have I misconfigured something in the VM template?

Amos-85 commented 3 years ago

@remoe I've run open-vm-tools in a container as in the solution you mentioned, but I see the same output in /var/log/vmware-vmtoolsd-root.log inside the container.

It's a very odd issue.

remoe commented 3 years ago

@Amos-85 It works with "fedora-coreos-32.20200824.3.0" on ESXi 6.7. I don't have this issue.

Amos-85 commented 3 years ago

I'm not sure it's related to the issue, but which guest OS version did you choose in the VM template?

remoe commented 3 years ago

It is the one set by the official FCOS OVA template.

Amos-85 commented 3 years ago

I've now managed to run open-vm-tools with the container solution @remoe mentioned, but I'm getting another error, related to the Perl package, from open-vm-tools in /var/log/vmware-imc/toolsDeployPkg.log:

[root@localhost log]# cat vmware-imc/toolsDeployPkg.log 
[2020-11-03T13:34:05.591Z] [   debug] ## Starting deploy pkg operation
[2020-11-03T13:34:05.591Z] [   debug] Deploying /var/run/201b5b4e/imcf-j3KP2b
[2020-11-03T13:34:05.591Z] [    info] Initializing deployment module.

[2020-11-03T13:34:05.591Z] [    info] Cleaning old state files.

[2020-11-03T13:34:05.591Z] [    info] EXIT STATE 'INPROGRESS'.

[2020-11-03T13:34:05.591Z] [   debug] Setting deploy error: 'Error removing lock '/var/log/.vmware-deploy.INPROGRESS'.(No such file or directory)'.

[2020-11-03T13:34:05.591Z] [    info] EXIT STATE 'Done'.

[2020-11-03T13:34:05.591Z] [   debug] Setting deploy error: 'Error removing lock '/var/log/.vmware-deploy.Done'.(No such file or directory)'.

[2020-11-03T13:34:05.591Z] [    info] EXIT STATE 'ERRORED'.

[2020-11-03T13:34:05.591Z] [   debug] Setting deploy error: 'Error removing lock '/var/log/.vmware-deploy.ERRORED'.(No such file or directory)'.

[2020-11-03T13:34:05.591Z] [   debug] Setting deploy error: 'Success.'.

[2020-11-03T13:34:05.591Z] [    info] Deploying cabinet file '/var/run/201b5b4e/imcf-j3KP2b'.

[2020-11-03T13:34:05.591Z] [    info] Transitioning from state '(null)' to state 'INPROGRESS'.

[2020-11-03T13:34:05.591Z] [    info] ENTER STATE 'INPROGRESS'.

[2020-11-03T13:34:05.592Z] [    info] Reading cabinet file '/var/run/201b5b4e/imcf-j3KP2b' and will extract it to '/var/run/.vmware-imgcust-dD4kPZE'.

[2020-11-03T13:34:05.592Z] [    info] Flags in the header: 0.

[2020-11-03T13:34:05.592Z] [    info] Original deployment command: '/bin/sh /tmp/.vmware/linux/deploy/scripts/customize.sh /tmp/.vmware/linux/deploy/cust.cfg'.

[2020-11-03T13:34:05.592Z] [    info] Actual deployment command: '/bin/sh /var/run/.vmware-imgcust-dD4kPZE/scripts/customize.sh /var/run/.vmware-imgcust-dD4kPZE/cust.cfg'.

[2020-11-03T13:34:05.592Z] [    info] Extracting package files.

[2020-11-03T13:34:05.610Z] [   debug] Check if cust.cfg exists.

[2020-11-03T13:34:05.610Z] [    info] cust.cfg is found in '/var/run/.vmware-imgcust-dD4kPZE' directory.

[2020-11-03T13:34:05.610Z] [   debug] Command to exec : '/usr/bin/cloud-init'.

[2020-11-03T13:34:05.610Z] [    info] sizeof ProcessInternal is 56

[2020-11-03T13:34:05.610Z] [    info] Returning, pending output from stdout
[2020-11-03T13:34:05.610Z] [    info] Returning, pending output from stderr
[2020-11-03T13:34:05.633Z] [    info] Process exited normally after 0 seconds, returned 127
[2020-11-03T13:34:05.633Z] [    info] No more output from stdout
[2020-11-03T13:34:05.633Z] [    info] No more output from stderr
[2020-11-03T13:34:05.633Z] [    info] Customization command output: ''.

[2020-11-03T13:34:05.633Z] [   error] Customization command failed with exitcode: 127, stderr: ''.

[2020-11-03T13:34:05.633Z] [    info] cloud-init is not installed.

[2020-11-03T13:34:05.633Z] [    info] Executing traditional GOSC workflow.

[2020-11-03T13:34:05.633Z] [   debug] Command to exec : '/bin/sh'.

[2020-11-03T13:34:05.633Z] [    info] sizeof ProcessInternal is 56

[2020-11-03T13:34:05.633Z] [    info] Returning, pending output from stdout
[2020-11-03T13:34:05.633Z] [    info] Returning, pending output from stderr
[2020-11-03T13:34:05.645Z] [    info] Process exited normally after 0 seconds, returned 1
[2020-11-03T13:34:05.645Z] [    info] Saving output from stdout
[2020-11-03T13:34:05.645Z] [    info] No more output from stdout
[2020-11-03T13:34:05.645Z] [    info] No more output from stderr
[2020-11-03T13:34:05.645Z] [    info] Customization command output: 'GOSC_DIR: /var/run/.vmware-imgcust-dD4kPZE/scripts
OS_KERNEL: Linux
ERROR: Guest Customization is not supported on systems not having Perl installed.
'.

[2020-11-03T13:34:05.645Z] [   error] Customization command failed with exitcode: 1, stderr: ''.

[2020-11-03T13:34:05.645Z] [   error] Customization process returned with error.

[2020-11-03T13:34:05.645Z] [   debug] Deployment result = 1.

[2020-11-03T13:34:05.645Z] [    info] Setting 'unknown' error status in vmx.

[2020-11-03T13:34:05.646Z] [    info] Transitioning from state 'INPROGRESS' to state 'ERRORED'.

[2020-11-03T13:34:05.646Z] [    info] ENTER STATE 'ERRORED'.

[2020-11-03T13:34:05.646Z] [    info] EXIT STATE 'INPROGRESS'.

[2020-11-03T13:34:05.646Z] [   debug] Setting deploy error: 'Deployment failed.The forked off process returned error code.'.

[2020-11-03T13:34:05.646Z] [   error] Deployment failed.The forked off process returned error code.

[2020-11-03T13:34:05.646Z] [    info] Launching cleanup.

[2020-11-03T13:34:05.646Z] [   debug] Command to exec : '/bin/rm'.

[2020-11-03T13:34:05.646Z] [    info] sizeof ProcessInternal is 56

[2020-11-03T13:34:05.647Z] [    info] Returning, pending output from stdout
[2020-11-03T13:34:05.647Z] [    info] Returning, pending output from stderr
[2020-11-03T13:34:05.674Z] [    info] Process exited normally after 0 seconds, returned 0
[2020-11-03T13:34:05.674Z] [    info] No more output from stdout
[2020-11-03T13:34:05.674Z] [    info] No more output from stderr
[2020-11-03T13:34:05.674Z] [    info] Customization command output: ''.

[2020-11-03T13:34:05.674Z] [    info] sSkipReboot: 'false', forceSkipReboot 'false'.

[2020-11-03T13:34:05.674Z] [   error] Deploy error: 'Deployment failed.The forked off process returned error code.'.

[2020-11-03T13:34:05.674Z] [   error] Package deploy failed in DeployPkg_DeployPackageFromFile
[2020-11-03T13:34:05.674Z] [   debug] ## Closing log

Everything works as expected only after installing Perl.
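
The log above records two separate failures: `/usr/bin/cloud-init` returns 127 (the shell's "command not found" status), after which the traditional GOSC workflow runs and exits with 1 because Perl is missing. To pull just those lines out of a long log, a small filter helps. This is a sketch; the function name `deploy_errors` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper; the function name is illustrative only.

# Print the error-level entries of a toolsDeployPkg.log, plus the plain
# "ERROR:" lines that the customization script writes to its output.
deploy_errors() {
    grep -E '^\[[^]]+\] \[ *error\]|^ERROR:' "$1"
}
```

Usage: `deploy_errors /var/log/vmware-imc/toolsDeployPkg.log`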

nccurry commented 2 years ago

@zotrix Is that container image available somewhere or was that just a proposal?

@varesa in private registry, but Dockerfile like in this repo https://github.com/projectatomic/atomic-system-containers/tree/master/open-vm-tools-centos

You can also use the official Red Hat image for this: registry.access.redhat.com/rhel7/open-vm-tools:latest

https://catalog.redhat.com/software/containers/rhel7/open-vm-tools/58ee4f6e4b339a32b5fb7bae?container-tabs=overview&gti-tabs=unauthenticated

My Terraform code for this looks like the following.

data "ignition_systemd_unit" "open_vm_tools" {
  name = "open-vm-tools.service"
  enabled = true
  content = <<-EOT
    [Unit]
    Description=Open VM Tools
    After=network-online.target
    Wants=network-online.target

    [Service]
    TimeoutStartSec=0
    ExecStartPre=-/bin/podman stop open-vm-tools --ignore
    ExecStartPre=-/bin/podman rm open-vm-tools --ignore
    ExecStartPre=/bin/podman pull registry.access.redhat.com/rhel7/open-vm-tools:latest
    ExecStart=/bin/podman run \
      --privileged \
      --rm \
      -v /proc/:/hostproc/ \
      -v /sys/fs/cgroup:/sys/fs/cgroup \
      -v /var/log:/var/log \
      -v /run/systemd:/run/systemd \
      -v /sysroot:/sysroot \
      -v /etc/passwd:/etc/passwd \
      -v /etc/shadow:/etc/shadow \
      -v /etc/adjtime:/etc/adjtime \
      -v /var/lib/sss/pipes/:/var/lib/sss/pipes/:rw \
      -v /tmp:/tmp:rw \
      -v /etc/sysconfig:/etc/sysconfig:rw \
      -v /etc/resolv.conf:/etc/resolv.conf:rw \
      -v /etc/nsswitch.conf:/etc/nsswitch.conf:rw \
      -v /etc/hosts:/etc/hosts:rw \
      --net=host \
      --pid=host \
      --ipc=host \
      --uts=host \
      --name open-vm-tools \
      registry.access.redhat.com/rhel7/open-vm-tools:latest
    ExecStop=-/usr/bin/podman stop open-vm-tools
    ExecStopPost=-/usr/bin/podman rm open-vm-tools

    [Install]
    WantedBy=multi-user.target
    EOT
}

You can also create a container image yourself pretty easily via the following Containerfile:

FROM registry.fedoraproject.org/fedora-minimal:latest

ENV SYSTEMD_IGNORE_CHROOT=1

RUN microdnf install -y --nodocs open-vm-tools

CMD ["/usr/bin/vmtoolsd"]

arnegroskurth commented 2 years ago

Built this container image for guest RPC support: https://hub.docker.com/r/arnegroskurth/open-vm-tools