RedHatOfficial / ocp4-vsphere-upi-automation

Automates most of the manual steps of deploying OCP4.x cluster on vSphere
MIT License

Failed to transfer rhcos-vmware template to vSphere #7

Closed Gurianos closed 4 years ago

Gurianos commented 4 years ago

vSphere 6.5

The ocp4 repository is created correctly on vSphere, but transferring the rhcos-vmware template to vSphere fails. Could you help me and explain what I'm doing wrong?

fatal: [localhost]: FAILED! => {"changed": false, "dest": "/root/ocp4-vsphere-upi-automation/downloads/rhcos-vmware.ova", "elapsed": 10, "gid": 0, "group": "root", "mode": "0644", "msg": "Request failed: ", "owner": "root", "secontext": "system_u:object_r:admin_home_t:s0", "size": 831590400, "state": "file", "uid": 0, "url": "https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/latest/latest/rhcos-4.3.8-x86_64-vmware.x86_64.ova"}

thank you

Gurianos commented 4 years ago

I restarted the playbook with -e clean=true and got:

TASK [dhcp_ova : Deploy the OVF template into the folder] ****
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Failure validating OVF import spec: There are no active hosts in the cluster."}

For information, I'm not using DRS on my physical host.

thank you

vchintal commented 4 years ago

Put /root/ocp4-vsphere-upi-automation/bin on your PATH and export the GOVC variables as shown below. Then run the command govc find. Let me know if you see a host listed under /$(dc)/host. If not, that is the problem: it means you did not add an ESXi host in vCenter.

export GOVC_USERNAME=admin-username
export GOVC_PASSWORD=password
export GOVC_URL=https://vcenter-ip
export GOVC_INSECURE=1

Gurianos commented 4 years ago

I did it. govc finds all the vSphere parameters and lists the hosts. For info, when I manually create the template in ocp4, I get this error: "Failed to create a virtual machine : A specified parameter was not correct: spec.pool". Maybe the cause is the Ansible VMware library. Any other ideas?

Gurianos commented 4 years ago

I solved it by:

1) Removing the old python-pip check.

2) Adding the following to the deploy and create parts of roles/dhcp_ova/tasks/main.yml, along with the needed variables in all.yaml:

cluster: "{{ vcenter.cluster }}"
resource_pool: "{{ vcenter.resourcepool }}"

I also changed the ova line to ovf: "{{ playbook_dir }}/downloads/{{vcenter.templateName}}.ova". I don't know if that does anything, but now it works.

Cheers and thanks for this.
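For readers who want to see where those two keys would sit, here is a minimal sketch of a deploy task, assuming the role uses the vmware_deploy_ovf module. Only vcenter.cluster, vcenter.resourcepool, vcenter.templateName and the ovf path come from this thread; the other variable names (vcenter.ip, vcenter.username, vcenter.password, vcenter.datacenter) are hypothetical placeholders and may not match the repository's actual task.

```yaml
# Sketch only: not the repository's actual task.
# Variable names other than cluster, resourcepool and templateName are assumed.
- name: Deploy the OVF template into the folder
  vmware_deploy_ovf:
    hostname: "{{ vcenter.ip }}"                 # hypothetical variable name
    username: "{{ vcenter.username }}"           # hypothetical variable name
    password: "{{ vcenter.password }}"           # hypothetical variable name
    validate_certs: no
    datacenter: "{{ vcenter.datacenter }}"       # hypothetical variable name
    cluster: "{{ vcenter.cluster }}"             # added: target cluster
    resource_pool: "{{ vcenter.resourcepool }}"  # added: target resource pool
    name: "{{ vcenter.templateName }}"
    ovf: "{{ playbook_dir }}/downloads/{{ vcenter.templateName }}.ova"
    power_on: no
```

Setting cluster and resource_pool explicitly tells the module where to place the VM, which is presumably what the "spec.pool" and "no active hosts in the cluster" errors were complaining about on vSphere 6.5.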

vchintal commented 4 years ago

When you get some time, please back up the files you changed, revert all the changes except the one in roles/dhcp_ova/tasks/main.yml, delete all the VMs in the folder, and rerun the playbooks. I think that is the change that helped you. The reason I ask is that it's better to know exactly which of the various changes worked.

vchintal commented 4 years ago

Closing the ticket for now. Please update with any additional testing.

Gurianos commented 4 years ago

All checks were done as user root; I may check again with a standard user.

1) Before launching the build of the helper node, I changed its config to generate the SSH key as .ssh/ocp4 and ocp4.pub.

2) Changed ./staging as the doc expects, replacing 192.168.86.180 with localhost in the webserver section.

3) Changed ansible.cfg, because I am using root as the user with the webserver on localhost, as the doc explains.

4) Removed "- python-pip" from roles/common/tasks/main.yml.

4a) ln -s /usr/bin/pip3 /usr/bin/pip ## if you want pip to point to pip3. You can also link it to /usr/bin/pip2, which is more conventional, since pip3 no longer adds /usr/bin/pip.

5) Added the following to each of the vcenter "bootstrap, master and worker" sections in /roles/dhcp_ova/tasks/main.yml: cluster: "{{ vcenter.cluster }}" resource_pool: Resources

6) Added the following to the vcenter section of group_vars/all.yml: cluster: Name_of_your_target_cluster
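As a rough illustration of the last step, the vcenter section of group_vars/all.yml might gain entries like the ones below. This is a sketch under assumptions: the existing keys of that section are not shown in this thread and are left out, and the resourcepool key is only needed if the tasks reference vcenter.resourcepool rather than hard-coding Resources.

```yaml
# Sketch only: additions to the existing vcenter section of group_vars/all.yml.
vcenter:
  # (existing keys stay unchanged and are omitted here)
  cluster: Name_of_your_target_cluster   # name of the cluster as shown in vCenter
  resourcepool: Resources                # assumed key; only if tasks use vcenter.resourcepool
```

Resources is the name vSphere gives to a cluster's root resource pool, so it is a reasonable value when no custom pool has been created.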

Gurianos commented 4 years ago

Has anyone deployed it fully with no errors, without these modifications?

vchintal commented 4 years ago

Yes, my colleagues managed without needing a change in roles/dhcp_ova/tasks/main.yml. Which Ansible version are you using, just out of curiosity?

Gurianos commented 4 years ago

You are on the right track :) I'm on Ansible 2.9.5. This bug, which is not really a bug, is known from the vmware_deploy Ansible module,

Gurianos commented 4 years ago

and Python 3.6.8, regarding the python-pip error.

vchintal commented 4 years ago

How about the ESXi and vCenter versions? I am working off of vSphere Client version 6.7.0.40000 and ESXi version: 6.7.0, ESXi build number: 14320388

Gurianos commented 4 years ago

Thanks, nice. You may be right, it could also be a vSphere version problem. What is your Ansible version? Do you plan to test with 2.9.5? My vSphere version is the latest 6.5, as indicated in the first post; lots of my customers are on 6.5. Adding a simple comment in the doc about 6.5 versions or adding a new branch is up to you. I'll continue my discovery of OpenShift from here :) Thanks a lot for the work, it helps a lot to deploy OpenShift quickly.

Nevertheless, a feature for choosing a cluster and a resource pool as the target would be nice too.

vchintal commented 4 years ago

Thank you so much for your input. It seems more related to the vSphere version than to Ansible. I will definitely add a note to the README for folks using vSphere 6.5 about the cluster and resource pool settings.