NVIDIA / cloud-native-stack

Run cloud native workloads on NVIDIA GPUs
Apache License 2.0

Remote Ansible Installs Fail #57

Closed BHSDuncan closed 1 month ago

BHSDuncan commented 3 months ago

The k8s-install.yaml file in playbooks/ makes use of ansible_user_dir. On a remote install, the playbook fails with a permissions issue around line 276 unless the username used for the remote host also exists as a user on the local system from which the playbook is kicked off.

I verified this by creating that user on the local machine that kicks off the playbook (in fact, just a user dir in /home/ with the appropriate name, with r/w permissions set temporarily). After uninstalling and re-running, the playbook works.

It looks like the change was made sometime last year: https://github.com/NVIDIA/cloud-native-stack/commit/55bdd53be90398b23dff1da95681d6afc88bab44
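
For illustration, here is a minimal sketch of the failure mode described above. The play, the group name `gpu_nodes`, the user `nvidia`, and the file name are all hypothetical; this is not the actual task near line 276 of k8s-install.yaml. It assumes the problem arises when a task runs on the control machine but builds its path from `ansible_user_dir`, which is a fact gathered from the remote host.

```yaml
# Hypothetical play; NOT the actual task from k8s-install.yaml.
- hosts: gpu_nodes            # assumed inventory group name
  remote_user: nvidia         # assumed remote user with no account on the control machine
  gather_facts: true          # ansible_user_dir comes from the remote host's gathered facts
  tasks:
    - name: Write a file on the control machine
      ansible.builtin.copy:
        content: "example\n"
        # Templated from the remote host's facts (e.g. /home/nvidia),
        # even though the task itself runs on localhost.
        dest: "{{ ansible_user_dir }}/example.txt"
      delegate_to: localhost
```

If the directory that `ansible_user_dir` resolves to does not exist, or is not writable, on the control machine, a task like this would be expected to fail there, which matches the workaround of creating the directory locally.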

angudadevops commented 3 months ago

@BHSDuncan we understand your concern. We have always used the ubuntu/nvidia user, which is why we didn't see this issue. I will verify with a different user, push the fix, and let you know.

The reason for changing from /tmp to ansible_user_dir was that we saw some issues, such as permissions, when we tried to use /tmp.

angudadevops commented 3 months ago

@BHSDuncan I have changed it back to /tmp and tested with a different user on remote machines. It works for me; please pull the latest and let us know if you still hit any issues.
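
As a rough sketch of what a `/tmp`-based task could look like (a hypothetical play with assumed group and file names, not the committed change itself), the destination no longer depends on any user's home directory:

```yaml
# Hypothetical play; the actual fix in the repository may differ.
- hosts: gpu_nodes            # assumed inventory group name
  gather_facts: false         # no host facts are needed for a fixed path
  tasks:
    - name: Write a file under /tmp on the control machine
      ansible.builtin.copy:
        content: "example\n"
        dest: /tmp/example.txt   # same path regardless of the remote username
        mode: "0600"             # restrict access, since /tmp is shared
      delegate_to: localhost
```

Setting an explicit `mode` is an assumption on my part, meant to address the kind of permissions concerns that can come up with shared `/tmp` files.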

Thanks

angudadevops commented 1 month ago

@BHSDuncan closing this issue as it's fixed