NVIDIA / deepops

Tools for building GPU clusters
BSD 3-Clause "New" or "Revised" License

golang install fails #1241

Closed. arnoldas500 closed this issue 1 year ago.

arnoldas500 commented 1 year ago

On the latest DeepOps version (22.08), running ansible-playbook -l slurm-cluster playbooks/slurm-cluster.yml with slurm_cluster_install_singularity set to yes fails. The Go version specified in deepops/roles/singularity_wrapper/defaults/main.yml is "1.14.4", but the role downloads go1.16.15 and checks it against the checksum for 1.14.4. I am not sure why. I also tried updating the Go version to 1.16.15 in deepops/roles/singularity_wrapper/defaults/main.yml, but it still fails, and under /opt/go it is still creating folders for version "1.14.4".

Error below:

TASK [gantsign.golang : download Go language SDK] *****
fatal: [xcitemain]: FAILED! => changed=true
  checksum_dest: null
  checksum_src: 34f546a9a76a3b0906db9cf3ff2f301d347df848
  dest: /root/.ansible/tmp/downloads/go1.16.15.linux-amd64.tar.gz
  elapsed: 0
  msg: The checksum for /root/.ansible/tmp/downloads/go1.16.15.linux-amd64.tar.gz did not match aed845e4185a0b2a3c3d5e1d0a35491702c55889192bb9c30e67a3de6849c067; it was 77c782a633186d78c384f972fb113a43c24be0234c42fef22c2d8c4c4c8e7475.
  src: /tmp/ansible-moduletmp-1668459890.2157743-00tr9fg8/tmp195nvbyg
  url: https://storage.googleapis.com/golang/go1.16.15.linux-amd64.tar.gz
fatal: [appsvr]: FAILED! => changed=true
  checksum_dest: null
  checksum_src: 34f546a9a76a3b0906db9cf3ff2f301d347df848
  dest: /root/.ansible/tmp/downloads/go1.16.15.linux-amd64.tar.gz
  elapsed: 1
  msg: The checksum for /root/.ansible/tmp/downloads/go1.16.15.linux-amd64.tar.gz did not match aed845e4185a0b2a3c3d5e1d0a35491702c55889192bb9c30e67a3de6849c067; it was 77c782a633186d78c384f972fb113a43c24be0234c42fef22c2d8c4c4c8e7475.
  src: /tmp/ansible-moduletmp-1668459890.154168-aoy0v23/tmp6cd82hsm
  url: https://storage.googleapis.com/golang/go1.16.15.linux-amd64.tar.gz
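Both failures come from the download task comparing the tarball's SHA-256 digest with the value configured in the role. To check a downloaded tarball by hand, a minimal Python sketch of that comparison might look like this; the function names are hypothetical, and the error message only imitates the Ansible wording rather than coming from Ansible itself.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected: str) -> None:
    """Raise ValueError if the file's SHA-256 differs from `expected`.

    The message imitates the Ansible error above; it is not Ansible's code.
    """
    actual = sha256_of(path)
    if actual != expected.lower():
        raise ValueError(
            f"The checksum for {path} did not match {expected}; it was {actual}."
        )
```

For example, `verify_sha256(Path("/root/.ansible/tmp/downloads/go1.16.15.linux-amd64.tar.gz"), "<digest from the Go download page>")` raises if the file on disk is not the release you expect.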

mfruhner commented 1 year ago

Hey, I ran into the same issue with two different Go versions during install. I checked the Go download page and found that the SHA-256 hashes listed there are completely different from the ones in the role. I started updating them in the source code, but there are many versions to fix, so I simply removed the line that checks the hash:

sha256sum: '{{ golang_redis_sha256sum }}'

After that, the download continues.

This is a workaround and not a solution, I know. All the hashes should be updated in the source code.
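Updating the hashes properly means copying each one from the official release list. A minimal Python sketch of looking one up, under two assumptions not confirmed in this thread: that go.dev serves its release list as JSON at https://go.dev/dl/?mode=json (with include=all for archived releases such as 1.14.4), and that each release entry has a "files" list whose items carry "filename" and "sha256" keys. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: go.dev publishes its release index as JSON at this URL;
# include=all is meant to include older, archived releases as well.
GO_DL_INDEX = "https://go.dev/dl/?mode=json&include=all"


def fetch_release_index(url: str = GO_DL_INDEX) -> list:
    """Download and parse the Go release index (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def find_sha256(index: list, filename: str) -> str:
    """Return the published SHA-256 for `filename`.

    Assumes each release has a "files" list of dicts with "filename"
    and "sha256" keys.
    """
    for release in index:
        for entry in release.get("files", []):
            if entry.get("filename") == filename:
                return entry["sha256"]
    raise KeyError(f"{filename} not found in release index")
```

Usage would be something like `find_sha256(fetch_release_index(), "go1.14.4.linux-amd64.tar.gz")`, with the result pasted into the matching vars/versions/*.yml file.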

arnoldas500 commented 1 year ago

Hey, thank you for the quick response. Unfortunately, I'm still having issues.

In deepops/roles/galaxy/gantsign.golang/vars/../vars/versions/1.14.4.yml, I commented out the following (is this the correct location?):

# SHA256 sum for the redistributable package
# golang_redis_sha256sum: 'aed845e4185a0b2a3c3d5e1d0a35491702c55889192bb9c30e67a3de6849c067'

After rerunning, I still get an error:

fatal: [xcitemain]: FAILED! => msg: 'The conditional check ''golang_redis_sha256sum not in (None, '''')'' failed. The error was: error while evaluating conditional (golang_redis_sha256sum not in (None, '''')): ''golang_redis_sha256sum'' is undefined'

I also tried changing the checksum in deepops/roles/galaxy/gantsign.golang/vars/../vars/versions/1.14.4.yml to the value I found online:

# SHA256 sum for the redistributable package
# golang_redis_sha256sum: 'aed845e4185a0b2a3c3d5e1d0a35491702c55889192bb9c30e67a3de6849c067'
golang_redis_sha256sum: '77c782a633186d78c384f972fb113a43c24be0234c42fef22c2d8c4c4c8e7475'

but I still get an error:

fatal: [xcitemain]: FAILED! => changed=true
  checksum_dest: null
  checksum_src: fe68b09658d256e07630529fe07f9603faa3cae5
  dest: /root/.ansible/tmp/downloads/go1.14.4.linux-amd64.tar.gz
  elapsed: 0
  msg: The checksum for /root/.ansible/tmp/downloads/go1.14.4.linux-amd64.tar.gz did not match 77c782a633186d78c384f972fb113a43c24be0234c42fef22c2d8c4c4c8e7475; it was aed845e4185a0b2a3c3d5e1d0a35491702c55889192bb9c30e67a3de6849c067.
  src: /tmp/ansible-moduletmp-1668528188.8979194-bh3jnf5p/tmp8eh7q5oq
  url: https://storage.googleapis.com/golang/go1.14.4.linux-amd64.tar.gz

Any ideas what to try next?
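Read together, the two logs point to a single pairing problem. In the first, go1.16.15.linux-amd64.tar.gz was checked against aed845…, which the later log reports as the actual digest of go1.14.4; in the later one, the 1.14.4 tarball was checked against 77c782…, the digest reported for go1.16.15. A minimal Python sketch, assuming a hypothetical version-keyed table built only from the "it was …" values in those two logs, of deriving the file name and the checksum from the same version key so they cannot drift apart:

```python
# Hypothetical table: the digests reported as "it was ..." in the two error
# logs above (what was actually downloaded), not values copied from the role.
OBSERVED_SHA256 = {
    "1.14.4": "aed845e4185a0b2a3c3d5e1d0a35491702c55889192bb9c30e67a3de6849c067",
    "1.16.15": "77c782a633186d78c384f972fb113a43c24be0234c42fef22c2d8c4c4c8e7475",
}


def tarball_name(version: str, arch: str = "amd64") -> str:
    """File name as used in the download URLs above."""
    return f"go{version}.linux-{arch}.tar.gz"


def download_spec(version: str) -> tuple:
    """Return (file name, expected SHA-256), both derived from one version key."""
    return tarball_name(version), OBSERVED_SHA256[version]


def find_version_for(sha256: str):
    """Return the version whose observed digest equals `sha256`, or None."""
    for version, digest in OBSERVED_SHA256.items():
        if digest == sha256:
            return version
    return None
```

Here `find_version_for("77c782a633186d78c384f972fb113a43c24be0234c42fef22c2d8c4c4c8e7475")` returns "1.16.15", which suggests the value pasted into 1.14.4.yml belongs to the other release.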

mfruhner commented 1 year ago

Hey, I commented it out in deepops/roles/galaxy/gantsign.golang/tasks/main.yml:30. Otherwise, the get_url task there verifies the download against the given SHA-256; if sha256sum is not present, it skips the check.
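The difference between the two attempts is where the checksum disappears. Commenting out the variable leaves the conditional `golang_redis_sha256sum not in (None, '')` from the earlier error referring to an undefined name, while removing the parameter from the task leaves nothing to check. A minimal Python sketch of "verify only when a checksum is configured", treating a missing, None, or empty value the same way; this is not Ansible's implementation, and `role_vars` and the helper names are hypothetical. Skipping verification also drops the integrity check, which is why the thread calls it a workaround.

```python
import hashlib
from pathlib import Path
from typing import Mapping, Optional


def configured_checksum(role_vars: Mapping[str, object]) -> Optional[str]:
    """Return the configured checksum, or None if it is missing or empty.

    Uses .get() so an absent variable counts as "unset" instead of failing,
    unlike the undefined-variable error reported earlier in the thread.
    """
    value = role_vars.get("golang_redis_sha256sum")
    return None if value in (None, "") else str(value)


def maybe_verify(path: Path, role_vars: Mapping[str, object]) -> bool:
    """Verify `path` only if a checksum is configured; return True if checked."""
    expected = configured_checksum(role_vars)
    if expected is None:
        return False  # nothing configured: skip verification entirely
    actual = hashlib.sha256(path.read_bytes()).hexdigest()
    if actual != expected:
        raise ValueError(f"checksum mismatch for {path}: expected {expected}, got {actual}")
    return True
```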

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 60 days with no activity. Please update the issue or it will be closed in 7 days.