The test case 17. SM-modelparallelv2 uses a custom PyTorch binary, pytorch=2.2.0=sm_py3.10_cuda12.1_cudnn8.9.5_nccl_pt_2.2_tsm_2.3_cuda12.1_0, which declares a dependency on aws-ofi-nccl >=1.7.1,<2.0. The expectation was that the aws-ofi-nccl package would be consumed from the AWS PyTorch conda channel (https://aws-pytorch-doc.com/). That dependency can no longer be resolved, and the install fails with:
The following package could not be installed
└─ pytorch ==2.2.0 sm_py3.10_cuda12.1_cudnn8.9.5_nccl_pt_2.2_tsm_2.3_cuda12.1_0 is not installable because it requires
└─ aws-ofi-nccl >=1.7.1,<2.0 , which does not exist (perhaps a missing channel).
The conda channel has been deprecated, as mentioned in the deprecation announcement. It is recommended that the team that built pytorch=2.2.0=sm_py3.10_cuda12.1_cudnn8.9.5_nccl_pt_2.2_tsm_2.3_cuda12.1_0 rebuild this binary and remove the dependency on aws-ofi-nccl >=1.7.1,<2.0.
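To check whether any configured channel still offers a build that satisfies the constraint, a channel's repodata.json can be scanned along the lines below. This is only a rough sketch: the helper names (parse_version, satisfies, matching_builds) and the sample records are invented for illustration, and the version comparison is a simplified numeric one, not conda's actual version-ordering or match-spec logic.

```python
import re


def parse_version(text):
    """Turn '1.7.1' into (1, 7, 1, 0); non-numeric parts are ignored (simplification)."""
    parts = [int(p) for p in re.findall(r"\d+", text)]
    parts += [0] * (4 - len(parts))  # pad so '2.0' and '2.0.0' compare equal
    return tuple(parts)


def satisfies(version, spec=">=1.7.1,<2.0"):
    """Check a version against a comma-separated list of '>=' and '<' bounds."""
    v = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            if v < parse_version(clause[2:]):
                return False
        elif clause.startswith("<"):
            if v >= parse_version(clause[1:]):
                return False
        else:
            raise ValueError(f"unsupported clause: {clause}")
    return True


def matching_builds(repodata, name="aws-ofi-nccl", spec=">=1.7.1,<2.0"):
    """List filenames in a repodata.json-style dict that satisfy the dependency."""
    records = {}
    for key in ("packages", "packages.conda"):
        records.update(repodata.get(key, {}))
    return sorted(
        filename
        for filename, record in records.items()
        if record.get("name") == name
        and satisfies(record.get("version", ""), spec)
    )


# Hypothetical sample data, shaped like a channel's repodata.json.
sample = {
    "packages": {
        "aws-ofi-nccl-1.7.4-0.tar.bz2": {"name": "aws-ofi-nccl", "version": "1.7.4"},
        "aws-ofi-nccl-2.0.0-0.tar.bz2": {"name": "aws-ofi-nccl", "version": "2.0.0"},
        "nccl-2.19.3-0.tar.bz2": {"name": "nccl", "version": "2.19.3"},
    }
}

print(matching_builds(sample))
print(matching_builds({"packages": {}}))
```

An empty result for every configured channel corresponds to the "does not exist (perhaps a missing channel)" message above. In that case the options are to add a channel that provides the package or to rebuild the binary without the dependency.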