codefuse-ai / FasterTransformer4CodeFuse

High-performance LLM inference based on our optimized version of FasterTransformer

Build failure: c10d/ProcessGroupNCCL.hpp: No such file or directory #3

Closed FoxxComz closed 11 months ago

FoxxComz commented 12 months ago

Branch/Tag/Commit

main

Docker Image Version

registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.8.0-py38-torch2.0.1-tf2.13.0-1.9.1

GPU name

A100

CUDA Driver

525.125.06

Reproduced Steps

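Following the official build steps, compilation fails with an error that the header c10d/ProcessGroupNCCL.hpp is missing. From what I could find, since PyTorch 1.13 this header has moved to torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp. However, both the officially recommended environment, nvcr.io/nvidia/pytorch:22.09-py3, and the ModelScope-provided environment I am using ship a PyTorch version higher than 1.13. Is there a compatibility problem?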
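
One way to check which of the two header locations mentioned above is actually present in a given environment is sketched below. This is only an illustration and not part of the repository: the helper names `find_header` and `torch_include_dirs` are hypothetical, and the sketch assumes the header, if installed, lives under one of the include directories reported by `torch.utils.cpp_extension.include_paths()`.

```python
from pathlib import Path
from typing import Iterable, Optional

# The two relative header paths mentioned in the issue text.
CANDIDATE_HEADERS = (
    "c10d/ProcessGroupNCCL.hpp",
    "torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp",
)


def find_header(include_dirs: Iterable[str],
                candidates=CANDIDATE_HEADERS) -> Optional[str]:
    """Return the first candidate relative path found under any include dir.

    Hypothetical helper for illustration; returns None if no candidate exists.
    """
    dirs = [Path(d) for d in include_dirs]
    for rel in candidates:
        for d in dirs:
            if (d / rel).is_file():
                return rel
    return None


def torch_include_dirs():
    """Include dirs of the installed PyTorch, or [] if torch is not importable."""
    try:
        from torch.utils.cpp_extension import include_paths
    except ImportError:
        return []
    return include_paths()


if __name__ == "__main__":
    found = find_header(torch_include_dirs())
    print(found or "ProcessGroupNCCL.hpp not found under the PyTorch include dirs")
```

Running the script inside the container prints the relative path that exists, if any. If only the longer path is reported, an `#include` of the shorter path would be expected to fail in that environment unless the build adds an extra include directory.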
zhang-ge-hao commented 11 months ago

Thanks for your response!

It looks like you did not reproduce this with the recommended Docker image. Please align your environment with nvcr.io/nvidia/pytorch:22.09-py3 and try again.

If you have any problems when reproducing this repository using a suitable docker image, please feel free to re-open this issue.