NLPJCL / RAG-Retrieval

Unify Efficient Fine-tuning of RAG Retrieval, including Embedding, ColBERT, ReRanker.
MIT License

RuntimeError: CUDA error: invalid device ordinal #27

Closed uestc-huangyw closed 3 months ago

uestc-huangyw commented 4 months ago

I hit the following error when training on a single machine with a single GPU:

[rank5]: RuntimeError: CUDA error: invalid device ordinal
[rank5]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank5]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
[rank5]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
NLPJCL commented 4 months ago

Please share the details of your default_fsdp.yaml config file, along with the arguments you ran with.
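
The `rank5` prefix in a single-GPU run suggests that more processes were launched than there are visible GPUs, which is a common cause of `invalid device ordinal`. The thread does not confirm that this was the cause here. Below is a minimal sketch of that check, assuming the config requests a fixed number of processes. The function name and the example values are hypothetical and not part of RAG-Retrieval.

```python
from typing import List


def find_invalid_ranks(num_processes: int, visible_gpus: int) -> List[int]:
    """Return the local ranks that would have no GPU to bind to.

    Each process typically binds to the GPU whose index equals its local
    rank, so any rank >= visible_gpus points at a device that does not exist.
    """
    return [rank for rank in range(num_processes) if rank >= visible_gpus]


if __name__ == "__main__":
    # Hypothetical values: 8 processes requested on a machine with 1 GPU.
    # On a real machine the GPU count could come from torch.cuda.device_count().
    bad_ranks = find_invalid_ranks(num_processes=8, visible_gpus=1)
    if bad_ranks:
        print(f"Ranks without a GPU: {bad_ranks}")
    else:
        print("Every rank has a GPU.")
```

If the list is non-empty, lowering the requested process count to match the number of visible GPUs is the usual remedy.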

uestc-huangyw commented 4 months ago

Thanks for your reply. It runs fine now. Thank you for providing this unified fine-tuning approach.

NLPJCL commented 4 months ago

Glad I could help~