Learneducn opened 3 months ago
Hello, excuse me. When I run the inference and training scripts, I specify the CUDA ID, but they always default to cuda:0. How can I solve this? In short, this error is reported: torch.distributed.elastic.multiprocessing.errors.ChildFailedError
Have you solved it? I think this may relate to a local problem with your GPU config.
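One thing worth checking (an assumption on my side, since I can't see your launch command): if the device is selected inside the script after torch has already initialized CUDA, every rank can still land on physical cuda:0. Restricting `CUDA_VISIBLE_DEVICES` in the environment before launch is usually more reliable. A minimal sketch, not SLAM-LLM code:

```python
# Minimal sketch (not from the SLAM-LLM repo): restrict the visible device
# before torch initializes CUDA, so the process maps to the GPU you chose
# instead of always landing on physical cuda:0.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2"  # must be set before importing torch

import torch
print(torch.cuda.device_count())    # 1: only physical GPU 2 is visible
print(torch.cuda.current_device())  # 0, but it maps to physical GPU 2
```

The same applies when launching with torchrun, e.g. `CUDA_VISIBLE_DEVICES=2 torchrun --nproc_per_node=1 train.py` (the script name here is a placeholder).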
I believe there is a straightforward implementation for multi-GPU support: wrap the existing script with an outer script that splits the test set and launches each shard with the corresponding GPU ID. This approach is similar to what FunASR did previously.
Thank you very much; the problem of specifying a particular card for testing has been solved. I have run into another problem now: why are the inference results after fine-tuning with the SLAM framework not as good as the results of testing directly with the open-source Whisper model?
Originally posted by @ddlBoJack in https://github.com/X-LANCE/SLAM-LLM/issues/100#issuecomment-2151660523