X-LANCE / SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Currently we use a single GPU for decoding. We plan to support multi-GPU decoding; the script is on the way. #119

Open Learneducn opened 3 months ago

Learneducn commented 3 months ago
          Currently we use a single GPU for decoding. We plan to support multi-GPU decoding; the script is on the way.

Originally posted by @ddlBoJack in https://github.com/X-LANCE/SLAM-LLM/issues/100#issuecomment-2151660523

Learneducn commented 3 months ago

Hello, excuse me. When I run the inference and training scripts, I specify the CUDA device ID, but the job always defaults to cuda:0. How can I solve this? In short, the run crashes with: torch.distributed.elastic.multiprocessing.errors.ChildFailedError

ddlBoJack commented 3 months ago

Have you solved it? I think this may relate to a local problem with your GPU config.
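
One common cause of this symptom is selecting the device inside the script after the process has already grabbed the default card. A minimal sketch of the usual workaround, assuming a plain PyTorch entry point (`inference.py` here is a hypothetical stand-in for the repo's actual script): mask the visible devices via `CUDA_VISIBLE_DEVICES` before torch is imported, so the chosen physical card is the only one the process can see.

```python
import os

# Make only physical GPU 1 visible to this process; set this
# *before* importing torch (or pass it on the command line,
# e.g. CUDA_VISIBLE_DEVICES=1 python inference.py ...).
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch

# Inside the process the masked card is renumbered, so the
# selected physical GPU 1 now appears as cuda:0.
print(torch.cuda.device_count())      # -> 1
print(torch.cuda.get_device_name(0))  # -> name of physical GPU 1
```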

fclearner commented 3 months ago

I believe there is a straightforward implementation for multi-GPU support. You can wrap the existing script with an outer script that handles the splitting of the test set and passes the GPU IDs accordingly. This approach is similar to what FunASR did previously.
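
As a rough sketch of that wrapper approach (not the repo's actual script): the code below shards a test manifest round-robin across GPUs and launches one single-GPU decoding process per shard. `decode.py`, its flags, and the manifest path are all placeholders for SLAM-LLM's real inference entry point.

```python
"""Hypothetical wrapper: shard a test manifest across GPUs and run
one single-GPU decoding process per shard."""
import os
import subprocess

MANIFEST = "data/test.jsonl"   # assumed: one utterance per line
GPU_IDS = [0, 1, 2, 3]

with open(MANIFEST) as f:
    lines = f.readlines()

procs = []
for rank, gpu in enumerate(GPU_IDS):
    # Round-robin split of the test set into one shard per GPU.
    shard_path = f"data/test.shard{rank}.jsonl"
    with open(shard_path, "w") as f:
        f.writelines(lines[rank::len(GPU_IDS)])

    # Pin each child process to its own physical card.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    procs.append(subprocess.Popen(
        ["python", "decode.py", "--test_data", shard_path,
         "--output", f"decode.shard{rank}.txt"],
        env=env))

for p in procs:
    p.wait()
# Afterwards, concatenate decode.shard*.txt and score as usual.
```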

Learneducn commented 3 months ago

          I believe there is a straightforward implementation for multi-GPU support. You can wrap the existing script with an outer script that handles the splitting of the test set and passes the GPU IDs accordingly. This approach is similar to what FunASR did previously.

Thank you very much. The problem of specifying a particular card for testing has been solved. I have now run into another problem: after fine-tuning with the SLAM framework, my inference results are not as good as what I get by testing directly with the open-source Whisper model. Why is that?