Hi @liujiaqi-666,
The training duration correlates directly with the number of epochs. With only 10,000 training images, SlimSAM-50 achieves satisfactory results after 20 epochs, which takes roughly 1-2 days on a single Titan RTX. Similarly, SlimSAM-77 reaches satisfactory performance after 60 epochs, approximately three days on a single Titan RTX. Increasing the dataset size and the number of epochs can further improve the performance of the compressed model.
You can designate the computing devices for multi-GPU training via the command line. For instance, setting `CUDA_VISIBLE_DEVICES=0,1` enables training across two GPUs. It is important to remove gradient accumulation when using a multi-GPU setup. That said, the current implementation of our multi-GPU training code has some inefficiencies; once we are past several deadlines, we plan to refine and optimize the multi-GPU training code. Thank you.
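As a rough illustration (not SlimSAM's actual training script), the pattern might look like the sketch below. The toy model, loss, and hyperparameters are placeholders; the key ideas are restricting visible GPUs via `CUDA_VISIBLE_DEVICES`, wrapping the model in `nn.DataParallel`, and stepping the optimizer every iteration instead of accumulating gradients:

```python
import torch
import torch.nn as nn

# Select GPUs before launching the script, e.g.:
#   CUDA_VISIBLE_DEVICES=0,1 python train.py
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model; the real script would build SlimSAM here.
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 1))

if torch.cuda.device_count() > 1:
    # DataParallel splits each batch along dim 0 and scatters the
    # chunks across every visible GPU, gathering outputs on GPU 0.
    model = nn.DataParallel(model)
model = model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()

for step in range(10):
    x = torch.randn(32, 256, device=device)  # dummy batch
    y = torch.randn(32, 1, device=device)

    loss = criterion(model(x), y)

    # With multiple GPUs the effective batch is already large,
    # so step every iteration rather than accumulating gradients.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```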
When I use the training code you provided, I find that only one GPU is working. Also, how do I set up multi-GPU training?

![1709608780187](https://github.com/czg1225/SlimSAM/assets/120233913/82e26ac0-cc33-4871-ba5e-492470ab10fa)