Jinpeng-Yu opened 11 months ago
Hi, thanks for your great work. I noticed that you use SeedFormer as the backbone and DP (DataParallel) as the default multi-GPU speedup, and I would like to know how long the training process takes. On my machine (4 GPUs), training with DP actually takes more time than training on a single GPU. Have you encountered this problem? Do you have any suggestions? Thanks.

Thanks for asking, and sorry for the late response. We found multi-GPU training faster than single-GPU training on NVIDIA Quadro 6000 cards, but if you have one powerful GPU (like an A100), single-GPU training can be quicker, since no extra time is spent transferring data between devices.
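For anyone hitting the same slowdown: the training harness isn't shown in this thread, but DP here presumably refers to PyTorch's `torch.nn.DataParallel`. A minimal sketch of the single-GPU vs. DP toggle (the `nn.Sequential` model below is just a hypothetical stand-in for the SeedFormer-based network):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in; replace with the actual SeedFormer-based model.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Toggle: wrapping in DataParallel only pays off when per-GPU compute is slow
# enough to hide the scatter/gather transfer overhead discussed above.
use_dp = torch.cuda.device_count() > 1
if use_dp:
    # nn.DataParallel replicates the model, scatters each batch across GPUs,
    # and gathers outputs back on GPU 0 every step; that transfer is the
    # per-iteration overhead that can make DP slower than one fast GPU.
    model = nn.DataParallel(model)

# Dummy forward pass: the call site is identical with or without DP.
x = torch.randn(32, 1024, device=device)
out = model(x)
print(out.shape)
```

As a side note, the PyTorch documentation generally recommends `torch.nn.parallel.DistributedDataParallel` (one process per GPU) over `DataParallel` when multi-GPU training is actually needed, since it avoids the single-process scatter/gather bottleneck.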