Open gusye1234 opened 1 year ago
Hi @gusye1234! Thanks for using our code!
You could definitely try a single GPU with --gradient_accumulation_steps first.
DSI with QG actually converges pretty fast on 8 GPUs, so I think the speed on one GPU should be acceptable.
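For anyone sizing this up, here is a rough sketch of the arithmetic behind that suggestion. It assumes --gradient_accumulation_steps behaves the usual way (gradients from several micro-batches are summed before each optimizer step) and that the per-device batch size stays the same as in the 8-GPU demo. The function names and the batch size of 32 are made up for illustration and are not taken from the repo.

```python
def effective_batch_size(num_gpus, per_device_batch, accumulation_steps=1):
    """Number of samples that contribute to one optimizer update."""
    return num_gpus * per_device_batch * accumulation_steps


def accumulation_steps_to_match(reference_gpus, available_gpus, reference_accumulation=1):
    """Accumulation steps needed on fewer GPUs to keep the same effective
    batch size, assuming the per-device batch size is left unchanged."""
    total = reference_gpus * reference_accumulation
    if total % available_gpus != 0:
        raise ValueError("reference setup is not evenly divisible by available GPUs")
    return total // available_gpus


# Hypothetical per-device batch size; the real value comes from the demo command.
PER_DEVICE_BATCH = 32

steps = accumulation_steps_to_match(reference_gpus=8, available_gpus=1)
print(steps)                                             # 8
print(effective_batch_size(8, PER_DEVICE_BATCH))         # 256
print(effective_batch_size(1, PER_DEVICE_BATCH, steps))  # 256
```

With these assumptions, one GPU with 8 accumulation steps sees the same number of samples per update as the 8-GPU run, but each update needs about 8 times as many forward/backward passes, so expect wall-clock time to grow by roughly that factor.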
Hi. I noticed that the demo command launches the training on 8 GPUs. Have you tested running this task on a single GPU, and how long does it take?
I am trying to follow up on this work but only have one GPU... So if the training speed turns out to be hard to bear, I might consider adding more GPUs.