nightmareisme opened 2 years ago
Do weights trained on 2 GPUs perform differently in downstream tasks from weights trained on 8 GPUs? I ask because the overall batch size is different. Hope to get a reply.
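For context, here is a minimal sketch of the batch-size arithmetic behind the question. It assumes standard data-parallel training, where each GPU processes its own per-GPU batch and gradients are averaged across GPUs before each update. The per-GPU batch size of 32 and the `accum_steps` (gradient accumulation) parameter are hypothetical illustrations, not settings taken from this repository.

```python
def global_batch_size(per_gpu_batch: int, num_gpus: int, accum_steps: int = 1) -> int:
    """Number of samples contributing to one optimizer step under data parallelism.

    Assumes (hypothetically) that each GPU sees `per_gpu_batch` samples per
    forward/backward pass and that gradients are accumulated over `accum_steps`
    passes before each optimizer update.
    """
    return per_gpu_batch * num_gpus * accum_steps


# Hypothetical per-GPU batch size, for illustration only.
PER_GPU = 32

eight_gpu = global_batch_size(PER_GPU, num_gpus=8)
two_gpu = global_batch_size(PER_GPU, num_gpus=2)
# Accumulating gradients over 4 passes on 2 GPUs matches the 8-GPU global batch.
two_gpu_accum = global_batch_size(PER_GPU, num_gpus=2, accum_steps=4)

print(eight_gpu, two_gpu, two_gpu_accum)  # 256 64 256
```

Matching the global batch size this way keeps the number of samples per update the same. It does not by itself guarantee identical downstream results, because other settings such as the learning rate schedule can also differ between runs.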