Could you please tell me the training time of each experiment? I saw you trained them on 4 GPUs simultaneously. When I transfer it to another problem, I need to train for more than 3 days on a single Tesla P100.
Hi @yuanxuanS!
Actually, we did not train on 4 GPUs simultaneously. Only the larger AM-XL version was trained on 2x 3090s at the time (with the initial version of RL4CO, which had some inefficiencies that have since been fixed). For TSP/CVRP with 50 nodes, training takes less than 7 hours with 1,280,000 samples per epoch (`train_data_size`), batch size 512, and 100 epochs on a single 3090 (note that we use mixed precision from Lightning + FlashAttention), which is faster than Kool et al.'s original implementation on the same machine.
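For concreteness, here is a minimal sketch of that training setup (TSP-50, 1,280,000 samples/epoch, batch size 512, 100 epochs, mixed precision on a single GPU). The keyword names (`generator_params`, `train_data_size`, `baseline`) follow RL4CO's quick-start examples and may differ across versions, so treat this as illustrative rather than exact:

```python
# Illustrative sketch only: argument names may vary across RL4CO versions.
import lightning as L
from rl4co.envs import TSPEnv
from rl4co.models import AttentionModel

# TSP with 50 nodes
env = TSPEnv(generator_params={"num_loc": 50})

# Attention Model (Kool et al.) with a rollout baseline;
# 1,280,000 training samples are drawn per epoch
model = AttentionModel(
    env,
    baseline="rollout",
    train_data_size=1_280_000,
    batch_size=512,
)

# Single-GPU training with mixed precision from Lightning
trainer = L.Trainer(
    max_epochs=100,
    accelerator="gpu",
    devices=1,
    precision="16-mixed",
)
trainer.fit(model)
```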
Could you tell us more about your setup? Which problem and hyperparameters did you use?
Closing as stale. Feel free to re-open @yuanxuanS if you find any issue with training time!