ai4co / rl4co

A PyTorch library for all things Reinforcement Learning (RL) for Combinatorial Optimization (CO)
https://rl4.co
MIT License
455 stars 84 forks

About the training time. #107

Closed yuanxuanS closed 9 months ago

yuanxuanS commented 11 months ago

Could you please tell me the training time for each experiment? I saw that you trained them on 4 GPUs simultaneously. When I transfer the model to another problem, training takes more than 3 days on a single Tesla P100.

fedebotu commented 11 months ago

Hi @yuanxuanS! Actually, we did not train on 4 GPUs simultaneously; only the larger AM-XL version was trained on 2x 3090s at the time (with the initial version of RL4CO, which had some inefficiencies that have since been fixed). For TSP/CVRP with 50 nodes, training takes less than 7 hours on a single 3090 with 1,280,000 samples per epoch (`train_data_size`), batch size 512, and 100 epochs (note that we use mixed precision from Lightning plus FlashAttention), which is faster than Kool's original implementation on the same machine.
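To put those numbers in perspective, here is a quick back-of-the-envelope calculation (plain Python, using only the figures quoted above; the 7-hour figure is an upper bound, so the implied throughput is a lower bound):

```python
# Figures quoted above: TSP/CVRP, 50 nodes, single 3090
train_data_size = 1_280_000   # samples per epoch
batch_size = 512
epochs = 100
wall_clock_hours = 7          # "less than 7 hours" -> upper bound

steps_per_epoch = train_data_size // batch_size   # gradient steps per epoch
total_steps = steps_per_epoch * epochs            # steps over the whole run
total_samples = train_data_size * epochs
samples_per_second = total_samples / (wall_clock_hours * 3600)

print(steps_per_epoch)              # 2500
print(total_steps)                  # 250000
print(round(samples_per_second))    # ~5079 samples/s (lower bound)
```

If a run on another problem is taking 3+ days for a comparable sample budget, comparing its effective samples/second against this baseline is a quick way to see whether the slowdown comes from the model, the environment step, or the hardware.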

Could you tell us more about your setting? Which problem and hyperparameters did you use?
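For reference, a minimal training-config sketch matching the setup quoted above. This is a non-runnable fragment, not the exact script used: the class and argument names follow rl4co's quickstart at the time and may differ across versions, so please check the current documentation.

```python
# Sketch only -- assumes rl4co's quickstart-style API (TSPEnv, AttentionModel,
# RL4COTrainer); argument names may have changed in newer releases.
from rl4co.envs import TSPEnv
from rl4co.models import AttentionModel
from rl4co.utils import RL4COTrainer

env = TSPEnv(num_loc=50)  # 50-node TSP, as in the timing above

model = AttentionModel(
    env,
    batch_size=512,
    train_data_size=1_280_000,  # samples per epoch
)

trainer = RL4COTrainer(
    max_epochs=100,
    accelerator="gpu",
    precision="16-mixed",  # Lightning mixed precision, as noted above
)
trainer.fit(model)
```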

fedebotu commented 9 months ago

Closing as stale. Feel free to re-open @yuanxuanS if you find any issue with training time!