Closed — glistering96 closed this 1 year ago
On TSP N=20, trained for 5000 epochs with qkv_dim=64 and 4 encoder layers, on 256 episodes.
Run on an RTX 3060 12GB.
torch: 12.48 min; my: 13.88 min
Found that PyTorch's integrated attention implementation takes a lot more memory. We reverted all changes related to torch 2.0 attention.
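For reference, the two attention paths being compared here can be sketched as follows. This is a minimal illustration, not the repo's actual module: the shapes (batch, heads, sequence length) are made up for the example, with head dim matching the qkv_dim=64 setting above.

```python
import math
import torch
import torch.nn.functional as F

def manual_attention(q, k, v):
    # Hand-written attention: explicitly materializes the full
    # (L, L) score matrix before the softmax.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v

# Illustrative shapes: batch=2, heads=4, N=20 nodes, head dim 64
q = torch.randn(2, 4, 20, 64)
k = torch.randn(2, 4, 20, 64)
v = torch.randn(2, 4, 20, 64)

out_manual = manual_attention(q, k, v)
# torch 2.0 fused kernel (the integrated implementation reverted here)
out_fused = F.scaled_dot_product_attention(q, k, v)

# Numerically the two paths agree; they differ in speed/memory behavior.
assert torch.allclose(out_manual, out_fused, atol=1e-5)
```

Both produce the same output up to floating-point tolerance; the difference observed in this issue is in runtime and memory footprint, which depends on which backend (flash, mem-efficient, or math) the fused call dispatches to on a given GPU.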
Closing this issue as it contains some incorrect experiment settings and results from a wrong implementation.
On TSP N=100, trained for 250 epochs with qkv_dim=64 and 4 encoder layers, on 256 episodes.
Run on an RTX 3060 12GB.
torch: 3.95 min, 6.9 GB; my: 3.88 min, 6.9 GB