Closed gabcoh closed 3 years ago
A very quick look with callgrind suggests that the bad performance is indeed due to the model and the BLAS implementation. So it is probably worthwhile to investigate building PyTorch with a faster BLAS. A first step would be to check which BLAS library the default PyTorch build ships with.
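A rough way to check which BLAS a given PyTorch build was linked against is to look at its build configuration string. The sketch below assumes that `torch.__config__.show()` includes `BLAS_INFO=...` and `LAPACK_INFO=...` entries in its build settings, which recent builds appear to do, but the format is not guaranteed to be stable. The helper `blas_info` is a hypothetical name made up for this example.

```python
import re


def blas_info(config_text):
    """Pull the BLAS_INFO / LAPACK_INFO values out of a build-config string.

    Assumes entries of the form KEY=value separated by commas or whitespace.
    Returns None for a key that is not present.
    """
    found = {}
    for key in ("BLAS_INFO", "LAPACK_INFO"):
        match = re.search(rf"\b{key}=([^,\s]+)", config_text)
        found[key] = match.group(1) if match else None
    return found


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        # torch.__config__.show() returns the build configuration as a string.
        print(blas_info(torch.__config__.show()))
```

If both values come back as `None`, it is worth reading the full output of `torch.__config__.show()` by hand, since the layout may differ between versions.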
If so, maybe look into using a better BLAS and LAPACK.