ankitesh97 opened 4 years ago
I was wondering why you used BFGS optimization instead of the built-in Adam/gradient-descent optimizers in PyTorch?
BFGS with line search converges more quickly. Adam and SGD with a fixed learning rate are not stable.
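For reference, here is a minimal, hypothetical sketch of using PyTorch's built-in L-BFGS optimizer with strong-Wolfe line search; the model, data, and loss below are placeholders for illustration, not the repository's actual code:

```python
import torch

# Toy setup (assumed for illustration): fit a small MLP to synthetic data.
torch.manual_seed(0)
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = torch.sin(3 * x)

model = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
loss_fn = torch.nn.MSELoss()

# PyTorch's built-in (L-)BFGS; line_search_fn="strong_wolfe" enables the
# line search, so no learning-rate tuning is needed.
optimizer = torch.optim.LBFGS(
    model.parameters(), line_search_fn="strong_wolfe", max_iter=100
)

def closure():
    # L-BFGS may re-evaluate the loss several times per step,
    # so the loss/gradient computation must live in a closure.
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    return loss

for _ in range(20):
    loss = optimizer.step(closure)
print(f"final loss: {loss.item():.3e}")
```

Swapping in `torch.optim.Adam` instead would require choosing and possibly scheduling a learning rate, which is the tuning burden the answer above refers to.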