How exactly are we supposed to answer question of Ex.5:
Try optimizing for a few different targets with Newton’s method. How many steps does Newton’s method usually take to converge for this problem? Why?Why it takes that amount of steps or why it takes less steps than gradient descent method?
Thank you in advance for your help!