cs231n / cs231n.github.io

Public facing notes page
MIT License

'gradient of' and 'gradient with respect to' are used interchangeably #226

Closed ShoufaChen closed 4 years ago

ShoufaChen commented 4 years ago

Thank you very much for the cs231n, which helps me a lot in understanding deep learning.

One thing I'm confused about is whether 'gradient of' and 'gradient with respect to' can be used interchangeably.

First, in https://cs231n.github.io/optimization-2/:

That is, for example instead of dfdq we would simply write dq, and always assume that the gradient is with respect to the final output

I think "the gradient is of the final output" may be correct here.

Second,

Every gate in a circuit diagram gets some inputs and can right away compute two things: 1. its output value and 2. the local gradient of its inputs with respect to its output value

I think it should be: "2. the local gradient of its output value with respect to its inputs".

Did I miss something?

brentyi commented 4 years ago

Thanks for pointing this out!

Seems we played a bit loosey-goosey with the terminology here -- these terms are generally not interchangeable. df/dx should be the derivative/gradient of f with respect to x. I just pushed some wording tweaks to clarify.
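To make the terminology concrete, here is a minimal sketch (not from the notes themselves) of a multiply gate: its local gradients are the derivatives of its *output* with respect to each of its *inputs*, which is exactly the df/dx reading above. The function names are illustrative, and the numerical check is just a sanity test.

```python
# Multiply gate f(x, y) = x * y.
# Local gradients: df/dx = y and df/dy = x, i.e. gradients *of* the
# output *with respect to* each input.

def multiply_gate(x, y):
    out = x * y       # 1. the gate's output value
    dout_dx = y       # 2. local gradient of the output w.r.t. x
    dout_dy = x       #    local gradient of the output w.r.t. y
    return out, dout_dx, dout_dy

def numeric_grad(f, x, y, h=1e-6):
    # Centered finite differences to check the analytic local gradients.
    dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dx, dy

out, gx, gy = multiply_gate(3.0, -4.0)
nx, ny = numeric_grad(lambda a, b: a * b, 3.0, -4.0)
print(out, gx, gy)  # -12.0 -4.0 3.0
print(abs(gx - nx) < 1e-4, abs(gy - ny) < 1e-4)  # True True
```

During backprop, these local gradients get multiplied by the gradient of the final output with respect to the gate's output (the chain rule), which is why keeping "of" and "with respect to" straight matters.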