Closed zwep closed 4 years ago
Yeah, I think so. The current implementation just uses the gradient computation you get out of the box, and I added a dirty workaround on top of it: it prevents the other layers from updating by setting requires_grad to False. Performance is awful in my implementation.
hi @zwep, please feel free to reopen this ticket or create a new one for any further questions, thanks!
Hi,
I was wondering something while reading your code. Since we are not using backprop in the usual sense, but updating layers on a layer-by-layer basis, can we turn off the standard gradient calculation that torch performs?
This should, of course, save a lot of time, and it seems to me that it should work, but I am not certain at this point.
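For what it's worth, here is a minimal sketch of the two standard PyTorch mechanisms being discussed: freezing individual layers with `requires_grad_(False)` so autograd skips their gradients, and `torch.no_grad()` for passes that need no gradients at all. The small `nn.Sequential` model is purely hypothetical, for illustration only; it is not the code from this repository.

```python
import torch
import torch.nn as nn

# Hypothetical toy model, used only to illustrate the mechanism.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze every parameter, then re-enable only the layer currently
# being trained; autograd then skips gradient computation (and
# graph bookkeeping) for the frozen weights.
for p in model.parameters():
    p.requires_grad_(False)
for p in model[2].parameters():  # train only the last layer
    p.requires_grad_(True)

x = torch.randn(3, 4)
loss = model(x).sum()
loss.backward()

# The frozen layer accumulates no gradient; the trainable one does.
assert model[0].weight.grad is None
assert model[2].weight.grad is not None

# For passes where nothing is updated at all, torch.no_grad()
# avoids building the autograd graph entirely, which is cheaper
# than freezing parameters one by one.
with torch.no_grad():
    y = model(x)
assert not y.requires_grad
```

Whether this actually saves much time depends on how large the frozen layers are relative to the trained one, since the forward pass still runs through all layers either way.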