Closed: matt-gardner closed this issue 7 years ago.
From the exploration done in #355, I'm reasonably confident this isn't actually an issue; there was just some crazy problem with using `switch` inside the loss function, for some reason. It'd be nice to understand why, but it's not a P0 bug, so I'm closing this one for now.
It sure seems like computing gradients through a `switch` doesn't work. At least, it definitely didn't when `switch` was used in a loss function. We need to figure out whether we're actually getting correct gradients in the other places where we use `switch`, and remove the use of `switch` if we're not.
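One way to check whether gradients through `switch` are correct is a finite-difference gradient check. You compare the gradient the framework reports against central differences of the loss itself. Below is a minimal, framework-agnostic sketch in plain Python. `switch`, `loss`, `loss_grad`, and `broken_loss_grad` are all hypothetical stand-ins. In real code, `switch` would be the backend's op and the analytic gradient would come from the framework's symbolic differentiation. The loss is made up purely for illustration.

```python
def switch(cond, then_val, else_val):
    """Hypothetical elementwise select, mimicking a backend switch(cond, then, else)."""
    return [t if c else e for c, t, e in zip(cond, then_val, else_val)]


def loss(xs):
    # Hypothetical loss built with switch: x^2 where x > 0, |x| elsewhere.
    cond = [x > 0 for x in xs]
    return sum(switch(cond, [x * x for x in xs], [abs(x) for x in xs]))


def loss_grad(xs):
    # Stand-in for the framework's gradient: 2x on the "then" branch, d|x|/dx = -1 for x < 0.
    return [2 * x if x > 0 else -1.0 for x in xs]


def broken_loss_grad(xs):
    # Stand-in for a buggy gradient that silently drops the "else" branch.
    return [2 * x if x > 0 else 0.0 for x in xs]


def numerical_grad(f, xs, eps=1e-6):
    """Central-difference estimate of df/dx_i for each coordinate i."""
    grads = []
    for i in range(len(xs)):
        plus = list(xs)
        minus = list(xs)
        plus[i] += eps
        minus[i] -= eps
        grads.append((f(plus) - f(minus)) / (2 * eps))
    return grads


def check_grad(f, grad_f, xs, tol=1e-4):
    """True if grad_f agrees with finite differences of f at xs (relative tolerance)."""
    numeric = numerical_grad(f, xs)
    analytic = grad_f(xs)
    return all(
        abs(a - n) <= tol * max(1.0, abs(a), abs(n))
        for a, n in zip(analytic, numeric)
    )


if __name__ == "__main__":
    points = [1.5, -2.0, 0.3, -0.7]
    print(check_grad(loss, loss_grad, points))         # correct gradient passes
    print(check_grad(loss, broken_loss_grad, points))  # dropped branch is caught
```

Pick test points away from the switch condition's boundary (here `x == 0`). The select is not differentiable there, so finite differences straddling the boundary would disagree with any analytic gradient even when nothing is wrong.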