Casting to a tensor twice seemed to keep it from converging (maybe the second cast cuts off grad propagation?). Casting just once at the top level worked fine. We should investigate, and if this turns out to be consistent, add a note to the docs so folks know to look out for it.
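A quick way to check the grad-propagation guess, assuming the code in question is PyTorch (the note doesn't say, so treat this as a sketch rather than a repro of the actual bug). The names `w`, `y`, and `y_recast` are made up for illustration. In PyTorch, calling `torch.tensor(...)` on something that is already a tensor makes a detached copy, which would be consistent with what was observed:

```python
import warnings

import torch

# Hypothetical parameter and an intermediate result that depends on it.
w = torch.tensor([1.0, 2.0], requires_grad=True)
y = w * 3.0  # y is part of the autograd graph

# Second cast: torch.tensor() on an existing tensor copies and detaches it.
# PyTorch emits a UserWarning for this pattern; silenced here to keep output clean.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    y_recast = torch.tensor(y)

print(y.grad_fn is not None)    # y is still connected to w
print(y_recast.grad_fn is None) # the recast copy has no graph history
print(y_recast.requires_grad)   # and does not track gradients

# Gradients only flow back to w through the original y.
y.sum().backward()
print(w.grad)
```

If the real code shows the same pattern (a `grad_fn` of `None` after the second cast), that would support the hypothesis and be worth the docs note.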