y_hat is of dimension [N,1] and snapshot.y is of dimension [N]. If subtracted without first removing the trailing dimension of y_hat, then the operation is broadcast to [N,N]. Instead a N-dimensional vector is expected which is then used to compute the MSE.
I've used squeeze() to remove the trailing dimension and computed the training and evaluation.
y_hat is of dimension [N,1] and snapshot.y is of dimension [N]. If subtracted without first removing the trailing dimension of y_hat, then the operation is broadcast to [N,N]. Instead a N-dimensional vector is expected which is then used to compute the MSE.
I've used squeeze() to remove the trailing dimension and computed the training and evaluation.