sjfleming opened 6 months ago
Right now we assess early stopping using the total loss, and we have a hard stop at 1k epochs.
ARD takes a long time to converge, longer than most of the rest of the model.
We might consider setting up a convergence criterion based on ARD, since that seems to be the slowest part to converge.
Perhaps monitor `alpha_q`.
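Here is a minimal sketch of what monitoring `alpha_q` could look like. It assumes `alpha_q` can be read out once per epoch as a flat sequence of floats, for example after converting it from a tensor. The class `ARDConvergenceMonitor`, its parameters `rel_tol` and `patience`, and the relative-L2-change criterion are hypothetical choices made for illustration, not an existing API. The monitor declares convergence once `alpha_q` changes by less than `rel_tol`, relative to its previous value, for `patience` consecutive epochs. It keeps the 1k-epoch hard stop as a fallback.

```python
import math
from typing import List, Optional, Sequence


class ARDConvergenceMonitor:
    """Hypothetical early-stopping helper based on ARD parameters.

    Converged once alpha_q changes by less than rel_tol (relative L2 norm)
    for `patience` consecutive epochs; always stops at max_epochs.
    """

    def __init__(self, rel_tol: float = 1e-3, patience: int = 5, max_epochs: int = 1000):
        self.rel_tol = rel_tol
        self.patience = patience
        self.max_epochs = max_epochs
        self.epoch = 0
        self._prev: Optional[List[float]] = None
        self._stable_epochs = 0

    def _relative_change(self, current: List[float]) -> float:
        # ||current - prev|| / ||prev||, guarded against division by zero.
        # Assumes alpha_q has the same length every epoch.
        diff = math.sqrt(sum((c - p) ** 2 for c, p in zip(current, self._prev)))
        scale = math.sqrt(sum(p ** 2 for p in self._prev))
        return diff / max(scale, 1e-12)

    def should_stop(self, alpha_q: Sequence[float]) -> bool:
        """Call once per epoch with the current alpha_q values."""
        self.epoch += 1
        current = [float(a) for a in alpha_q]
        if self._prev is not None:
            if self._relative_change(current) < self.rel_tol:
                self._stable_epochs += 1
            else:
                self._stable_epochs = 0  # any large change resets the window
        self._prev = current
        converged = self._stable_epochs >= self.patience
        return converged or self.epoch >= self.max_epochs  # keep the hard stop


# Toy usage: a stand-in alpha_q that approaches [2.0, 3.0] geometrically.
monitor = ARDConvergenceMonitor(rel_tol=1e-3, patience=5, max_epochs=1000)
for epoch in range(1, 1001):
    alpha_q = [2.0 * (1 - 0.5 ** epoch), 3.0 * (1 - 0.5 ** epoch)]
    if monitor.should_stop(alpha_q):
        break
print(epoch)  # 14
```

The patience window guards against stopping on a single quiet epoch. The ARD check could also be combined with the existing total-loss criterion, for example by stopping only when both are satisfied, rather than replacing it.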