jolespin opened this issue 3 years ago
I think @jakobnissen wrote a reply to a similar issue, but I can't find it right now. He should chip in.
But we did find that the latent space changes over time even though the loss does not change significantly. I.e., in principle we are not interested in how much the loss decreases, but rather in the nature of the latent space and how it clusters. For instance, changing the optimiser from Adam to AdaBelief (https://arxiv.org/abs/2010.07468) decreased the loss overall, but performed worse in terms of clustering.
Good idea. This was already discussed in #57. For now, I've decreased the number of epochs for version 4.
Simon is right that the latent space continues to change even after the loss has flattened out. But we could implement early stopping by tracking the latent space. Every 10 epochs, say, we could measure the latent representation of a few thousand contigs and compare it to the last measurement. If the latent representation hasn't changed much, we could stop the VAE.
I'd need to do a test run where the latent representation is dumped every few epochs to see how it changes over time, though.
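For illustration, a minimal sketch of what such a check could look like, assuming the latent representations of a fixed subsample of contigs are available as NumPy arrays. The metric, tolerance, and 10-epoch cadence are my assumptions, not anything implemented in Vamb:

```python
import numpy as np

def pairwise_dists(z: np.ndarray) -> np.ndarray:
    """Condensed upper-triangle Euclidean distances between latent vectors."""
    sq = (z ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (z @ z.T)
    iu = np.triu_indices(len(z), k=1)
    return np.sqrt(np.maximum(d2[iu], 0.0))

def latent_drift(prev_z: np.ndarray, curr_z: np.ndarray) -> float:
    """1 minus the Pearson correlation of pairwise distances between two
    latent snapshots of the same contigs (rows in the same order)."""
    a, b = pairwise_dists(prev_z), pairwise_dists(curr_z)
    return float(1.0 - np.corrcoef(a, b)[0, 1])

# Hypothetical use: every 10 epochs, encode the same subsample of a few
# thousand contigs and stop once latent_drift(prev_z, curr_z) < some tol.
```

Comparing pairwise distances rather than the coordinates themselves would make the check invariant to rotations or rescalings of the latent space, which can occur without the clustering actually changing.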
It is interesting how the clusters change even with only slight changes in loss. Also good to know that there are other things to consider besides loss with unsupervised deep learning.
Is there any measure of the latent space that could be output as a single metric at each epoch? A timestamp for each epoch would also help users get a rough estimate of how long the run might take.
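A scalar like the drift value sketched above could serve as that per-epoch metric. A hypothetical log line (not Vamb's actual output format) combining it with a wall-clock timestamp might look like:

```python
import sys
import time

def log_epoch(epoch: int, loss: float, drift: float, file=sys.stderr) -> None:
    # Timestamped per-epoch line so users can extrapolate total runtime.
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    print(f"[{stamp}] epoch {epoch:3d}  loss {loss:.4f}  drift {drift:.5f}",
          file=file, flush=True)
```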
I've noticed that the loss converges long before the default 500 epochs are reached. Is there any interest in having `--early_stopping` and `--delta_loss` parameters? The motivation is that the extra compute time needed to reach 500 epochs isn't worth it, since the loss starts to converge a lot earlier, around ~200 epochs. For example, if we set `--early_stopping 50` and `--delta_loss 0.0004`, the program would notice that the loss hasn't improved by `delta_loss` in `early_stopping` epochs, cut the algorithm short, and continue with the last best epoch up until that point.

I think the most useful way to go about this is: if the loss did not decrease by at least `delta_loss` cumulatively in `early_stopping` epochs, then the algorithm can be cut short. This would be really helpful when running large datasets and, with what I'm doing, brute-force hyperparameter tuning.
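To make the proposal concrete, here is a sketch of the criterion (the flag names and defaults are my proposals, not existing Vamb options):

```python
from collections import deque

class EarlyStopping:
    """Stop when the loss has not decreased by at least `delta_loss`
    cumulatively over the last `patience` epochs."""

    def __init__(self, patience: int = 50, delta_loss: float = 0.0004):
        self.patience = patience
        self.delta_loss = delta_loss
        # Keep one more loss than `patience` so history[0] is the loss
        # from exactly `patience` epochs ago.
        self.history = deque(maxlen=patience + 1)

    def should_stop(self, loss: float) -> bool:
        self.history.append(loss)
        if len(self.history) <= self.patience:
            return False  # not enough epochs observed yet
        # Cumulative improvement over the window = oldest loss - newest loss.
        return (self.history[0] - self.history[-1]) < self.delta_loss

# Hypothetical usage inside the training loop:
#
#     stopper = EarlyStopping(patience=50, delta_loss=0.0004)
#     for epoch in range(nepochs):
#         loss = train_one_epoch()   # stand-in for the real epoch step
#         if stopper.should_stop(loss):
#             break                  # continue from the best checkpoint
```

In practice the loop would also checkpoint the best epoch so the run can continue from it after stopping, as described above. Given the earlier point about the latent space changing while the loss is flat, a latent-drift check like the one sketched further up may be a safer stopping signal than loss alone.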