The gist is that some measure of loss returned alongside each embedding would be tremendously useful, since the loss is a rough proxy for how well the model created that embedding.
It doesn't need to be the same loss as the training loss. A simpler one might be substantially faster to compute than the loss we actually used for training (e.g. by avoiding the DINO steps).
What is the simplest, fastest loss we can generate alongside the embeddings?
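One possible answer to the question above, offered only as a sketch: a per-sample reconstruction error (mean squared error between the input and a decoded version of its embedding), computed in the same pass that produces the embedding. This assumes a decoder is available at inference time, which may not hold for this model. The names `embed_with_loss`, `toy_encode`, and `toy_decode` are hypothetical stand-ins for illustration and are not part of the Clay codebase.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def embed_with_loss(
    batch: Sequence[Vector],
    encode: Callable[[Vector], Vector],
    decode: Callable[[Vector], Vector],
) -> List[Tuple[Vector, float]]:
    """Return (embedding, reconstruction MSE) for every sample in the batch.

    ASSUMPTION: `encode` and `decode` stand in for the model's encoder and
    decoder; the real model may expose these differently or not at all.
    """
    results = []
    for pixels in batch:
        embedding = encode(pixels)
        reconstruction = decode(embedding)
        results.append((embedding, mse(pixels, reconstruction)))
    return results


# Hypothetical toy stand-ins so the sketch runs end to end.
def toy_encode(pixels: Vector) -> Vector:
    # "Embedding" is just the mean value of the input.
    return [sum(pixels) / len(pixels)]


def toy_decode(embedding: Vector) -> Vector:
    # Reconstruct a flat 4-value input from the mean.
    return [embedding[0]] * 4


if __name__ == "__main__":
    batch = [[1.0, 1.0, 1.0, 1.0], [0.0, 2.0, 0.0, 2.0]]
    for embedding, loss in embed_with_loss(batch, toy_encode, toy_decode):
        print(embedding, loss)
```

A per-sample value (rather than a batch average) is the point here, since each embedding gets its own quality score that can be stored next to it. Whether reconstruction error tracks embedding quality as well as the training loss does is an open question this sketch does not settle.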
This reopens https://github.com/Clay-foundation/model/issues/207