I also updated the example notebook with `dataset_loss`, moving the scheduler step, etc.
An issue I ran into is that using `dataset_loss` for a vanilla cgnet involves specifying that no embeddings should be passed to `model.forward()`. @nec4, how do you feel about this solution (and the example notebook implementation)? It's definitely not scalable if we end up doing more with datasets than embeddings, but hard-coding embedding options has been our philosophy elsewhere, so it seems consistent.
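For context, the dispatch pattern I mean is roughly the one below. This is only a minimal sketch, not the actual cgnet signature: the `embeddings=None` default, the toy loader of `(coords, forces)` pairs, and the plain MSE accumulation are all assumptions made for illustration. The point is just that defaulting `embeddings` to `None` lets a vanilla (embedding-free) model's `forward()` be called without extra wiring.

```python
def dataset_loss(model, loader, embeddings=None):
    """Hypothetical sketch: average force-matching loss over a dataset.

    When ``embeddings`` is None, ``model`` is called with coordinates
    only, so a vanilla cgnet (no embedding layer) works unchanged.
    """
    total_sq_err, n_frames = 0.0, 0
    for coords, ref_forces in loader:
        if embeddings is None:
            # vanilla model: forward() takes coordinates only
            pred_forces = model(coords)
        else:
            # embedding-aware model: embeddings passed through to forward()
            pred_forces = model(coords, embeddings)
        # accumulate squared error over the batch
        total_sq_err += sum((p - r) ** 2
                            for p, r in zip(pred_forces, ref_forces))
        n_frames += len(coords)
    return total_sq_err / n_frames
```

The alternative would be a per-dataset switch that enumerates what each dataset provides, which scales better if datasets ever carry more than embeddings, but diverges from how embedding options are hard-coded elsewhere.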