Open bemi127 opened 6 months ago

Why are the models not set to eval() mode before running inference? If my understanding is correct, this means that dropout, etc. will be applied while perturbations are generated and log likelihood is estimated.

I'm no expert, but I believe that because none of the models are actually trained in DetectGPT's experiments, there is not strictly a need to put them in eval mode. That said, I agree it is still good practice to call .eval() explicitly, even when you're only loading pre-trained weights.

It is correct that the base models are not retrained or fine-tuned. However, the perplexities computed in eval mode will be more accurate, since each attention layer sees its full context and is not perturbed by dropout.

Good thing is that we can try it out and see what happens.
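To see why this matters, here is a minimal sketch using a toy torch module (not the actual DetectGPT models, which are assumed to be standard HuggingFace causal LMs): in train mode, dropout makes two forward passes on the same input disagree, so any log-likelihood estimate becomes noisy; after .eval(), dropout is a no-op and the scores are deterministic.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy network with dropout, standing in for the GPT-2 scoring model.
model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5), nn.Linear(8, 1))

x = torch.randn(4, 8)

# In train mode (the default), dropout randomly zeroes activations,
# so two forward passes on the same input give different outputs.
model.train()
train_deterministic = torch.allclose(model(x), model(x))

# In eval mode dropout is disabled, so repeated passes agree and any
# perplexity / log-likelihood computed from the logits is stable.
model.eval()
with torch.no_grad():
    eval_deterministic = torch.allclose(model(x), model(x))

print(train_deterministic, eval_deterministic)
```

The same reasoning applies to the real models: calling `model.eval()` right after `from_pretrained(...)` removes the dropout noise from both the perturbation step and the scoring step without changing anything else.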