[Closed] Yufei-Gu-451 closed 1 hour ago
Hi! Did you override the `_model_call` method? See:
https://github.com/EleutherAI/lm-evaluation-harness/blob/5680a2e6b5cf1a1621d8ff68d3d0e83e8b2731d3/lm_eval/models/huggingface.py#L846
We also set it to eval mode here: https://github.com/EleutherAI/lm-evaluation-harness/blob/5680a2e6b5cf1a1621d8ff68d3d0e83e8b2731d3/lm_eval/models/huggingface.py#L202-L204
Thank you for your kind response. My issue is fixed by enabling gradients with `torch.enable_grad()` inside the `_model_call` method (simply commenting out lines 202 to 204 is not enough).
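For anyone hitting the same thing, a minimal sketch of the mechanism (assumption: the harness runs `_model_call` under an outer `torch.no_grad()` context, which is why outputs lose their `grad_fn`; `torch.enable_grad()` re-enables autograd even inside that outer block):

```python
import torch

# Stand-in for a model forward pass: a parameter and an input.
w = torch.randn(3, 3, requires_grad=True)
x = torch.randn(3)

# Mimics the harness's evaluation loop, which wraps calls in no_grad().
with torch.no_grad():
    y_no = w @ x              # grad_fn is None here: autograd is off
    with torch.enable_grad(): # what the fix adds inside _model_call
        y_yes = w @ x         # grad_fn is restored (a backward node)

assert y_no.grad_fn is None
assert y_yes.grad_fn is not None
```

In practice this means wrapping the `super()._model_call(...)` forward pass (or the model call itself in a subclass) in `with torch.enable_grad():`, rather than trying to delete the `eval()` lines, since `eval()` only affects layers like dropout and has nothing to do with gradient tracking.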
I am working on understanding model gradients during LM inference. I attempted to capture gradients in the `_loglikelihood_tokens` method with a modified Hugging Face model class, loading `AutoModelForCausalLM` from pretrained Hugging Face checkpoints.
However, none of the model outputs had a `grad_fn` (they printed `None`). I tried Llama and Mistral and got the same results.
Is there any way I can load an `AutoModelForCausalLM` model with `grad_fn` enabled?