I found that the `litgpt evaluate` command ignores the provided checkpoint dir and silently downloads the model from HF. Since download speeds in Studios are so fast, I didn't notice this. The only hint that this was happening was #1349, but at first I interpreted it as just wanting to download a missing config file. Later, when I looked at the benchmark numbers and saw that LoRA, QLoRA, and full finetuning all returned the same eval benchmark numbers, I was led down this rabbit hole.
The fix is to correctly pass the pretrained checkpoint file to the HFLM class.
Tied to this is the problem that the Hugging Face state dict loader forces `weights_only=True`, which our checkpoints don't support because they are saved using pickle in the incremental saver. So I also had to include a workaround for this.
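The workaround boils down to letting `torch.load` fall back to full pickle deserialization while the state dict is being read. A minimal sketch of the idea (illustrative, not the exact PR code; only safe for checkpoints you trust):

```python
import os
import tempfile

import torch

# Sketch: checkpoints written via pickle can fail under
# torch.load(weights_only=True), so temporarily patch torch.load to force
# weights_only=False while the Hugging Face loader reads the checkpoint,
# and restore the original afterwards.
_original_load = torch.load

def _load_allowing_pickle(*args, **kwargs):
    kwargs["weights_only"] = False  # allow full pickle deserialization
    return _original_load(*args, **kwargs)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lit_model.pth")
    torch.save({"weight": torch.ones(2, 2)}, path)

    torch.load = _load_allowing_pickle  # patch while the checkpoint is loaded
    try:
        state_dict = torch.load(path)
    finally:
        torch.load = _original_load  # always restore the original

print(state_dict["weight"].sum().item())  # → 4.0
```

Monkeypatching is a blunt instrument, but it avoids forking the Hugging Face loading code just to change one keyword argument.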
Fixes #1349