resume_from_checkpoint doesn't work

🐛 Describe the bug

We're trying to do iterative PPO and want to use the resume_from_checkpoint feature from #482 by @maxreciprocate.

But when we tried to load from the ckpt/best_checkpoint directory, we got a "no pytorch_model.bin" error. Checking the directory confirms that best_checkpoint does not contain a pytorch_model.bin file (image below).

The hf_model subdirectory does contain it, so I set resume_from_checkpoint=ckpt/best_checkpoint/hf_model. That then fails with the error shown in #482 (also in the image below).
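For anyone trying to reproduce this, here is a minimal standard-library sketch for checking which directories under a checkpoint actually contain the weights file. `find_weight_files` is a hypothetical helper written for this report, not part of trlX, and the path in the usage line is just the one from my run.

```python
from pathlib import Path


def find_weight_files(checkpoint_root, filename="pytorch_model.bin"):
    """List directories (relative to checkpoint_root) that contain `filename`.

    Hypothetical diagnostic helper, not a trlX API. A result of "." means the
    file sits directly in checkpoint_root.
    """
    root = Path(checkpoint_root)
    # rglob walks the whole tree, so nested folders such as hf_model are included.
    return sorted(str(p.parent.relative_to(root)) for p in root.rglob(filename))


if __name__ == "__main__":
    # Assumed path: the checkpoint directory mentioned in this report.
    print(find_weight_files("ckpt/best_checkpoint"))
```

If the weights only show up under hf_model and not at the top level, that matches what I am seeing.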

https://wandb.ai/andrew-siah/trlx/runs/i9o2eb0l/logs?workspace=user-andrew-siah

WeightsBiases output.log

Am I doing something wrong?

I can also verify that a previous trlx run without resume_from_checkpoint works fine, so the issue is isolated to resume_from_checkpoint.

https://wandb.ai/andrew-siah/trlx/runs/a31x005e/logs?workspace=user-andrew-siah

Thanks.

Which trlX version are you using?

0.7.0

Additional system and package information

Ubuntu, Python 3.11.4
