Evaluating a specific checkpoint

allenai / RL4LMs

A modular RL library to fine-tune language models to human preferences

https://rl4lms.apps.allenai.org/

Apache License 2.0

2.13k stars 191 forks source link

Evaluating a specific checkpoint #32

Open lovodkin93 opened 1 year ago

lovodkin93 commented 1 year ago

hey, first of all thank you very much for this amazing library! I was using it to finetune a model, and I am interested in evaluating one of the saved checkpoints on my testset. Is there an easy way to do it? Thanks.

rajcscw commented 1 year ago

If you want to evaluate a saved checkpoint, it is tricky. Evaluating the language model (which is saved at the end of the training) is easier and can be done like any other auto model on HF. Question, is do you want to evaluate via RL4LMs or externally?

lovodkin93 commented 1 year ago

@rajcscw I was thinking of doing it via RL4LMs, but if it is easier to do it externally, I don't mind.

rajcscw commented 1 year ago

So the easiest way to do is load up the entire checkpoint (which includes trainer, policy state etc) and run the evaluation via train_text_generation.py. This can be done as follows:

Create a datapool of your evaluation dataset
Register it and adapt the config accordingly
In train_text_generation.py set the base_path_to_store_results, project name, experiment_name so that it loads the latest checkpoint correctly. Note that: checkpoints folder must be present in base_path_to_store_results/project_name/experiment_name.
Set train_evaluation/n_iters to 0 in the config file so that there is no further training of the model.

After doing this, on running train_text_generation.py, you will see the metrics on console and also saved on the jsons in the experiment folder.

lovodkin93 commented 1 year ago

@rajcscw great, I will give it a go. Thank you!

avacaondata commented 1 year ago

@rajcscw What would be the way for doing this with huggingface's transformers library? I mean, if it's easy to load models from transformers to run do reinforcement learning, then it should not be so tricky to load the saved checkpoints as transformer models, with .from_pretrained()... I would expect that unused weights are ignored (value head etc) and that the rest of the weights belonging to the original model are indeed loaded. However, I cannot make the from_pretrained() call work with the saved checkpoints. What am I missing? Thank you in advance :heart: :smile: