Just realized I'm not calling the optimizer. Is it safe to do blender.init_optim(opt), then do blender.optimizer.zero_grad() and blender.optimizer.step()?
Yeah I think you've got it all.
A few bits: you can do that in __init__. It's worth noting that if you just do:
```python
blender.observe(some_message_with_labels)
blender.act()
```
that does an SGD step on its own.
I would generally recommend you add your dataset as a teacher and then use the train_model script, overriding compute_loss to add in your reward term; don't try to write your own loop. We have a standardized training loop and make some assumptions about what happens inside it.
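For instance, a bare-bones teacher might look roughly like this (a sketch only; the module path parlai/tasks/mytask/agents.py, the class, and the data are placeholders, not something from this thread):

```python
# parlai/tasks/mytask/agents.py  (hypothetical location)
from parlai.core.teachers import DialogTeacher


class DefaultTeacher(DialogTeacher):
    def __init__(self, opt, shared=None):
        # DialogTeacher expects opt['datafile'] to point at the raw data.
        opt['datafile'] = 'my_dialogues.txt'  # placeholder path
        self.id = 'mytask'  # hypothetical task name
        super().__init__(opt, shared)

    def setup_data(self, datafile):
        # Yield ((text, labels), new_episode) pairs, one per training example.
        yield ('Hi, how are you today?', ['I am doing well, thanks!']), True
```

A task registered like this should then be selectable with -t mytask when running the train script.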
Ah, it is complicated a bit by the fact that you're trying to do RL, not supervised learning. Setting a labels field turns it into supervised learning. If you look at TorchAgent.batch_act, you'll see that we explicitly call train_step or eval_step; take a look at TorchGeneratorAgent.train_step to see what's going on there.
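In pseudocode, that dispatch is roughly the following (a paraphrase for illustration, not the actual source):

```python
# Inside TorchAgent.batch_act (paraphrased):
batch = self.batchify(observations)
if self.is_training:  # true when the observations carry a labels field
    output = self.train_step(batch)  # supervised: loss, backward, update
else:
    output = self.eval_step(batch)   # inference only, no parameter updates
```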
Thanks for the feedback!
I realized for my purpose I can just augment the loss function and I actually don't need reinforcement learning at all.
Just to confirm, all I need to do is the following to fine-tune with a custom loss function:
To confirm, I don't need to handle any of the .zero_grad() or .update_params() myself? This is all done in the train script?
Also, the only other issue I've noticed is that blender.optimizer is None when I init my subclass. Do I need to call init_optim myself? Or do I just need to make sure I pass in the correct opt when initializing the subclass of TransformerGeneratorAgent, and the optimizer will automatically be initialized? Currently I try to solve this by doing the following in the __init__(self, ..) function:
```python
self.init_optim(
    [p for p in self.model.parameters() if p.requires_grad],
    optim_states=blender_opt.get('optimizer'),
    saved_optim_type=blender_opt.get('optimizer_type'),
)
self.build_lr_scheduler(blender_opt, hard_reset=True)
```
One last question too! During the train loop, each time compute_loss is called, can I get my Blender model to generate a response within the compute_loss function? If so, will it have updated weights (so over time blender will generate different responses)? My plan was to do something along these lines:
```python
def compute_loss(self, batch, return_output=False):
    # do stuff
    generated_response_from_blender = self._generate(batch, ...)
    loss_penalty = my_function(generated_response_from_blender)
    # do stuff
```
Thanks so much for your help!
Correct, just use train_model on your dataset with your overridden compute_loss. compute_loss has those two return values, so make sure you're respecting that.
As a very concrete example, check out our implementation of the Unlikelihood agent:
Note that we have two different losses, depending on some information in the batch (the reward field), and we add them both and return at the end.
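In that spirit, an override that adds an extra term to the standard loss might look roughly like this (a sketch, not the actual Unlikelihood code; my_extra_term is a placeholder for whatever you compute from the batch):

```python
class MyLossMixin:
    def compute_loss(self, batch, return_output=False):
        # Standard generator cross-entropy loss plus the raw model output.
        loss, model_output = super().compute_loss(batch, return_output=True)

        # Add whatever extra term you need, e.g. one gated on batch.reward.
        loss = loss + self.my_extra_term(batch, model_output)  # placeholder

        # Respect both return signatures the training loop relies on.
        if return_output:
            return loss, model_output
        return loss
```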
We trained this with something like:
parlai train -m projects.dialogue_unlikelihood.agents:TransformerUnlikelihoodAgent -t mytask ...[other learning parameters]
Note it's TransformerUnlikelihoodAgent because we implemented the custom loss as a mixin:
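Roughly, the composition works like this (illustrative names, mirroring the sketch above rather than the actual project code):

```python
from parlai.agents.transformer.transformer import TransformerGeneratorAgent


class MyUnlikelihoodStyleAgent(MyLossMixin, TransformerGeneratorAgent):
    # The mixin supplies compute_loss; the base agent supplies everything else,
    # so the same loss can be mixed into different base models.
    pass
```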
@stephenroller Thanks for all of the detailed help so far! I think I've gotten the custom loss function working with the train loop.
I just had a few quick questions.
Currently, I run the train loop like so:
```
parlai train_model \
  -t blended_skill_talk,wizard_of_wikipedia,convai2:normalized \
  -m parlai.agents.blender.blender:Blender \
  -df parlai/agents/blender/opt/model_dict.opt
```
I pass in the -df argument, which has a path to an opt dictionary, and also the -m argument, where I pass in my model class. My question was, firstly: is it okay to pass in the -df argument? I didn't see it in the documentation, but I just assumed it worked, since otherwise I was getting the following error: RuntimeError: WARNING: For train_model, please specify either a model_file or dict_file.
Secondly, when is the model file outputted? Currently, only model.checkpoint files are being outputted after setting --save-every-n-secs 30. Do all epochs have to finish for the final model file to be saved?
Lastly, how do I set load_state_dict to use strict=False? I see I can override the load_state_dict function, but I'm not sure where I can actually set strict=False. Thanks!
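To make that last question concrete, here is roughly what I am imagining (just a guess on my part that the agent's load_state_dict wraps the underlying torch module):

```python
class MyBlenderAgent(TransformerGeneratorAgent):
    def load_state_dict(self, state_dict):
        # Tolerate missing/unexpected keys when loading pretrained weights.
        self.model.load_state_dict(state_dict, strict=False)
```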
Thanks so much for all your help!
Use the same arguments as a regular fine-tune, except change -m to be your blender agent.
From https://parl.ai/projects/recipes/:
parlai train_model -t blended_skill_talk,wizard_of_wikipedia,convai2:normalized,empathetic_dialogues --multitask-weights 1,3,3,3 -veps 0.25 --attention-dropout 0.0 --batchsize 128 --model transformer/generator --embedding-size 2560 --ffn-size 10240 --variant prelayernorm --n-heads 32 --n-positions 128 --n-encoder-layers 2 --n-decoder-layers 24 --history-add-global-end-token end --delimiter ' ' --dict-tokenizer bytelevelbpe --dropout 0.1 --fp16 True --init-model zoo:blender/reddit_3B/model --dict-file zoo:blender/reddit_3B/model.dict --label-truncate 128 --log_every_n_secs 10 -lr 7e-06 --lr-scheduler reduceonplateau --lr-scheduler-patience 3 --optimizer adam --relu-dropout 0.0 --activation gelu --model-parallel true --save-after-valid True --text-truncate 128 --truncate 128 --warmup_updates 100 --fp16-impl mem_efficient --update-freq 2 --gradient-clip 0.1 --skip-generation True -vp 10 -vmt ppl -vmm min --model-file /tmp/test_train_27B
Just add your -m argument to that. Also, you might need to adjust the batch size to whatever is right for your machines, and any other parameters you think should change.
Thanks a lot, and then after the model has finished fine-tuning, would I just pass in the same --model-file
argument I used for fine-tuning to be able to speak/interact with the model?
For example like?
python parlai/scripts/safe_interactive.py -t blended_skill_talk -mf /tmp/test_train_90M.checkpoint
Thanks!
Exactly
Next week the whole team is dedicating a day to writing tutorials and docs. Are there requests you have?
Thanks so much for all of the support!
Hmmm, I cannot think of anything off the top of my head, but perhaps one idea could be to try to provide a few basic "in-depth" tutorials that go from start-to-finish, all the way from cloning the repository to achieving a final result? (Kind of like, Medium articles)
I think the documentation covers all of the topics but maybe new users might not know how to "fit all of the pieces together" because each section of the documentation seems to assume a degree of prior knowledge of the platform? What do you think?
Thanks again for all of the help! The ParlAI platform is really great.
Hello,
My main goal is to fine-tune blender using Reinforcement Learning, and I was wondering if my steps are correct?
1. Subclass & load Blender. I define the options like so:
2. Set blender to "train" mode.
3. Manually define observations. In this case, would the text key be from blender.act(), and the label key my desired/expected answer?
4. Convert observations to a batch.
5. Calculate loss!
6. Apply reward to the loss? Now, could I apply the reward to the loss based on some additional logic? E.g., I have a function that, given the response from blender, returns a reward value.
7. Call backward. Now, once I call backward and pass in the loss I multiplied by a scalar, will blender have updated its weights? Do I just repeat this with larger batch sizes many times until my desired loss converges? Can I expect blender's outputs to change by training like this? Or will this not actually make a difference to blender's weights at all?
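Putting steps 1-7 together, the loop I have in mind looks roughly like this (the method names are my best guess at the ParlAI API, and my_reward_fn is my own function, so this may well be off):

```python
blender.model.train()  # step 2: put the model in training mode

# step 3: a manually defined observation, with the desired answer as the label
obs = blender.observe({
    'text': 'Hello, how are you?',
    'labels': ['I am great, thanks for asking!'],
    'episode_done': True,
})

# step 4: convert the observation(s) into a Batch
batch = blender.batchify([obs])

# steps 5 and 6: compute the loss and scale it by my reward
loss = blender.compute_loss(batch)
loss = loss * my_reward_fn(batch)  # my own reward function

# step 7: backward and parameter update
blender.backward(loss)
blender.update_params()
```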
Assuming this does train blender, my last question is to do with the Batch tuple. I noticed there are also the options to have candidates and candidate_vecs, for example. I was wondering, what is the difference between providing candidates etc. vs. the labels key in the Message object that I use when doing the batchify?
Thank you so much for your time!