facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

[Blender] Is this how you do a custom train loop for the Blender model? #3015

Closed josharnoldjosh closed 4 years ago

josharnoldjosh commented 4 years ago

Hello,

My main goal is to fine-tune Blender using reinforcement learning, and I was wondering if my steps are correct?

1. Subclass & load Blender. I define the options like so:

import torch

from parlai.agents.transformer.transformer import TransformerGeneratorAgent
from parlai.core.message import Message

blender_opt = {
        "no_cuda": True,
        "task": "internal:blended_skill_talk,wizard_of_wikipedia,convai2,empathetic_dialogues",
        "multitask_weights": [
            1.0,
            3.0,
            3.0,
            3.0
        ],
        "init_model": "./data/models/blender/blender_90M/model",
        "dict_file":"./data/models/blender/blender_90M/model.dict",
        "embedding_size": 512,
        "verbose":True,
        "n_layers": 8,
        "ffn_size": 2048,
        "dropout": 0.1,
        "n_heads": 16,
        "learn_positional_embeddings": True,
        "n_positions": 512,
        'variant': 'xlm',
        'activation': 'gelu',
        'fp16': True,
        'text_truncate': 512,
        'label_truncate': 128,
        'dict_tokenizer': 'bpe',
        'dict_lower': True,
        'lr': 1e-06,
        'optimizer': 'adamax',
        'lr_scheduler': 'reduceonplateau',
        'gradient_clip': 0.1,
        'veps': 0.25,
        "betas": [
            0.9,
            0.999
        ],        
        "update_freq": 1,
        "attention_dropout": 0.0,
        "relu_dropout": 0.0,
        "skip_generation": False,
        'vp': 15,
        'stim': 60,
        'vme': 20000,
        'bs': 16,
        'vmt': 'ppl',
        'vmm': 'min',
        'save_after_valid': True,
        'model_file': '/tmp/test_train_90M',
        'datapath': './custom/data/',        
        'history_size': -1,
        'truncate': -1,
        'rank_candidates': False,
        'embeddings_scale': True,
        'output_scaling': 1.0,
        'embedding_type': 'random',
        'gpu': -1
    }
class Blender(TransformerGeneratorAgent):
    def __init__(self):
        super().__init__(blender_opt, None)


blender = Blender()

2. Set blender to "train" mode

blender.model.train()

3. Manually define observations. In this case, would the text key be from blender.act(), and the labels key my desired/expected answer?

obs_labs = [
    Message(
        {
            'text': 'It\'s only a flesh wound.',
            'labels': ['It is a just a wound!'],
            'episode_done': True,
        }
    ),
]

obs_elabs = [
    Message(
        {
            'text': 'It\'s only a flesh wound.',
            'eval_labels': ['Yield!'],
            'episode_done': True,
        }
    ),
]

4. Convert observations to a batch

obs_vecs = []

for obs_batch in (obs_labs, obs_elabs):
    for o in obs_batch:
        blender.history.reset()
        blender.history.update_history(o)
        obs_vecs.append(
            blender.vectorize(o, blender.history, add_start=False, add_end=False)
        )

# Here is the batch
batch = blender.batchify(obs_vecs)

5. Calculate loss!

loss = blender.compute_loss(batch, return_output=True)

6. Apply reward to the loss? Now, could I apply a reward to the loss based on some additional logic? E.g., say I have a function that, given the response from Blender, returns a reward value.

original_loss = loss[0]
new_loss = original_loss * torch.tensor(2.0) # multiply original loss by "reward scalar" for a random example?!

7. Call backward

blender.backward(new_loss)

Now, once I call backward and pass in the loss I multiplied by a scalar, will Blender have updated its weights? Do I just repeat this with larger batch sizes many times until my desired loss converges? Can I expect Blender's outputs to change by training like this? Or will this not actually make a difference to Blender's weights at all?

Assuming this does train Blender, my last question has to do with the Batch tuple. I noticed there are also options to provide candidates and candidate_vecs, for example. I was wondering, what is the difference between providing candidates etc. vs. the labels key in the Message object that I use when doing batchify?

Thank you so much for your time!

josharnoldjosh commented 4 years ago

Just realized I'm not calling the optimizer. Is it safe to do blender.init_optim(blender_opt), then blender.optimizer.zero_grad() and blender.optimizer.step()?
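For reference, TorchAgent already wraps the optimizer plumbing, so a manual loop can lean on its helpers rather than touching blender.optimizer directly. A minimal sketch, assuming the zero_grad / backward / update_params helpers behave the way the built-in train_step uses them, and with a placeholder reward scalar:

# Sketch only: one manual update using TorchAgent's optimizer helpers.
blender.init_optim([p for p in blender.model.parameters() if p.requires_grad])

blender.zero_grad()                               # clear accumulated gradients
loss, model_output = blender.compute_loss(batch, return_output=True)
reward = 2.0                                      # placeholder reward scalar
blender.backward(loss * reward)                   # handles fp16 loss scaling
blender.update_params()                           # gradient clip + optimizer step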

stephenroller commented 4 years ago

Yeah I think you've got it all.

A few bits:

It's worth noting that if you just do:

blender.observe(some_message_with_labels)
blender.act()

that does an SGD step on its own.

I would generally recommend you add your dataset as a teacher and then use the train_model script, overriding compute_loss to add in your reward term, rather than writing your own loop. We have a standardized training loop and make some assumptions about things happening inside it.
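For the "add your dataset as a teacher" part, a rough sketch of a minimal teacher, assuming the data is a simple list of (prompt, reply) pairs; the task name, data file, and load_my_pairs helper are made up for illustration:

from parlai.core.teachers import DialogTeacher


class MyRewardTaskTeacher(DialogTeacher):
    """Hypothetical teacher exposing a custom dataset to the train script."""

    def __init__(self, opt, shared=None):
        opt['datafile'] = '/path/to/my_pairs.txt'  # made-up data file
        self.id = 'my_reward_task'
        super().__init__(opt, shared)

    def setup_data(self, path):
        # Yield ((text, labels), new_episode?) pairs for each example.
        for prompt, reply in load_my_pairs(path):  # made-up loader
            yield (prompt, [reply]), True

The module defining the teacher would then be what you point the -t flag at when running the train script.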

stephenroller commented 4 years ago

Ah, it is complicated a bit by the fact that you're trying to do RL, not supervised learning. Setting a labels field turns it into supervised learning. If you look at TorchAgent.batch_act, you'll see that we explicitly call train_step or eval_step; take a look at TorchGeneratorAgent.train_step to see what's going on there.
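Paraphrasing the routing described here, roughly (a simplified sketch, not the actual TorchAgent.batch_act implementation):

# Simplified paraphrase of the dispatch inside TorchAgent.batch_act:
def batch_act_sketch(self, observations, batch):
    if any('labels' in obs for obs in observations):
        return self.train_step(batch)   # supervised update happens here
    return self.eval_step(batch)        # no parameter update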

josharnoldjosh commented 4 years ago

Thanks for the feedback!

I realized for my purpose I can just augment the loss function and I actually don't need reinforcement learning at all.

Just to confirm, all I need to do is the following to fine-tune with a custom loss function:


To confirm, I don't need to handle any of the .zero_grad() or .update_params() myself? This is all done in the train script?


Also, the only other issue I've noticed is that blender.optimizer is None when I init my subclass. Do I need to call init_optim myself? Or do I just need to make sure I pass in the correct opt when initializing the subclass of TransformerGeneratorAgent, and the optimizer will automatically be initialized? Currently I try to solve this by doing the following in the __init__(self, ..) function:

        self.init_optim(
            [p for p in self.model.parameters() if p.requires_grad],
            optim_states=blender_opt.get('optimizer'),
            saved_optim_type=blender_opt.get('optimizer_type'),
        )
        self.build_lr_scheduler(blender_opt, hard_reset=True)

One last question too! During the train loop, each time compute_loss is called, can I get my Blender model to generate a response within the compute_loss function? If so, will it have updated weights (so over time Blender will generate different responses)? My plan was to do something along these lines:

def compute_loss(self, batch, return_output=False):
    # do stuff
    generated_response_from_blender = self._generate(batch, ...)
    loss_penalty = my_function(generated_response_from_blender)
    # do stuff

Thanks so much for your help!

stephenroller commented 4 years ago

Correct, just use train_model on your dataset with your overridden compute_loss. compute_loss has those two return values, so make sure you're respecting that.
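A minimal skeleton of an override that respects that contract, where my_penalty_fn stands in for whatever custom term you add (it is not a real ParlAI helper):

# Inside your TransformerGeneratorAgent subclass:
def compute_loss(self, batch, return_output=False):
    # Reuse the standard generator loss, then add a custom term on top.
    loss, model_output = super().compute_loss(batch, return_output=True)
    loss = loss + my_penalty_fn(batch, model_output)  # hypothetical helper
    if return_output:
        return loss, model_output
    return loss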

As a very concrete example, check out our implementation of the Unlikelihood agent:

https://github.com/facebookresearch/ParlAI/blob/c7f4b64df1a706464c053f036fe2ff38600f4558/projects/dialogue_unlikelihood/agents.py#L49-L136

Note that we have two different losses, depending on some information in the batch (the reward field), and we add them both and return at the end.

We trained this with something like:

parlai train -m projects.dialogue_unlikelihood.agents:TransformerUnlikelihoodAgent -t mytask ...[other learning parameters]

Note it's TransformerUnlikelihoodAgent because we implemented the custom loss as a mixin:

https://github.com/facebookresearch/ParlAI/blob/c7f4b64df1a706464c053f036fe2ff38600f4558/projects/dialogue_unlikelihood/agents.py#L548-L557
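In other words, the overridden compute_loss lives in a mixin class that is then combined with the stock generator agent. A schematic sketch with illustrative names (not the actual project classes):

class CustomLossMixin:
    """Holds only the overridden loss logic."""

    def compute_loss(self, batch, return_output=False):
        ...  # custom loss goes here


class CustomLossTransformerAgent(CustomLossMixin, TransformerGeneratorAgent):
    pass  # everything else inherited from the stock agent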

josharnoldjosh commented 4 years ago

@stephenroller Thanks for all of the detailed help so far! I think I've gotten the custom loss function working with the train loop.


I just had a few quick questions,

Currently, I run the train loop like so:

parlai train_model \
    -t blended_skill_talk,wizard_of_wikipedia,convai2:normalized \
    -m parlai.agents.blender.blender:Blender \
    -df parlai/agents/blender/opt/model_dict.opt

I pass in the -df argument, which has a path to an opt dictionary, and also the -m argument, to which I pass my model class. My question was, firstly, is it okay to pass in the -df argument? I didn't see it in the documentation, but I just assumed it worked, since I was getting the following error: RuntimeError: WARNING: For train_model, please specify either a model_file or dict_file.

Secondly, when is the model file written out? Currently, only model.checkpoint files are being written after setting --save-every-n-secs 30. Do all epochs have to finish for the final model file to be saved?

Lastly, how do I set load_state_dict to use strict=False? I see I can override the load_state_dict function, but I'm not sure where I can actually set strict=False. Thanks!
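As a rough sketch of the kind of override being described, assuming the agent's load_state_dict wrapper simply forwards to the underlying torch module (check the TorchAgent source in your version for the exact behavior):

# Inside your agent subclass; tolerate missing/unexpected keys.
def load_state_dict(self, state_dict):
    self.model.load_state_dict(state_dict, strict=False)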

Thanks so much for all your help!

stephenroller commented 4 years ago

Use the same arguments as a regular fine-tune, except change -m to be your Blender agent.

from https://parl.ai/projects/recipes/

parlai train_model -t blended_skill_talk,wizard_of_wikipedia,convai2:normalized,empathetic_dialogues --multitask-weights 1,3,3,3 -veps 0.25 --attention-dropout 0.0 --batchsize 128 --model transformer/generator --embedding-size 2560 --ffn-size 10240 --variant prelayernorm --n-heads 32 --n-positions 128 --n-encoder-layers 2 --n-decoder-layers 24 --history-add-global-end-token end --delimiter '  ' --dict-tokenizer bytelevelbpe  --dropout 0.1 --fp16 True --init-model zoo:blender/reddit_3B/model --dict-file zoo:blender/reddit_3B/model.dict --label-truncate 128 --log_every_n_secs 10 -lr 7e-06 --lr-scheduler reduceonplateau --lr-scheduler-patience 3 --optimizer adam --relu-dropout 0.0 --activation gelu --model-parallel true --save-after-valid True --text-truncate 128 --truncate 128 --warmup_updates 100 --fp16-impl mem_efficient --update-freq 2 --gradient-clip 0.1 --skip-generation True -vp 10 -vmt ppl -vmm min --model-file /tmp/test_train_27B

Just add your -m argument to that. Also, you might need to adjust the batchsize to whatever is right for your machine, plus any other parameters you think should change.

josharnoldjosh commented 4 years ago

Thanks a lot, and then after the model has finished fine-tuning, would I just pass in the same --model-file argument I used for fine-tuning to be able to speak/interact with the model?

For example, like this?

python parlai/scripts/safe_interactive.py -t blended_skill_talk -mf /tmp/test_train_90M.checkpoint

Thanks!

stephenroller commented 4 years ago

Exactly

stephenroller commented 4 years ago

Next week the whole team is dedicating a day to writing tutorials and docs. Do you have any requests?

josharnoldjosh commented 4 years ago

Thanks so much for all of the support!

Hmmm, I cannot think of anything off the top of my head, but perhaps one idea could be to provide a few basic "in-depth" tutorials that go from start to finish, all the way from cloning the repository to achieving a final result (kind of like Medium articles)?

I think the documentation covers all of the topics, but new users might not know how to "fit all of the pieces together", because each section of the documentation seems to assume a degree of prior knowledge of the platform. What do you think?

Thanks again for all of the help! The ParlAI platform is really great.