Closed dheerajrajagopal closed 6 years ago
The `fine-tune` command assumes the input/output spec of your model is exactly the same on both datasets, including vocabulary sizes. You have more output labels in your second dataset, and your output label namespace doesn't allow OOV tokens (which is correct: only in very rare circumstances would you really want to be able to predict OOV in this kind of model).
I'm assuming you're wanting to leverage what the model learned about other tags to transfer to this new, related tag set? There are two options that I can see for how to handle this with our current code:
1. Manually specify the vocabulary for the labels when you first train your model, including all labels that you might ever want to use (like `B-Theme`). This should make it so that your output classification layer (or whatever you have that predicts tags) will have capacity to predict all tags you might ever want to predict, and some of them just won't get trained on your initial training run. You can then use `fine-tune` on a new dataset with the new tags, and things should just work. The problem here is that you will actually train the representations for `B-Theme` and other unused tags, because the softmax will push the probability of selecting those tags down, so training might be slower on the new dataset than it otherwise might be.
2. Make it so that you can initialize your model from an already-trained version, including a partial final tag softmax. Then you just copy everything over, initialize the expanded parts of the tag prediction layers to something reasonable, and use `train` again. This gets around the funny training issue from the first option, but it's a whole lot more work.
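For option 1, the extra labels can be declared in the training config. A rough sketch, assuming an AllenNLP version whose `vocabulary` section supports `tokens_to_add` and that your tags live in the default `labels` namespace (the extra label `I-Theme` below is purely illustrative):

```json
{
  "vocabulary": {
    "tokens_to_add": {
      "labels": ["B-Theme", "I-Theme"]
    }
  }
}
```

With this in place, the label vocabulary built from your first dataset also contains the future tags, so the output layer is sized to cover them from the start.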
Hi Matt,
Thanks for the response. Ideally, I would want to copy the weights of all other layers except the softmax and re-train the network, but I'll stick with option 1 for now.
I am closing this for now, thanks again.
Good luck! You can see an example of option two here: https://github.com/allenai/allennlp/blob/a6d4b1b7c1d5543e438e010d84a6960195ee45bc/allennlp/models/semantic_parsing/nlvr/nlvr_coverage_semantic_parser.py#L116-L119
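The linked code copies parameters over by name. As a plain PyTorch sketch of the same idea for option 2 (all names and shapes here are hypothetical, and it assumes the shared tags occupy the same first indices in both label vocabularies, e.g. because you extended rather than rebuilt the vocabulary):

```python
import torch

# Hypothetical sizes: the old model predicted 10 tags, the new tag set
# has 12 (say, B-Theme and I-Theme were added). Hidden size is illustrative.
hidden_size, old_num_tags, new_num_tags = 64, 10, 12

old_projection = torch.nn.Linear(hidden_size, old_num_tags)  # trained layer
new_projection = torch.nn.Linear(hidden_size, new_num_tags)  # expanded layer

# Copy the already-trained rows and biases for the shared tags; the rows
# for the new tags keep their fresh random initialization.
with torch.no_grad():
    new_projection.weight[:old_num_tags] = old_projection.weight
    new_projection.bias[:old_num_tags] = old_projection.bias
```

If the shared tags do not line up index-for-index, you would instead copy row by row using a mapping between the two label vocabularies.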
@matt-gardner Hey, if I want to fine-tune a model by passing parameters through a function, which function should I use in the AllenNLP source code?
@tahseen09 I'm not sure what you mean. Can you be more specific?
@matt-gardner I don't want to fine-tune using the command, but by passing parameters in the `fine_tune.py` file. Which function in the file should I use?
Whichever one best suits your needs; there are only a few functions in that file, so just use the one that takes the arguments that you want to pass.
Hey @matt-gardner, for v1.0.0 how can we use this feature within the pipeline itself instead of running a separate command? Also, I do not see `fine_tune.py` in the commands folder either. Any help would be appreciated! :)
Hi,
I had a question about the fine-tune feature. I trained the SRL model on the CoNLL 2012 dataset, and I want to fine-tune the model on a smaller dataset which has a different label space.
I tried this command:

```
python -m allennlp.run fine-tune -m conll2012_saved_model -c new_dataset.json -s new_serialization_dir --include-package folder_with_custom_model
```
The only difference in new_dataset.json is the training, dev, and test data; the rest of the model (and parameters) remains the same. But when I run it, I get this error. [`B-Theme` is the label from the new dataset which I am trying to fine-tune my weights on.]
Am I missing something?