allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

Fine-tune behavior #1164

Closed: dheerajrajagopal closed this issue 6 years ago

dheerajrajagopal commented 6 years ago

Hi,

I had a question about the fine-tune feature. I trained the SRL model on the CoNLL-2012 dataset, and I want to fine-tune it on a smaller dataset that has a different label space.

I tried this command:

python -m allennlp.run fine-tune -m conll2012_saved_model -c new_dataset.json -s new_serialization_dir --include-package folder_with_custom_model

The only difference in new_dataset.json is the training, dev, and test data paths; the rest of the model (and its parameters) stays the same. But when I run it, I get the error below. (B-Theme is a label from the new dataset that I am trying to fine-tune my weights on.)

2018-04-30 23:28:01,438 - INFO - allennlp.training.trainer - Training
2018-04-30 23:28:01,439 - ERROR - allennlp.data.vocabulary - Namespace: labels
2018-04-30 23:28:01,439 - ERROR - allennlp.data.vocabulary - Token: B-Theme
Traceback (most recent call last):
  File "/home/dheeraj/anaconda2/envs/py36/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/dheeraj/anaconda2/envs/py36/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/dheeraj/anaconda2/envs/py36/lib/python3.6/site-packages/allennlp/run.py", line 18, in <module>
    main(prog="python -m allennlp.run")
  File "/home/dheeraj/anaconda2/envs/py36/lib/python3.6/site-packages/allennlp/commands/__init__.py", line 67, in main
    args.func(args)
  File "/home/dheeraj/anaconda2/envs/py36/lib/python3.6/site-packages/allennlp/commands/fine_tune.py", line 85, in fine_tune_model_from_args
    file_friendly_logging=args.file_friendly_logging)
  File "/home/dheeraj/anaconda2/envs/py36/lib/python3.6/site-packages/allennlp/commands/fine_tune.py", line 120, in fine_tune_model_from_file_paths
    file_friendly_logging=file_friendly_logging)
  File "/home/dheeraj/anaconda2/envs/py36/lib/python3.6/site-packages/allennlp/commands/fine_tune.py", line 202, in fine_tune_model
    metrics = trainer.train()
  File "/home/dheeraj/anaconda2/envs/py36/lib/python3.6/site-packages/allennlp/training/trainer.py", line 649, in train
    train_metrics = self._train_epoch(epoch)
  File "/home/dheeraj/anaconda2/envs/py36/lib/python3.6/site-packages/allennlp/training/trainer.py", line 422, in _train_epoch
    for batch in train_generator_tqdm:
  File "/home/dheeraj/anaconda2/envs/py36/lib/python3.6/site-packages/tqdm/_tqdm.py", line 955, in __iter__
    for obj in iterable:
  File "/home/dheeraj/anaconda2/envs/py36/lib/python3.6/site-packages/allennlp/data/iterators/data_iterator.py", line 59, in __call__
    yield from self._yield_one_epoch(instances, shuffle, cuda_device, for_training)
  File "/home/dheeraj/anaconda2/envs/py36/lib/python3.6/site-packages/allennlp/data/iterators/data_iterator.py", line 71, in _yield_one_epoch
    for batch in batches:
  File "/home/dheeraj/anaconda2/envs/py36/lib/python3.6/site-packages/allennlp/data/iterators/bucket_iterator.py", line 81, in _create_batches
    self._padding_noise)
  File "/home/dheeraj/anaconda2/envs/py36/lib/python3.6/site-packages/allennlp/data/iterators/bucket_iterator.py", line 112, in _sort_by_padding
    instance.index_fields(self.vocab)
  File "/home/dheeraj/anaconda2/envs/py36/lib/python3.6/site-packages/allennlp/data/instance.py", line 50, in index_fields
    field.index(vocab)
  File "/home/dheeraj/anaconda2/envs/py36/lib/python3.6/site-packages/allennlp/data/fields/sequence_label_field.py", line 88, in index
    for label in self.labels]
  File "/home/dheeraj/anaconda2/envs/py36/lib/python3.6/site-packages/allennlp/data/fields/sequence_label_field.py", line 88, in <listcomp>
    for label in self.labels]
  File "/home/dheeraj/anaconda2/envs/py36/lib/python3.6/site-packages/allennlp/data/vocabulary.py", line 408, in get_token_index
    return self._token_to_index[namespace][self._oov_token]
KeyError: '@@UNKNOWN@@'

Am I missing something?

matt-gardner commented 6 years ago

The fine-tune command assumes the input/output spec of your model is exactly the same on both datasets, including vocabulary sizes. You have more output labels in your second dataset, and your output label namespace doesn't allow OOV tokens (which is correct - only in very rare circumstances would you really want to be able to predict OOV in this kind of model).

I'm assuming you want to leverage what the model learned about the other tags and transfer it to this new, related tag set? There are two options I can see for handling this with our current code:

  1. Manually specify the vocabulary for the labels when you first train your model, including all labels that you might ever want to use (like B-Theme). This way your output classification layer (or whatever you have that predicts tags) will have the capacity to predict every tag you might ever want, and some of them just won't get trained during your initial training run. You can then use fine-tune on a new dataset with the new tags, and things should just work; there's a rough sketch of this after the list. The drawback is that you actually will train the representations for B-Theme and the other unused tags, because the softmax will push the probability of selecting those tags down, so training on the new dataset might be slower than it otherwise would be.

  2. Make it so that you can initialize your model from an already-trained version, including a partial final tag softmax. Then you just copy everything over, initialize the expanded parts of the tag prediction layers to something reasonable, and use train again. This gets around the funny training issue from the first option, but it's a whole lot more work.
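
A rough sketch of option 1, just to show the idea (this assumes the 0.x Vocabulary API; the reader, file paths, and extra label list are placeholders for whatever your setup actually uses):

from allennlp.data import Vocabulary
from allennlp.data.dataset_readers import SrlReader

# Build the vocabulary from the original training data, as the trainer normally would.
reader = SrlReader()  # or whichever dataset reader you're actually using
vocab = Vocabulary.from_instances(reader.read("/path/to/conll2012/train"))

# Add every label you might ever want to fine-tune on, so the tag projection
# layer is sized for them from the start.
for extra_label in ["B-Theme", "I-Theme"]:
    vocab.add_token_to_namespace(extra_label, namespace="labels")

vocab.save_to_files("vocabulary_with_all_labels/")

Then point the "vocabulary" section of your training config at that saved directory (in 0.x I believe the key is "directory_path", but double-check for your version), so the initial training run and the later fine-tune run share the same, larger label space.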

dheerajrajagopal commented 6 years ago

Hi Matt,

Thanks for the response. Ideally, I would want to copy the weights of all layers except the softmax and re-train the network, but I'll stick with option 1 for now.

I am closing this for now, thanks again.

matt-gardner commented 6 years ago

Good luck! You can see an example of option two here: https://github.com/allenai/allennlp/blob/a6d4b1b7c1d5543e438e010d84a6960195ee45bc/allennlp/models/semantic_parsing/nlvr/nlvr_coverage_semantic_parser.py#L116-L119
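
A stripped-down version of that idea (not the linked parser's exact code) looks roughly like the sketch below; new_model stands in for a model you have already constructed with the expanded label vocabulary:

from allennlp.models.archival import load_archive

# Load the trained archive and copy over every parameter whose name and shape
# match; anything whose shape changed (e.g. the tag projection layer) is skipped
# and keeps its fresh initialization.
archive = load_archive("conll2012_saved_model/model.tar.gz")
archived_state = archive.model.state_dict()

new_state = new_model.state_dict()
for name, tensor in archived_state.items():
    if name in new_state and new_state[name].size() == tensor.size():
        new_state[name] = tensor
new_model.load_state_dict(new_state)

If you also want to copy the overlapping rows of the resized tag projection, you additionally have to map old label indices to new ones, which is essentially what the linked example does for its own embedding weights.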

tahseen09 commented 5 years ago

@matt-gardner Hey, if I want to fine-tune a model by passing parameters to a function, which function in the AllenNLP source code should I use?

matt-gardner commented 5 years ago

@tahseen09 I'm not sure what you mean. Can you be more specific?

tahseen09 commented 5 years ago

@matt-gardner I don't want to fine-tune using the command; I want to do it by passing parameters to the functions in fine_tune.py. Which function in that file should I use?

matt-gardner commented 5 years ago

Whichever one best suits your needs. There are only a few functions in that file; just use the one that takes the arguments you want to pass.
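
For example, calling the file-paths entry point directly would look roughly like this. The keyword names below are my guesses based on the CLI flags, so check the actual signature of fine_tune_model_from_file_paths in your installed version:

from allennlp.commands.fine_tune import fine_tune_model_from_file_paths

# Hypothetical keyword names; match them to the real signature in fine_tune.py.
metrics = fine_tune_model_from_file_paths(
    model_archive_path="conll2012_saved_model/model.tar.gz",
    config_file="new_dataset.json",
    serialization_dir="new_serialization_dir",
)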

iamsaurabhc commented 4 years ago

Hey @matt-gardner, for v1.0.0, how can we use the fine-tuning approach you described above within the pipeline itself instead of running a separate command? I also don't see fine_tune.py in the commands folder anymore. Any help would be appreciated! :)