facebookresearch / access

Code to reproduce the experiments from the paper.

Is it possible to perform transfer learning on this model? #14

Closed cece95 closed 4 years ago

cece95 commented 4 years ago

Hi! Is it possible to perform transfer learning on this model? I couldn't find any straightforward way to do it. It may very well be that I missed something, but any suggestion is appreciated.

Best Cesare

louismartin commented 4 years ago

Hi,

What do you want to do exactly? I guess it would be possible but not straightforward.

Best, Louis

cece95 commented 4 years ago

So basically I have a new dataset for headline simplification, and I would like to take the pretrained model and fine-tune it on the new dataset.

louismartin commented 4 years ago

OK, then I guess the way to do it would be to use the train.py script but modify the code so that fairseq loads a pretrained model instead of initializing a random seq2seq.

The train.py script calls fairseq_train_and_evaluate, which in turn calls fairseq_train. In the fairseq_train function you can see that the parameters for the fairseq-train command are hardcoded. A quick hack would be to hardcode a new CLI parameter to load the pretrained model: it's --restore-file in the latest fairseq version, but it might be different for the fairseq version needed for ACCESS (check the output of fairseq-train --help).
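If it helps, here is a rough, untested sketch of what I mean. The names (add_restore_file, PRETRAINED_CHECKPOINT) are placeholders, not the actual ACCESS code; the idea is just to append the extra flag wherever the fairseq-train arguments are assembled:

```python
# Rough sketch of the "quick hack" described above. The helper name and the
# checkpoint path below are illustrative placeholders, not the real ACCESS code.
PRETRAINED_CHECKPOINT = '/path/to/pretrained/checkpoint_best.pt'  # hypothetical path

def add_restore_file(fairseq_train_args):
    """Append the checkpoint-loading flag to an existing list of fairseq-train
    CLI arguments, e.g. ['--arch', 'transformer', ...]."""
    # --restore-file is the flag in recent fairseq versions; double-check
    # `fairseq-train --help` for the fairseq version pinned by ACCESS.
    return fairseq_train_args + ['--restore-file', PRETRAINED_CHECKPOINT]
```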

You would also need to create a new dataset in resources/datasets/ to provide as input to fairseq_train_and_evaluate; you can look at resources/datasets/wikilarge for the format.
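Roughly, the layout should look like the sketch below (the "headlines" name is just an example, and the exact file-naming convention should be double-checked against the wikilarge folder):

```python
# Minimal sketch of the expected dataset layout, inferred from the wikilarge
# folder: one file per (phase, side), with aligned complex/simple sentences,
# one per line. The dataset name 'headlines' is hypothetical.
from pathlib import Path

DATASET = 'headlines'
dataset_dir = Path('resources/datasets') / DATASET
dataset_dir.mkdir(parents=True, exist_ok=True)

for phase in ('train', 'valid', 'test'):
    for side in ('complex', 'simple'):
        # e.g. resources/datasets/headlines/headlines.train.complex
        filepath = dataset_dir / f'{DATASET}.{phase}.{side}'
        filepath.touch()  # then fill with your sentences, aligned line by line
```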

cece95 commented 4 years ago

Thank you very much! I'll give it a try right away.

cece95 commented 4 years ago

Hi, I have the dataset in the correct format and I managed to set up the restore of the model, but now I'm getting this error:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/fairseq/utils.py", line 76, in load_model_state
    model.load_state_dict(state['model'], strict=True)
  File "/usr/local/lib/python3.7/site-packages/fairseq/models/fairseq_model.py", line 66, in load_state_dict
    super().load_state_dict(state_dict, strict)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 845, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for TransformerModel:
    size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([10144, 512]) from checkpoint, the shape in current model is torch.Size([1296, 512]).
    size mismatch for decoder.embed_out: copying a param with shape torch.Size([10048, 512]) from checkpoint, the shape in current model is torch.Size([1168, 512]).
    size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([10048, 512]) from checkpoint, the shape in current model is torch.Size([1168, 512]).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "scripts/train.py", line 50, in <module>
    fairseq_train_and_evaluate(**kwargs)
  File "/home/cece/Desktop/access_to_run/access/utils/training.py", line 18, in wrapped_func
    return func(*args, **kwargs)
  File "/home/cece/Desktop/access_to_run/access/utils/training.py", line 29, in wrapped_func
    return func(*args, **kwargs)
  File "/home/cece/Desktop/access_to_run/access/utils/training.py", line 38, in wrapped_func
    result = func(*args, **kwargs)
  File "/home/cece/Desktop/access_to_run/access/utils/training.py", line 50, in wrapped_func
    result = func(*args, **kwargs)
  File "/home/cece/Desktop/access_to_run/access/fairseq/main.py", line 121, in fairseq_train_and_evaluate
    fairseq_train(preprocessed_dir, exp_dir=exp_dir, **train_kwargs)
  File "/home/cece/Desktop/access_to_run/access/fairseq/base.py", line 177, in fairseq_train
    train.main(train_args)
  File "/usr/local/lib/python3.7/site-packages/fairseq_cli/train.py", line 93, in main
    if not load_checkpoint(args, trainer, epoch_itr):
  File "/usr/local/lib/python3.7/site-packages/fairseq_cli/train.py", line 458, in load_checkpoint
    eval(args.optimizer_overrides))
  File "/usr/local/lib/python3.7/site-packages/fairseq/trainer.py", line 130, in load_checkpoint
    filename, self.get_model(),
  File "/usr/local/lib/python3.7/site-packages/fairseq/utils.py", line 78, in load_model_state
    raise Exception('Cannot load model parameters from checkpoint, '
Exception: Cannot load model parameters from checkpoint, please ensure that the architectures match
```

I understood that it's a problem with one of the dimensions of the embedding layer, but I can't figure out what to modify to make them match; it seems the size is automatically extracted from the dataset. Do you know where I can set it not to do that?

Thanks in advance, Cece

louismartin commented 4 years ago

Hi again,

That's a good point: indeed, you would need to preprocess the data with the dictionary that was used for the original model. First, locate the dict.complex.txt and dict.simple.txt files that were used with the pretrained model (downloadable here).

Then you will need to provide these paths as arguments to fairseq-preprocess by hardcoding them in the fairseq_preprocess function. These arguments should be something like --srcdict for the complex side and --tgtdict for the simple side (see fairseq-preprocess --help).
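Something along these lines (again just a sketch; the helper name, the directory path, and the variable holding the fairseq-preprocess arguments are placeholders for wherever ACCESS builds them):

```python
# Illustrative sketch: reuse the pretrained model's vocabularies instead of
# letting fairseq-preprocess rebuild them from the new dataset. The directory
# path and helper name are hypothetical.
PRETRAINED_DICT_DIR = '/path/to/pretrained/model'  # contains dict.complex.txt / dict.simple.txt

def add_pretrained_dicts(fairseq_preprocess_args):
    """Append the dictionary flags to an existing list of fairseq-preprocess
    CLI arguments (see `fairseq-preprocess --help` for the pinned version)."""
    return fairseq_preprocess_args + [
        '--srcdict', f'{PRETRAINED_DICT_DIR}/dict.complex.txt',  # complex (source) side
        '--tgtdict', f'{PRETRAINED_DICT_DIR}/dict.simple.txt',   # simple (target) side
    ]
```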

Please tell me if that works.

Best, Louis

cece95 commented 4 years ago

Hi, thanks for your quick reply. The dictionaries are now loaded correctly, but two things happened:

  1. The options --raw-text and --validations-before-sari-early-stopping are not recognised during training anymore, so for now I just commented them out.
  2. I get a new error:

```
Traceback (most recent call last):
  File "scripts/train.py", line 50, in <module>
    fairseq_train_and_evaluate(**kwargs)
  File "/home/cece/Desktop/access_to_run/access/utils/training.py", line 18, in wrapped_func
    return func(*args, **kwargs)
  File "/home/cece/Desktop/access_to_run/access/utils/training.py", line 29, in wrapped_func
    return func(*args, **kwargs)
  File "/home/cece/Desktop/access_to_run/access/utils/training.py", line 38, in wrapped_func
    result = func(*args, **kwargs)
  File "/home/cece/Desktop/access_to_run/access/utils/training.py", line 50, in wrapped_func
    result = func(*args, **kwargs)
  File "/home/cece/Desktop/access_to_run/access/fairseq/main.py", line 121, in fairseq_train_and_evaluate
    fairseq_train(preprocessed_dir, exp_dir=exp_dir, **train_kwargs)
  File "/home/cece/Desktop/access_to_run/access/fairseq/base.py", line 181, in fairseq_train
    train.main(train_args)
  File "/home/cece/.local/lib/python3.7/site-packages/fairseq_cli/train.py", line 111, in main
    extra_state, epoch_itr = checkpoint_utils.load_checkpoint(args, trainer)
  File "/home/cece/.local/lib/python3.7/site-packages/fairseq/checkpoint_utils.py", line 137, in load_checkpoint
    reset_meters=args.reset_meters,
  File "/home/cece/.local/lib/python3.7/site-packages/fairseq/trainer.py", line 282, in load_checkpoint
    self.optimizer.load_state_dict(last_optim_state, optimizer_overrides)
  File "/home/cece/.local/lib/python3.7/site-packages/fairseq/optim/fairseq_optimizer.py", line 72, in load_state_dict
    self.optimizer.load_state_dict(state_dict)
  File "/home/cece/.local/lib/python3.7/site-packages/torch/optim/optimizer.py", line 116, in load_state_dict
    raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group
```

And in this case I can't really wrap my head around it. Do you have any idea?

louismartin commented 4 years ago
  1. You need to install the specific fork of fairseq used for training models with ACCESS, as per the readme: `pip install --force-reinstall fairseq@git+https://github.com/louismartin/fairseq.git@controllable-sentence-simplification`

  2. This might be solved by installing the specific version from point 1 as well.

cece95 commented 4 years ago

Thank you, it ended up working perfectly. The only note is that I had to downgrade to sacrebleu 1.4.4, because all newer versions gave me errors, such as the tokenizer 13a one that you already have an issue for, and 1.4.5 required the IPA dictionary.

Thank you very much for your help :)

Best Cesare