HHousen / TransformerSum

Models to perform neural summarization (extractive and abstractive) using machine learning transformers and a tool to convert abstractive summarization datasets to the extractive task.
https://transformersum.rtfd.io
GNU General Public License v3.0

how to load and infer with trained model? #37

Closed moyid closed 4 years ago

moyid commented 4 years ago

I have trained an extractive longformer model and have 3 epoch checkpoints saved. Now I want to try running inference with it, but I'm getting errors when I run the main.py script with --do_test:

python src/main.py --model_name_or_path /home/jupyter/TransformerSum/trained_models/epoch=2.ckpt --model_type longformer --data_path ./datasets/cnn_dm_extractive_compressed_5000/ --weights_save_path ./trained_models --do_test --no_use_token_type_ids --max_seq_length 2048 --batch_size 4 --log WARNING --use_logger tensorboard --weights_save_path /home/jupyter/TransformerSum/trained_models --use_custom_checkpoint_callback

output:


  File "src/main.py", line 397, in <module>
    main(main_args)
  File "src/main.py", line 56, in main
    model = summarizer(hparams=args)
  File "/home/jupyter/TransformerSum/src/extractive.py", line 112, in __init__
    gradient_checkpointing=hparams.gradient_checkpointing,
  File "/home/jupyter/TransformerSum/env/lib/python3.7/site-packages/transformers/configuration_auto.py", line 203, in from_pretrained
    config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/jupyter/TransformerSum/env/lib/python3.7/site-packages/transformers/configuration_utils.py", line 243, in get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
  File "/home/jupyter/TransformerSum/env/lib/python3.7/site-packages/transformers/configuration_utils.py", line 325, in _dict_from_json_file
    text = reader.read()
  File "/opt/conda/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

HHousen commented 4 years ago

@moyid If you want to load a model and run the testing stage with --do_test, you need to pass the path to the model with the --load_from_checkpoint option (--load_from_checkpoint /home/jupyter/TransformerSum/trained_models/epoch=2.ckpt) and set --model_name_or_path to the huggingface/transformers model shortcode used during training, such as allenai/longformer-base-4096.
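
For readers hitting the same UnicodeDecodeError: the traceback shows the value of --model_name_or_path being handed to from_pretrained, which ends up reading the file as a JSON config in UTF-8 text mode. A .ckpt file is binary, and a leading 0x80 byte is consistent with a pickle stream. The sketch below reproduces that failure with the standard library only. The helper names and the pickled dict are hypothetical stand-ins, not TransformerSum or transformers code.

```python
import json
import os
import pickle
import tempfile


def read_as_json_config(path):
    """Read a file the way a JSON config loader would: as UTF-8 text."""
    with open(path, "r", encoding="utf-8") as reader:
        text = reader.read()
    return json.loads(text)


def demo_binary_checkpoint_error():
    """Write a small pickled file and try to read it as a JSON config.

    Returns the UnicodeDecodeError that is raised, or None if no error occurs.
    """
    # Hypothetical stand-in for a checkpoint: a pickled dict, not a real model.
    fd, path = tempfile.mkstemp(suffix=".ckpt")
    try:
        with os.fdopen(fd, "wb") as writer:
            pickle.dump({"state_dict": {}}, writer, protocol=2)
        try:
            read_as_json_config(path)
        except UnicodeDecodeError as err:
            return err
        return None
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Prints the decode error raised when binary data is read as UTF-8 text.
    print(demo_binary_checkpoint_error())
```

This is why the checkpoint path belongs in --load_from_checkpoint, while --model_name_or_path should stay a model shortcode that from_pretrained can resolve.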

moyid commented 4 years ago

Thanks. I'm getting further, but now I'm getting another error about data_type, so I tried adding that as a parameter:

python src/main.py --load_from_checkpoint /home/jupyter/TransformerSum/trained_models/epoch=2.ckpt --model_name_or_path allenai/longformer-base-4096 --model_type longformer --data_path ./datasets/cnn_dm_extractive_compressed_5000/ --data_type pt --weights_save_path ./trained_models --do_test --no_use_token_type_ids --max_seq_length 2048 --batch_size 4 --log WARNING --use_logger tensorboard --weights_save_path /home/jupyter/TransformerSum/trained_models


  File "src/main.py", line 397, in <module>
    main(main_args)
  File "src/main.py", line 95, in main
    trainer.test(model)
  File "/home/jupyter/TransformerSum/env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 710, in test
    results = self.__test_given_model(model, test_dataloaders)
  File "/home/jupyter/TransformerSum/env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 775, in __test_given_model
    results = self.fit(model)
  File "/home/jupyter/TransformerSum/env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 414, in fit
    self.data_connector.prepare_data(model)
  File "/home/jupyter/TransformerSum/env/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 60, in prepare_data
    model.prepare_data()
  File "/home/jupyter/TransformerSum/src/extractive.py", line 513, in prepare_data
    inferred_data_type = get_inferred_data_type(dataset_files)
  File "/home/jupyter/TransformerSum/src/extractive.py", line 477, in get_inferred_data_type
    most_common != self.hparams.data_type
  File "/home/jupyter/TransformerSum/env/lib/python3.7/site-packages/pytorch_lightning/utilities/parsing.py", line 162, in __getattr__
    raise AttributeError(f'Missing attribute "{key}"') from exp
AttributeError: Missing attribute "data_type"

HHousen commented 4 years ago

@moyid I implemented a fix that should make this work properly. You trained a model using a version of TransformerSum that only supported one dataset type and are now testing it using a version that supports both dataset types. Commit b36ba1c should make this backwards compatible. However, if b36ba1c does not fix it, you can try using --load_weights instead of --load_from_checkpoint. --load_from_checkpoint will load hyperparameters from the saved model file, while --load_weights will initialize them from the command line arguments.
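
The difference between the two options can be illustrated with plain Python. This is a minimal sketch, not the code from commit b36ba1c: the function names, the "hyper_parameters" key, and the placeholder value for data_type are assumptions made for illustration. It shows why hyperparameters restored from an older checkpoint can lack an option such as data_type, and how either filling in defaults or taking everything from the command line avoids the missing attribute.

```python
from argparse import Namespace


def hparams_from_checkpoint(checkpoint, defaults):
    """Mimic restoring hyperparameters saved with the model.

    Options missing from an older checkpoint fall back to `defaults`
    (a hypothetical backwards-compatibility step).
    """
    merged = dict(defaults)
    # "hyper_parameters" is an assumed key name for the saved options.
    merged.update(checkpoint.get("hyper_parameters", {}))
    return Namespace(**merged)


def hparams_from_cli(cli_args):
    """Mimic initializing hyperparameters from command line arguments only."""
    return Namespace(**vars(cli_args))


def example():
    # A checkpoint written before the data_type option existed.
    old_checkpoint = {
        "hyper_parameters": {"model_type": "longformer", "max_seq_length": 2048},
        "state_dict": {},
    }
    # Current command line arguments, including the newer option.
    cli_args = Namespace(model_type="longformer", max_seq_length=2048, data_type="pt")

    # Without defaults, the restored hyperparameters have no data_type.
    raw = hparams_from_checkpoint(old_checkpoint, defaults={})
    # With a placeholder default, the attribute exists again.
    patched = hparams_from_checkpoint(old_checkpoint, defaults={"data_type": "pt"})
    # Taking everything from the command line sidesteps the old hyperparameters.
    from_cli = hparams_from_cli(cli_args)
    return raw, patched, from_cli


if __name__ == "__main__":
    raw, patched, from_cli = example()
    print(hasattr(raw, "data_type"), patched.data_type, from_cli.data_type)
```

The trade-off is that taking hyperparameters from the command line requires repeating every relevant training option, while restoring them from the checkpoint keeps them consistent with training but depends on the checkpoint containing every option the current code expects.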