ThilinaRajapakse / pytorch-transformers-classification

Based on the Pytorch-Transformers library by HuggingFace. To be used as a starting point for employing Transformer models in text classification tasks. Contains code to easily train BERT, XLNet, RoBERTa, and XLM models for text classification.
Apache License 2.0

Guidance on Model Checkpointing and saving models #13

Closed: pythonometrist closed this issue 5 years ago

pythonometrist commented 5 years ago

I am new to PyTorch, and it seems to have simple model save and reload functions. On the other hand, pytorch_transformers has this model.save_pretrained() method.

I am looking to ensemble a few models, so I need to be able to keep the last version of each model and load them back. You do this in the eval function, but is there a simpler way that is more PyTorch-like? Also, I do not need checkpoints per se (so I set the flag to False), so how does the eval function figure out which .bin file to load?

ThilinaRajapakse commented 5 years ago

I'd recommend using from_pretrained() and save_pretrained() provided by pytorch_transformers. The difference is that these can be used to save and load all three components of a transformer model (model, tokenizer, config).

You can load the model simply by specifying the path.

model = model_class.from_pretrained(path_to_model)

Here, model_class would be something like BertForSequenceClassification, and the path is the directory where the pytorch_model.bin file (which contains the weights) can be found.

You only need two lines to save any model.

model_to_save.save_pretrained(path_to_save)
tokenizer.save_pretrained(path_to_save)
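
Note that model_to_save here is just the model object. If the model was wrapped in torch.nn.DataParallel during training, the usual pattern (a sketch, not part of the two lines above) is to unwrap it before saving:

model_to_save = model.module if hasattr(model, 'module') else model  # unwrap DataParallel if present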

Then, you can load the model from the same directory.

model = model_class.from_pretrained(path_to_save)
tokenizer = tokenizer_class.from_pretrained(path_to_save)
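
As a concrete sketch, assuming a BERT classifier saved to a hypothetical outputs/ directory (the directory name and BERT classes are just example choices):

from pytorch_transformers import BertForSequenceClassification, BertTokenizer

path_to_save = 'outputs/'  # hypothetical directory written by save_pretrained()

model = BertForSequenceClassification.from_pretrained(path_to_save)
tokenizer = BertTokenizer.from_pretrained(path_to_save)
model.eval()  # switch to inference mode before prediction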

My recommendation would be to keep this approach and save the different versions of your model in different directories. Typically, I'll store different versions in subdirectories inside the output directory. An added bonus of this approach is that if I need to perform a given task using all versions of a model, all I need is a for loop.

import os

for model_dir in os.listdir('outputs/'):
    model = model_class.from_pretrained(os.path.join('outputs/', model_dir))

    # Do something with the current model

When using checkpoints, the training loop will save the checkpoints to args["output_dir"]/checkpoint-{step} and the final model directly to args["output_dir"]. So, if eval_all_checkpoints is set to False (or there are no checkpoints), evaluation will only be done on the model in the args["output_dir"] directory.
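
For reference, the checkpoint discovery during evaluation typically looks like the sketch below (modeled on the run_glue-style loop; treat it as an approximation rather than the exact script):

import glob
import os

from pytorch_transformers import WEIGHTS_NAME  # 'pytorch_model.bin'

checkpoints = [args['output_dir']]  # always evaluate the final model
if args['eval_all_checkpoints']:
    # also pick up every checkpoint-{step} subdirectory containing saved weights
    checkpoints = [os.path.dirname(c) for c in
                   sorted(glob.glob(args['output_dir'] + '/**/' + WEIGHTS_NAME, recursive=True))]

for checkpoint in checkpoints:
    model = model_class.from_pretrained(checkpoint)
    # evaluate the model from this checkpoint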

pythonometrist commented 5 years ago

Nice - let me implement this and get back to you. I assume I also need the config? And to freeze the individual models, do I just set them to eval()?
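
(As an aside: eval() and freezing are separate things in PyTorch. A minimal sketch, where model and input_ids are placeholders:)

import torch

model.eval()  # disables dropout etc., but gradients can still flow

for param in model.parameters():
    param.requires_grad = False  # actually freezes the weights

# or simply run inference without tracking gradients
with torch.no_grad():
    outputs = model(input_ids)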

pythonometrist commented 5 years ago

Works - your suggestion helped me think clearly about whether I need all the models in memory or only their predictions; it's only the latter. So ensembling becomes trivial once I have the predictions I need. No need to worry about merging models. Thanks!