ThilinaRajapakse / simpletransformers

Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
https://simpletransformers.ai/
Apache License 2.0

Loading a fine tuned Seq2Seq MarianMT model gives wrong predictions #1571

Closed ziweizh24 closed 4 months ago

ziweizh24 commented 5 months ago

I initialized and trained the following model:

model = Seq2SeqModel(
    encoder_decoder_type="marian",
    encoder_decoder_name="Helsinki-NLP/opus-mt-en-mul",
    args=model_args,
    use_cuda=True,
)

After training, model.predict(['this is a test']) gives me the desired output. However, when I load the model back to make predictions, the output is off:

from transformers import MarianMTModel, MarianTokenizer

# Load the tokenizer and model from the saved checkpoint directory
tokenizer = MarianTokenizer.from_pretrained('outputs/best_model')
my_model = MarianMTModel.from_pretrained('outputs/best_model')

translated = my_model.generate(**tokenizer(['this is a test'], return_tensors="pt", padding=True))
[tokenizer.decode(t, skip_special_tokens=True) for t in translated]

Anything I missed?

ThilinaRajapakse commented 5 months ago

Do you get any warnings when you reload the model? (Set up logging if you haven't: logging.basicConfig(level=logging.INFO))

Does it work as expected if you reload the model with Simple Transformers and use model.predict()?

ziweizh24 commented 5 months ago

> Do you get any warnings when you reload the model? (Set up logging if you haven't: logging.basicConfig(level=logging.INFO))
>
> Does it work as expected if you reload the model with Simple Transformers and use model.predict()?

I did get a warning saying that not all weights were initialized when loading the model with MarianMTModel.from_pretrained('outputs/best_model'). Could you say a bit more about how to reload the model (PATH='outputs/best_model/') with Simple Transformers (I assume it will use Seq2SeqModel)? Is Seq2SeqModel.from_pretrained(<PATH>) supported?

ThilinaRajapakse commented 5 months ago

To load with ST, you'd do:

model = Seq2SeqModel(
    encoder_decoder_type="marian",
    encoder_decoder_name="<PATH>",
    args=model_args,
    use_cuda=True,
)

In theory, Seq2SeqModel.from_pretrained(<PATH>) should also work, since ST uses a Hugging Face model under the hood. I don't remember the details exactly, but Marian encoder-decoder models may be a special case where this doesn't work (because of how the encoder and the decoder are set up).