eole-nlp / eole

Open language modeling toolkit based on PyTorch
https://eole-nlp.github.io/eole
MIT License
62 stars 12 forks

Missing key in safetensors checkpoint: generator.weight #117

Closed by HURIMOZ 1 month ago

HURIMOZ commented 1 month ago

Hi, I'm using the wmt17 recipe to build a bilingual translation model. I'm now able to train the models, but at inference I get this error:

The rest of the inference seems to run fine:

[2024-09-25 06:08:27,620 INFO] PRED SCORE: -0.1463, PRED PPL: 1.16 NB SENTENCES: 256
[2024-09-25 06:08:27,620 INFO] ESTIM SCORE: 1.0000, ESTIM PPL: 0.37 NB SENTENCES: 256
Time w/o python interpreter load/terminate:  5.920188903808594

In my models repository, three files are generated for every step saved:

This is my bash command for inference:

eole predict --src processed_data/test.src.bpe --model_path models/step_7000 --beam_size 5 --batch_size 2048 --batch_type tokens --output translations/test.trg.bpe --gpu 0

What am I doing wrong to get the error "Missing key in safetensors checkpoint: generator.weight"?

vince62s commented 1 month ago

This is not an error. The yaml config you are using has share_decoder_embeddings: true, so the model does not store the weights twice. We'll make this clearer and suppress the log message when this flag is ON.
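To illustrate why the key is absent from the checkpoint: with tied weights, the decoder embedding matrix and the generator (output projection) are the same tensor, so a serializer that forbids duplicated storage, as safetensors does, saves it only once. The sketch below is a minimal, hypothetical module (the class and attribute names are illustrative, not eole's actual ones):

```python
import torch
import torch.nn as nn

# Hypothetical minimal decoder head; names are illustrative, not eole's.
class TinyDecoderHead(nn.Module):
    def __init__(self, vocab_size=100, dim=16):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, dim)
        self.generator = nn.Linear(dim, vocab_size, bias=False)
        # share_decoder_embeddings: the generator reuses the embedding matrix
        self.generator.weight = self.embeddings.weight

head = TinyDecoderHead()
sd = head.state_dict()

# Both state_dict keys are views of the same underlying storage...
assert sd["embeddings.weight"].data_ptr() == sd["generator.weight"].data_ptr()

# ...so a safetensors-style writer stores the tensor once; on load,
# "generator.weight" is reported as missing and is simply re-tied to
# "embeddings.weight" instead of being read from disk.
```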

HURIMOZ commented 1 month ago

Oh I see. Yes, that makes sense now. Switched it to false. Thanks Vince!

vince62s commented 1 month ago

Don't switch it to false; it's perfectly fine to share embeddings between (1) src and tgt and (2) the decoder and the generator. Not sharing brings no improvement.
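In config terms, the recommended setting is simply to leave the flag as the recipe sets it (only share_decoder_embeddings is confirmed in this thread; any other option names would need to be checked against the eole docs):

```yaml
# Keep the decoder embedding matrix tied to the generator, as recommended
# above. The "Missing key" log line at inference is expected with this on.
share_decoder_embeddings: true
```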