The large dataset contains ~800 tokens that apparently do not occur in the small dataset, so the vocabulary sizes differ (5976 vs. 5192). These extra tokens are mostly single Chinese characters.
Error:
RuntimeError: Error(s) in loading state_dict for SockeyeModel:
size mismatch for embedding_target.embedding.weight: copying a param with shape torch.Size([5976, 512]) from checkpoint, the shape in current model is torch.Size([5192, 512]).
size mismatch for output_layer.weight: copying a param with shape torch.Size([5976, 512]) from checkpoint, the shape in current model is torch.Size([5192, 512]).
size mismatch for output_layer.bias: copying a param with shape torch.Size([5976]) from checkpoint, the shape in current model is torch.Size([5192]).
Is there a way for me to overcome this mismatch? For example, can I keep the embedding layer at 5976 embeddings and only update the 5192 tokens that actually occur in my data?
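To make concrete the kind of workaround I have in mind, here is a rough sketch in plain PyTorch (not Sockeye's API) that remaps the vocabulary-dependent parameters of the large-data checkpoint onto the smaller vocabulary by copying the rows of tokens shared by both vocabularies. The file names, the token-to-id JSON layout, and the assumption that the params file is a plain state_dict are all guesses on my part:

```python
# Rough sketch, assumptions only: remap the vocab-dependent parameters of the
# checkpoint trained on the large data (5976 rows) onto the small-data
# vocabulary (5192 rows) by copying the rows of tokens present in both vocabs.
import json
import torch

# Assumed: Sockeye-style vocab files as {token: id} JSON dicts (paths are guesses).
with open("large_model/vocab.trg.0.json") as f:
    large_vocab = json.load(f)
with open("small_model/vocab.trg.0.json") as f:
    small_vocab = json.load(f)

# Assumed: the params file is a plain state_dict saved with torch.save.
state = torch.load("large_model/params.best", map_location="cpu")

def remap_rows(old: torch.Tensor) -> torch.Tensor:
    """Copy rows of tokens shared by both vocabs; leave the rest zero-initialized."""
    new = old.new_zeros((len(small_vocab),) + tuple(old.shape[1:]))
    for token, new_id in small_vocab.items():
        old_id = large_vocab.get(token)
        if old_id is not None:
            new[new_id] = old[old_id]  # reuse the pretrained row for this token
    return new

# The keys are taken from the error message above.
for key in ("embedding_target.embedding.weight",
            "output_layer.weight",
            "output_layer.bias"):
    state[key] = remap_rows(state[key])

torch.save(state, "large_model/params.remapped")
```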
My training procedure is:
My fine-tuning procedure is:
with --params of the previous model.