The large dataset contains ~800 tokens that apparently do not occur in the small dataset, so the vocabulary sizes differ (5976 vs. 5192). These extra tokens are mostly single Chinese characters.
Error:
RuntimeError: Error(s) in loading state_dict for SockeyeModel:
size mismatch for embedding_target.embedding.weight: copying a param with shape torch.Size([5976, 512]) from checkpoint, the shape in current model is torch.Size([5192, 512]).
size mismatch for output_layer.weight: copying a param with shape torch.Size([5976, 512]) from checkpoint, the shape in current model is torch.Size([5192, 512]).
size mismatch for output_layer.bias: copying a param with shape torch.Size([5976]) from checkpoint, the shape in current model is torch.Size([5192]).
Is there a way for me to overcome this mismatch? For example, can I keep the embedding layer at 5976 embeddings and only update the 5192 tokens that actually occur in my data?
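To make concrete the kind of workaround I have in mind, here is a rough sketch in plain PyTorch (not Sockeye's API) that remaps the vocabulary-dependent parameters of the large-data checkpoint onto the smaller vocabulary by copying the rows of tokens shared by both vocabularies. The file names, the token-to-id JSON layout, and the assumption that the params file is a plain state_dict are all guesses on my part:

```python
# Rough sketch, assumptions only: remap the vocab-dependent parameters of the
# checkpoint trained on the large data (5976 rows) onto the small-data
# vocabulary (5192 rows) by copying the rows of tokens present in both vocabs.
import json
import torch

# Assumed: Sockeye-style vocab files as {token: id} JSON dicts (paths are guesses).
with open("large_model/vocab.trg.0.json") as f:
    large_vocab = json.load(f)
with open("small_model/vocab.trg.0.json") as f:
    small_vocab = json.load(f)

# Assumed: the params file is a plain state_dict saved with torch.save.
state = torch.load("large_model/params.best", map_location="cpu")

def remap_rows(old: torch.Tensor) -> torch.Tensor:
    """Copy rows of tokens shared by both vocabs; leave the rest zero-initialized."""
    new = old.new_zeros((len(small_vocab),) + tuple(old.shape[1:]))
    for token, new_id in small_vocab.items():
        old_id = large_vocab.get(token)
        if old_id is not None:
            new[new_id] = old[old_id]  # reuse the pretrained row for this token
    return new

# The keys are taken from the error message above.
for key in ("embedding_target.embedding.weight",
            "output_layer.weight",
            "output_layer.bias"):
    state[key] = remap_rows(state[key])

torch.save(state, "large_model/params.remapped")
```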
My training procedure is:
My fine-tuning procedure is:
with --params of the previous model.