size mismatch - GitHub Issues

Hi, I found a size mismatch between the current model and the checkpoint:

size mismatch for encoder.block.0.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for encoder.block.1.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for encoder.block.2.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for encoder.block.3.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for encoder.block.4.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for encoder.block.5.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for encoder.block.6.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for encoder.block.7.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for encoder.block.8.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for encoder.block.9.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for encoder.block.10.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for encoder.block.11.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for decoder.block.0.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for decoder.block.1.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for decoder.block.2.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for decoder.block.3.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for decoder.block.4.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for decoder.block.5.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for decoder.block.6.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for decoder.block.7.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for decoder.block.8.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for decoder.block.9.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for decoder.block.10.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).
size mismatch for decoder.block.11.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([768, 2048]).


You can try this model: https://huggingface.co/google-t5/t5-base.
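For context on why a different base model may help: a PyTorch linear layer stores its weight as (out_features, in_features), so the second dimension of each `wo.weight` in the log is the feed-forward width. The checkpoint was saved with a width of 3072, which matches the `d_ff` in the t5-base config, while the current model was built with 2048. Below is a minimal sketch of how the mismatches could be summarized before loading. The helper names `find_shape_mismatches` and `feed_forward_widths` are made up for illustration and are not part of this repo or of any library; the shape dictionaries are filled in by hand from the error log, but in practice they could be built with something like `{k: tuple(v.shape) for k, v in state_dict.items()}`.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters
    that exist on both sides but differ in shape."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


def feed_forward_widths(mismatches, suffix="DenseReluDense.wo.weight"):
    """Collect the feed-forward widths (second dim of each wo.weight)
    seen in the checkpoint and in the current model."""
    ckpt_widths, model_widths = set(), set()
    for name, (ckpt_shape, model_shape) in mismatches.items():
        if name.endswith(suffix):
            ckpt_widths.add(ckpt_shape[1])
            model_widths.add(model_shape[1])
    return ckpt_widths, model_widths


# Parameter names and shapes copied from the error log in this issue
# (hand-written here; normally read from the two state dicts).
names = [f"encoder.block.{i}.layer.1.DenseReluDense.wo.weight" for i in range(12)]
names += [f"decoder.block.{i}.layer.2.DenseReluDense.wo.weight" for i in range(12)]
checkpoint_shapes = {n: (768, 3072) for n in names}
model_shapes = {n: (768, 2048) for n in names}

mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
ckpt_d_ff, model_d_ff = feed_forward_widths(mismatches)
print(len(mismatches), ckpt_d_ff, model_d_ff)  # 24 {3072} {2048}
```

If every mismatch points at the same pair of widths, as it does here, the likely cause is that the model was instantiated from a different base config than the one the checkpoint was trained with, rather than a corrupted checkpoint.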

LLLogen / VSDcode

size mismatch #6