Closed Dddddebil closed 1 year ago
Sorry, I don't understand, can you explain it?
I trained the model on 9-character captchas. During training it reported 99% accuracy with no recognition errors, but when I try to recognize a picture, the output is only '+' characters.
I need you to provide me some answers:
Also, the first class in your list of classes should be the CTC blank token, denoted by "∅" during training. When you train the model, it prints the list of classes to the terminal; I recommend copying that list and reusing it for inference.
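For illustration, here is a minimal sketch of that convention. The digit set and the '∅' symbol come from this thread; the exact class list and any framework call are assumptions, not the repo's actual code:

```python
# Hypothetical class list: index 0 is reserved for the CTC blank token '∅',
# so training and inference must share the exact same ordering.
DIGITS = list('0123456789')
classes = ['∅'] + DIGITS                      # blank must be class 0
char_to_idx = {c: i for i, c in enumerate(classes)}

# In PyTorch, this ordering matches nn.CTCLoss(blank=0)
# (shown as a comment so the sketch stays framework-free):
# criterion = torch.nn.CTCLoss(blank=0)
```

If inference builds its class list in a different order than training did, every predicted index maps to the wrong character, which is why copying the printed list matters.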
thanks!
Did you solve the issue?
Yes, but do you know why the letters are repeated? python3 inference.py dataset/761354.jpeg text: ['7', '7', '6', '6', '1', '3', '5', '4']
It could either be a decoding problem (although I don't think it is) or a training problem. The final prediction goes through a decoding process that removes those duplicates, so if they still appear, it means the model truly thinks there are two '7's and two '6's.
To make this better you either need more training data or train a bit longer and see if that solves your problem.
The model output is actually a long list of predictions: with CTC, the model predicts a class for each stripe (time step) of the image. Naturally, each of your digits appears in more than one stripe, so the output has duplicates, but we insert the blank token "∅" to denote empty space.
That was maybe a confusing explanation, but the point is: the model predicts duplicates for all your digits, and it also classifies empty space ("∅"). Only when the blank token appears between two identical predictions do we assume there really are two of those digits in sequence (say, 77∅77 decodes to '77', but 7777 decodes to simply '7' because there is no blank in between).
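The decode rule described above can be sketched as a small greedy-collapse function (a hypothetical helper for illustration, not the repo's actual decoder):

```python
def ctc_collapse(tokens, blank='∅'):
    """Collapse a raw CTC prediction: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for t in tokens:
        if t != prev and t != blank:   # keep only the first token of each run
            out.append(t)
        prev = t
    return ''.join(out)

# ['7', '7', '∅', '7', '7'] -> '77'  (blank separates two real sevens)
# ['7', '7', '7', '7']      -> '7'   (one seven spread across several stripes)
```

So duplicates surviving in the final string means the model emitted a blank between them, i.e. it genuinely believes the digit occurs twice.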
Okay, I'll increase the amount of training data from 3500 to 7000 images and train for 100 epochs instead of 50.
Train for enough epochs to reach good validation accuracy (at a certain point the accuracy stops improving, and you can stop there); as for the number of images, the more the better.
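That "stop when validation accuracy plateaus" rule can be sketched as a simple patience check (a generic early-stopping sketch, not code from this repo; the patience value is an arbitrary assumption):

```python
def should_stop(val_accs, patience=5):
    """Stop when the best validation accuracy was not reached
    in any of the last `patience` epochs."""
    if len(val_accs) <= patience:
        return False                      # too early to judge
    best = max(val_accs)
    return best not in val_accs[-patience:]

# Accuracy keeps climbing -> keep training; flat for 5 epochs -> stop.
```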
size mismatch for linear.weight: copying a param with shape torch.Size([256, 960]) from checkpoint, the shape in current model is torch.Size([256, 832]).
Why? I haven't changed anything since training, and that's the same setup I'm running inference with.
If I change nn.Linear to 960, I get: RuntimeError: mat1 and mat2 shapes cannot be multiplied (133x1152 and 960x256). WTF
This linear layer's size is calculated dynamically from your input size. It looks like your input shape changed: as the error says, the saved checkpoint has a linear layer of torch.Size([256, 960]), but the model is now dynamically computing torch.Size([256, 832]) from your current input image.
Image size: torch.Size([1, 1, 70, 530]) text: ['+', '+', '+', ...] (every stripe predicted as '+')