Closed: mariembenslama closed this issue 5 years ago
Hi, I'm sorry for the late reply.
I've been fixing bugs in this project these past days, so I didn't check GitHub. You can pull the latest code to test~ Thank you~
Hi mariem, you can pull the latest code. In the original code, the val iteration count was min(100, len(val_dataloader)),
so it was always capped at 100 batches, and that can't show the performance on the whole dataset.
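A minimal sketch of the difference being described (the names here are illustrative, not the repo's actual code): capping validation at 100 batches only touches a fraction of a large val set.

```python
# Sketch: capped validation vs. full validation.
# `val_dataloader` stands in for a real DataLoader; here it is just a list of batches.
def num_val_batches(val_dataloader, max_iter=100):
    # Old behaviour: evaluate at most `max_iter` batches, however big the val set is.
    return min(max_iter, len(val_dataloader))

val_dataloader = list(range(500))          # pretend the val set has 500 batches
print(num_val_batches(val_dataloader))     # 100 -> only a fifth of the val data is seen
print(len(val_dataloader))                 # 500 -> full validation would cover it all
```

With a small val set (under 100 batches) the two numbers coincide, which is why the bug only matters once the val set grows.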
Hello, thanks for the answer,
Can you explain this detail more please?
Just, I guess the Test loss: 0.000928, accuracy: 0.9207813
was not evaluated on the whole val dataset, but on 100 iterations~ So maybe the value is not so accurate~
Ohhh, I see; yes, I had always seen that and never mentioned it lol
However, emmm, I'm still wondering: even after changing that, what makes the model accurate? I mean, what will actually boost the accuracy on real-life images? Is it the data? The lack of rotation in real-life images? I want to know the key feature that has to be there to recognize all images accurately 😅
Ohhhh, because you get 92% acc but the model does badly on real images. So we can change this to see the real performance on the whole val data. As for boosting the accuracy: if there is no rotation in real images, then the rotation augmentation in the training data is useless (I think so). Anyway, more data is always the best way~~~
Also, do we have to complete all the epochs, or can we stop once we get good accuracy (now with the new code)?
And should I resume training from my checkpoint (the one with 92%), or restart training from scratch?
Also, the total number of epochs is 1000, which is toooooooo big. Just stop when the val loss doesn't decrease and the val acc doesn't increase. And we can just resume from the checkpoint; the model structure didn't change.
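The "stop when the val loss doesn't decrease" advice can be sketched as a simple patience counter; this is a hedged illustration, not code from the repo, and the function name is made up:

```python
# Sketch: early stopping with a patience counter.
# Returns True once the val loss has failed to improve for `patience` consecutive checks.
def should_stop(val_losses, patience=3):
    best = float('inf')
    checks_since_best = 0
    for loss in val_losses:
        if loss < best:
            best = loss          # new best -> reset the counter
            checks_since_best = 0
        else:
            checks_since_best += 1
            if checks_since_best >= patience:
                return True      # no improvement for `patience` checks: stop
    return False

print(should_stop([0.9, 0.5, 0.4, 0.41, 0.42, 0.43]))  # True: 3 checks without improvement
print(should_stop([0.9, 0.5, 0.4, 0.3, 0.25]))         # False: still improving
```

In practice you would call such a check after each validation pass and break out of the epoch loop, keeping the checkpoint with the best val loss.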
Alright! thank you very much! :)
Look, I just retried again, and the first 100 samples got this: Test loss: 0.000512, accuracy: 0.907031
Did you get the latest code and validate on the whole dataset~?
Yes I did; I also changed it to my alphabets file and so on. The train root is the 10M set and the val is the 1M set. The second 200 got: Test loss: 0.000465, accuracy: 0.921875
If the real images are just the same as the val images, why is there so much difference?! For now, just wait for the val acc to reach 99+ and then test the result.
You see, the training images are in .jpg format while the real-life ones are in .png. I just noticed it; would that affect the result?
Haha, it shouldn't affect the result. Are your real-life and training images really the same? I keep thinking they are not; otherwise the model shouldn't do so badly on real-life images.
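A quick way to convince yourself that the container format (.jpg vs .png) isn't the problem: encode the same image both ways and decode them again. This sketch assumes Pillow is available; JPEG is lossy, so pixel values can drift slightly, but the decoded mode and size are identical, which is what the model actually sees after preprocessing:

```python
# Sketch: .jpg and .png decode to the same kind of image object (assumes Pillow).
import io
from PIL import Image

img = Image.new('L', (100, 32), color=128)  # grayscale, CRNN-style input size

jpg_buf, png_buf = io.BytesIO(), io.BytesIO()
img.save(jpg_buf, format='JPEG')
img.save(png_buf, format='PNG')

from_jpg = Image.open(io.BytesIO(jpg_buf.getvalue()))
from_png = Image.open(io.BytesIO(png_buf.getvalue()))

# Same mode and size after decoding; only JPEG's lossy compression can nudge pixels.
print(from_jpg.mode, from_jpg.size)
print(from_png.mode, from_png.size)
```

If the two decoded images go through the same resize/normalize pipeline, the file extension itself carries no information the model could trip over.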
They are as you saw them (I sent them to you before). Emmm, I guess I should re-read the code; probably something is wrong on my side.
I'll contact you again, thanks, but just wait xD
I retried the training again and it's stuck in the first 1000 iterations; it doesn't show the val test samples (??). Is it because my PC is slow?
No! It's because it validates the model on the whole val data every 1000 iterations.
You mean every 1000 training batches passed through the model, it validates on the whole val data? But does it change anything whether it tests the whole val data or only a small sample?
if __name__ == "__main__":
    for epoch in range(params.nepoch):
        train_iter = iter(train_loader)
        i = 0
        while i < len(train_loader):
            cost = train(crnn, criterion, optimizer, train_iter)
            loss_avg.add(cost)
            i += 1
            if i % params.displayInterval == 0:
                print('[%d/%d][%d/%d] Loss: %f' %
                      (epoch, params.nepoch, i, len(train_loader), loss_avg.val()))
                loss_avg.reset()
            if i % params.valInterval == 0:
                val(crnn, criterion)
            # do checkpointing
            if i % params.saveInterval == 0:
                torch.save(crnn.state_dict(), '{0}/netCRNN_{1}_{2}.pth'.format(params.expr_dir, epoch, i))
Here params.valInterval = 1000.
The whole val data represents the real data better, and the accuracy on the whole val data reflects the model's performance better.
Hello, long time no see :D
I wanna ask (I probably asked the same question before, but I forgot the answer, sorry ^^" lol):
When I train the model (on about 7000+ Japanese and English characters, with 10M training samples and 1M test samples),
the accuracy gets high (about 50% while still in epoch 0, say after about 5k images), and the loss is low (0.03) and still decreasing. However, when I give it a real-life image (of the same kind as the test samples), it makes grave guesses (lol).
What do you think is the problem? Should I kill the process, or wait for the epochs to finish?