My configuration is as follows:
batch_size=384, epoch=20, val_check_interval=500, GPU: 3090; all other settings are the defaults.
charset_train = 62_mixed-case
charset_test = string.digits + string.ascii_lowercase + string.ascii_uppercase
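To make the charset concrete, here is a minimal sketch of how the test charset above expands and how labels could be checked against it. The helper `find_invalid_labels` and the sample labels are hypothetical and only for illustration; they are not part of the repository.

```python
import string

# Test charset from the configuration above: digits, lowercase, then uppercase.
charset_test = string.digits + string.ascii_lowercase + string.ascii_uppercase


def find_invalid_labels(labels, charset=charset_test):
    """Return the labels containing at least one character outside `charset`.

    Hypothetical helper, not part of the repository.
    """
    allowed = set(charset)
    return [label for label in labels if not set(label) <= allowed]


# Made-up sample labels; the last two contain characters outside the charset.
labels = ["Abc123", "hello", "No!", "a b"]
print(len(charset_test))            # 62
print(find_invalid_labels(labels))  # ['No!', 'a b']
```

An empty result on the real labels would confirm that the data contains nothing outside the 62 characters.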
I have a few questions.
My dataset strictly follows your format. All labels consist only of digits and uppercase and lowercase English letters, and the dataset is split 8:1:1.
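For reference, an 8:1:1 split can be sketched as below. This is only an illustration: the function name `split_811`, the fixed seed, and the assumption that the three parts are train, val, and test in that order are all hypothetical.

```python
import random


def split_811(items, seed=0):
    """Shuffle `items` and split them 8:1:1.

    Hypothetical helper; the order (train, val, test) is an assumption.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n = len(items)
    n_train = n * 8 // 10               # integer arithmetic avoids float rounding
    n_val = n // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # the remainder goes to the last part
    return train, val, test


train, val, test = split_811(range(1000))
print(len(train), len(val), len(test))  # 800 100 100
```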
In your paper, I see that the real training data and the val data do not come from splitting the same dataset. Am I right that the data under data/val is used only for validation and does not participate in training? Based on that guess, I placed my own training split in the real directory, my own test split in the test directory, and a different dataset in the data/val directory. For example, I trained on the D001 dataset and placed D004 under data/val. I then tested on both D001 and D004. The accuracy on D001 is high, but the accuracy on D004 is very low. I don't quite understand the roles of the two val sets in the data directory. Could you explain them? Thank you!
Another question: can I train on all of your datasets plus my own dataset, using charset_train=62_mixed-case?
My concern is that when I tried the Hugging Face demo with your pre-trained weights on my images, the predictions contained punctuation marks, but my dataset contains no punctuation.
What should I do about this?
Does the charset used for testing have to be 36_lowercase?
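To illustrate the punctuation concern, one possible workaround is to drop every predicted character that falls outside the test charset. This is a minimal sketch under that assumption; `filter_to_charset` is a hypothetical helper, and whether the repository already does something equivalent at test time is part of what I am asking.

```python
import string

# Same 62-character test charset as in the configuration above.
charset_test = string.digits + string.ascii_lowercase + string.ascii_uppercase


def filter_to_charset(prediction, charset=charset_test):
    """Remove characters that are not in `charset` from a predicted string.

    Hypothetical post-processing step, not part of the repository.
    """
    allowed = set(charset)
    return "".join(ch for ch in prediction if ch in allowed)


# Made-up prediction containing punctuation.
print(filter_to_charset("Hello,World!-42"))  # HelloWorld42
```

This only cleans up the output string; it does not change what the model itself was trained to predict.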