Closed · yeahQing closed this issue 1 year ago
Hi, thanks for your attention to our work ;D
The results in the "results on train_500 and test_1000" setting seem weird; the expected accuracy would be about 5.0%. Specifically, the samples of the first 500 characters in the alphabet (characters 1-500 of the 3755) are chosen for training, and the samples of the last 1000 characters (characters 2755-3755 of the 3755) for testing.
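(For anyone trying to reproduce this partition, here is a minimal sketch of the split as described above; it is an assumption based on this comment, not the repository's actual code.)

```python
# Minimal sketch of the zero-shot split described above (an assumption
# based on this comment, not the repo's actual code): samples of the
# first 500 characters of the alphabet go to training, samples of the
# last 1000 characters go to testing.
with open('character-3755.txt', encoding='utf-8') as f:
    alphabet = [line.strip() for line in f if line.strip()]

train_chars = set(alphabet[:500])    # first 500 characters
test_chars = set(alphabet[-1000:])   # last 1000 characters

def split_of(char):
    if char in train_chars:
        return 'train_500'
    if char in test_chars:
        return 'test_1000'
    return 'unused'
```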
Hi, chen, thanks for your reply. I built my datasets using the method in #57, with character-3755.txt built from decompose-stroke-3755.txt. Maybe my lmdb format is wrong. Could you share your HWDB lmdb dataset with me?
Just a wild guess: this could be caused by differences in how the character set is split into training and testing, i.e., by how the characters in character-3755.txt from #57 are sorted...
Hi, thanks for your reply. My character-3755.txt is sorted in the same alphabet order as decompose-stroke-3755.txt.
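(A quick way to double-check that: the sketch below assumes each line of decompose-stroke-3755.txt begins with the character itself and that character-3755.txt has one character per line; adjust it if the actual file formats differ.)

```python
# Sanity check: does character-3755.txt list the characters in the same
# order as decompose-stroke-3755.txt? Assumes one character per line in
# the former and lines beginning with the character in the latter.
with open('character-3755.txt', encoding='utf-8') as f:
    chars = [line.strip() for line in f if line.strip()]
with open('decompose-stroke-3755.txt', encoding='utf-8') as f:
    decomposed = [line.strip()[0] for line in f if line.strip()]

assert chars == decomposed, 'character order differs between the two files'
print('order matches for all', len(chars), 'characters')
```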
Shown below are the samples, labels, and index corresponding to the test_1000 lmdb dataset I generated. @lancercat @JingyeChen
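(For anyone debugging the same thing, the entries can be dumped with a few lines of Python. This sketch assumes the widespread text-recognition lmdb layout with num-samples / image-%09d / label-%09d keys, which may differ from how #57 actually writes the dataset.)

```python
import lmdb

# Dump the first few entries of the generated lmdb for inspection.
# Assumes the common text-recognition layout with keys 'num-samples',
# 'image-%09d', 'label-%09d' (1-based indices); adjust the key scheme
# if the dataset was written differently.
env = lmdb.open('/root/autodl-tmp/hwdb/lmdb_rgb/test_1000',
                readonly=True, lock=False)
with env.begin() as txn:
    num_samples = int(txn.get('num-samples'.encode()))
    print('num-samples:', num_samples)
    for i in range(1, 4):
        label = txn.get(('label-%09d' % i).encode())
        image = txn.get(('image-%09d' % i).encode())
        print(i, label.decode('utf-8'), len(image), 'image bytes')
```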
Hello, may I ask if the problem has been solved? My experiment results on the HWDB dataset are also poor.
Hi, I ran the project with the default config, using train_500 for training and test_1000 for testing; the best result was around 3.7%, so I have not actually solved my problem. My config is shown below. The best result was obtained at the 6th validation of the first epoch, after around 1,000,326 samples. I'm also confused now, TT.
config = {
    'exp_name': '【0112】SLD_train500_v4',
    'epoch': 500,
    'lr': 1.0,
    'mode': 'stroke',  # character / stroke
    'batch': 32,
    'val_frequency': 1000,
    'test_only': False,
    'resume': '/root/autodl-proj/stroke-level-decomposition/history/【0112】SLD_train500_v2/best_model.pth',
    'train_dataset': '/root/autodl-tmp/hwdb/lmdb_rgb/train_500',
    # alternative dataset splits:
    # '/root/autodl-tmp/hwdb/lmdb_rgb/test_1000',
    # '/root/autodl-tmp/hwdb/lmdb_rgb/train_1000',
    # '/root/autodl-tmp/hwdb/lmdb_rgb/train_1500',
    # '/root/autodl-tmp/hwdb/lmdb_rgb/train_2000',
    # '/root/autodl-tmp/hwdb/lmdb_rgb/train_2755',
    'test_dataset': '/root/autodl-tmp/hwdb/lmdb_rgb/test_1000',
    'weight_decay': False,
    'schedule_frequency': 1000000,
    'image_size': 32,
    'alphabet': 3755,  # size of the character set
}
Hi, everyone! First of all, I'm very excited that you shared your work. I ran your code with the default config; only the datasets were generated by myself. Unfortunately, I am getting bad results on train_500 and test_1000; the results are shown below. I don't know what would cause this problem; the accuracy is only around 0.5% after 50 epochs.
Note: the epoch number shown is a display error.