Open 2113vm opened 4 years ago
In my case, training without --data_filtering_off, the model shows 60% ACC; training with --data_filtering_off, the model shows 30% ACC ...
I did improve my accuracy, but I went about it differently. I added '\' before every special symbol in opt.character, and I didn't use the --data_filtering_off flag. I can't guarantee that this is the correct way, because I could have made a mistake with a special symbol, or num_class could have behaved incorrectly. Still, I want to note that before fixing the bug my model could not correctly predict part of the alphabet, and the predictions were bad even for the rest of the alphabet. Accuracy was ~79%. After fixing the bug the accuracy was ~82%, but the predictions were far better. Maybe in your case the model has lower accuracy but makes more correct predictions, because it knows more symbols than your previous model.
I have trained a model on my custom dataset. The dataset contains about 88k word images with labels. At some point I noticed that the model was training on only 40k images. The problem was that my alphabet contains special symbols, e.g. ][?!*^. As I found out later, part of the data was skipped while the data was loading. The reason is how the filtering controlled by --data_filtering_off works. It uses the re.search function with the pattern f'[^{opt.character}]'. When the alphabet contains symbols that are special in a regular expression, the pattern no longer means "any character outside the alphabet", so your data can be skipped. You also can't simply add '\' before the special symbols in opt.character, because then num_class becomes larger than it should be.
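To illustrate the behavior described above, here is a minimal sketch, not the repository's actual code. It assumes the filter does roughly what this post says: build f'[^{opt.character}]' and skip a label when re.search finds a match. The names is_skipped and CHARACTER are hypothetical and exist only for this example. One possible workaround, shown as an assumption rather than a confirmed fix, is to apply re.escape only when building the pattern, so the alphabet itself (and therefore the number of classes) stays unchanged.

```python
import re

# Hypothetical alphabet: three letters plus the special symbols from this post.
CHARACTER = "abc][?!*^"


def is_skipped(label: str, character: str, escape: bool = False) -> bool:
    """Return True if a label would be dropped by the out-of-alphabet check.

    Assumed to mirror the check described in this thread; the helper itself
    is illustrative and not part of the repository.
    """
    # Escaping happens only here, so `character` keeps its original length.
    charset = re.escape(character) if escape else character
    out_of_char = f"[^{charset}]"
    return re.search(out_of_char, label.lower()) is not None


# Unescaped, the pattern is '[^abc][?!*^]': the first ']' closes the negated
# class early, and '[?!*^]' becomes a second, separate character class.
print(is_skipped("a!?", CHARACTER))               # valid label, but skipped
print(is_skipped("xyz", CHARACTER))               # invalid label, but kept

# Escaped, the pattern is a single negated class covering the whole alphabet.
print(is_skipped("a!?", CHARACTER, escape=True))  # valid label is kept
print(is_skipped("xyz", CHARACTER, escape=True))  # invalid label is skipped
```

Because the escaping is applied to a copy used only for the regular expression, len(CHARACTER) is still 9 here, which is the property that adding '\' directly to opt.character would break.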
The authors mention that --data_filtering_off is for alphanumeric characters: check this link. And that's why your training skipped special characters.
In my case, training without --data_filtering_off, the model shows 60% ACC; training with --data_filtering_off, the model shows 30% ACC ...
That's because --data_filtering_off filters alphanumeric characters and ignores special characters.