To improve the accuracy of scene text recognition, some existing methods first rectify the text image (e.g. Iterative Image Rectification) and then recognize the text. ABINet does not follow this rectify-then-recognize approach. Is that because rectification does not improve recognition accuracy? What do you think of this kind of method? Have you tried this idea?
Actually, we did not try image rectification, as we mainly focus on language modeling problems. But I think integrating a rectification step would be an effective approach if you just want to boost accuracy.