uhSuiL opened this issue 4 months ago
I was worried that the input image might need to contain only a single line of text, so I ran three more tests, but the results still don't seem to meet expectations:
(Test below: I masked the second line.)
(Test below: I cut out the second line and simply resized the image to (512, 128).)
(Test below: I cut out the second line and the white margin around the first line, then resized it to (512, 128) so that it is less deformed.)
I don't think the text in my images is that hard for a human to recognize.
I ran another test: I padded the left, right, top, and bottom to reach the size (512, 128), leaving the text centered and undeformed. This is the result:
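The padding approach described above can be sketched as follows. This is a minimal illustration, not code from DiffTSR or from the poster: it assumes the image is a NumPy array (grayscale `(H, W)` or color `(H, W, C)`) on a white background, uses simple floor-based nearest-neighbour sampling instead of a library resize, and the function names `resize_nearest` and `letterbox` are hypothetical.

```python
import numpy as np


def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour (floor) resize of an (H, W) or (H, W, C) array."""
    in_h, in_w = img.shape[:2]
    # Map every output row/column back to a source row/column.
    rows = (np.arange(out_h) * in_h / out_h).astype(int)
    cols = (np.arange(out_w) * in_w / out_w).astype(int)
    return img[rows[:, None], cols[None, :]]


def letterbox(img: np.ndarray, target_h: int = 128, target_w: int = 512,
              fill: int = 255) -> np.ndarray:
    """Scale img to fit inside target_h x target_w without changing its aspect
    ratio, then pad with `fill` (assumed white) so the text sits in the centre."""
    in_h, in_w = img.shape[:2]
    scale = min(target_h / in_h, target_w / in_w)
    new_h = min(target_h, max(1, round(in_h * scale)))
    new_w = min(target_w, max(1, round(in_w * scale)))
    resized = resize_nearest(img, new_h, new_w)

    # Blank canvas of the target size; keeps any channel axis of the input.
    canvas = np.full((target_h, target_w) + img.shape[2:], fill, dtype=img.dtype)
    top = (target_h - new_h) // 2
    left = (target_w - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas
```

For example, `letterbox(gray_line)` on a 50x300 grayscale crop returns a 128x512 array with the scaled text in the middle rows and white bands above and below. Unlike a plain resize, the glyphs keep their proportions, at the cost of a smaller text area.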
Thank you for your interest in this work. There are a few key points that require clarification.
First, this project currently applies only to single-line text images: the input is limited to 128x512 patches, and the text may contain no more than 24 characters. Therefore, for other images containing text, you should first detect the text lines in the original image with a text detection method such as PaddleOCR, then crop each line, resize the patch to 128x512, and input it into the DiffTSR model.
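The crop-and-resize step can be sketched as follows. This is a hedged illustration, not part of DiffTSR: it assumes the detector returns each text line as a four-point polygon of `(x, y)` points (the usual shape of PaddleOCR detection boxes), that the image is a NumPy array, and that 128x512 means height 128 by width 512. The detection call itself is omitted, and `crop_text_line`, `to_model_patch`, and the 2-pixel `pad` margin are hypothetical.

```python
import numpy as np

TARGET_H, TARGET_W = 128, 512  # patch size stated in this thread (height x width)


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour (floor) resize of an (H, W) or (H, W, C) array."""
    in_h, in_w = img.shape[:2]
    rows = (np.arange(out_h) * in_h / out_h).astype(int)
    cols = (np.arange(out_w) * in_w / out_w).astype(int)
    return img[rows[:, None], cols[None, :]]


def crop_text_line(image, quad, pad=2):
    """Crop the axis-aligned bounding rectangle of a 4-point text-line box,
    widened by a hypothetical `pad` margin and clamped to the image borders."""
    pts = np.asarray(quad, dtype=float)  # shape (4, 2): (x, y) points
    h, w = image.shape[:2]
    x0 = max(int(np.floor(pts[:, 0].min())) - pad, 0)
    y0 = max(int(np.floor(pts[:, 1].min())) - pad, 0)
    x1 = min(int(np.ceil(pts[:, 0].max())) + pad, w)
    y1 = min(int(np.ceil(pts[:, 1].max())) + pad, h)
    return image[y0:y1, x0:x1]


def to_model_patch(image, quad):
    """Crop one detected text line and stretch it to TARGET_H x TARGET_W."""
    crop = crop_text_line(image, quad)
    return resize_nearest(crop, TARGET_H, TARGET_W)
```

Each detected line would then be processed separately, e.g. `patches = [to_model_patch(image, q) for q in quads]`. Note that this stretches the crop to the target size; padding to preserve the aspect ratio is an alternative if the distortion is large.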
Second, the text area in the cropped image should occupy the center. Text patches produced by a text detection model usually meet this condition. Additionally, the DiffTSR model is robust to text deformation.
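One rough way to check the centering condition on a patch before feeding it to the model is to compare the bounding box of the dark pixels with the image center. This is a heuristic sketch only, not something DiffTSR provides: it assumes a grayscale NumPy array with dark text on a light background, and `text_center_offset` and its `threshold` value are hypothetical.

```python
import numpy as np


def text_center_offset(gray, threshold=128):
    """Offset of the text's bounding-box centre from the image centre.

    Returns (dy, dx) as fractions of the image height and width; (0, 0) means
    the text is centred. Assumes dark text on a light background, with
    `threshold` as a hypothetical cut-off between text and background.
    """
    mask = gray < threshold
    if not mask.any():
        raise ValueError("no pixels darker than threshold")
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing text pixels
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing text pixels
    h, w = gray.shape
    cy = (rows[0] + rows[-1]) / 2
    cx = (cols[0] + cols[-1]) / 2
    return (cy - (h - 1) / 2) / h, (cx - (w - 1) / 2) / w
```

A patch whose text sits in the top-left corner yields clearly negative values for both offsets, while a well-centered crop stays close to zero. Noise or a dark background will skew the bounding box, so this is only a quick sanity check.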
Third, the DiffTSR model focuses on scene text images. We have not fully tested its performance in other scenarios, but it can be easily adapted for other scenarios with fine-tuning.
For more details, please refer to the main manuscript and the supplementary materials. Thanks for your interest, and we are also working on developing methods that are more adaptable.
Thanks for your reply; I appreciate your work.
I'm going to read your paper again and then follow your suggestions to see whether the model works in my case. Please keep this issue open so that I can post my feedback here.
I tried your model on my task, but the results don't seem that good. I cropped my image to (512, 128) to match your input size. Below, the first image is the original input and the second is the result. Is there anything wrong? Below is my code: