Closed liangzimei closed 5 years ago
Thanks for your support!
Actually, I proposed MORAN to address small-range deformations of text. Since your irregular text is very long, I am afraid that text lying on a semicircle is too difficult to rectify. You may need a curved-text detector.
The output size of the rectification network of MORAN is not fixed (different from ASTER, which fixes the number of control points). Theoretically, the rectification network of MORAN can be trained with fixed-size input and still generalize well to text of variable length.
For long text, a CRNN-based recognizer trained with CTC loss usually performs better. (Several papers report that the attention mechanism performs well only on short text.)
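For context, the core of a CTC-based recognizer like CRNN is the decoding rule applied to the per-frame predictions: collapse consecutive repeats, then drop blanks. A minimal standalone sketch of greedy CTC decoding (the function name and the assumption that the blank symbol has index 0 are mine, not from MORAN or CRNN code):

```python
def ctc_greedy_decode(indices, blank=0):
    """Greedy CTC decoding: collapse consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for idx in indices:
        # keep a label only if it differs from the previous frame and is not blank
        if idx != prev and idx != blank:
            out.append(idx)
        prev = idx
    return out

# frame-wise argmax indices over 10 time steps, e.g. "hh-ee-ll(l)-o"
frames = [8, 8, 0, 5, 5, 12, 0, 12, 12, 15]
print(ctc_greedy_decode(frames))  # [8, 5, 12, 12, 15]
```

Because the decoding is per-frame and has no fixed output length, a CTC recognizer naturally handles long text, whereas an attention decoder must learn to attend correctly over many steps.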
Thanks for your reply. I will try a curved-text detector and use its outline to rectify.
@Canjie-Luo Hello, sorry to bother you. Could you give some links to the papers saying that the attention mechanism performs well only on short text? Thanks.
Theoretically, the rectification network of MORAN can be trained with fixed-size input and still generalize well to text of variable length.
Thanks for your great work and for sharing it. I tried to use your code to recognize images of variable width, such as this picture of size 32*487:
I modified demo.py (with your pretrained model) as follows:
```python
if torch.cuda.is_available():
    cuda_flag = True
    MORAN = MORAN(1, len(alphabet.split(':')), 256, 32, 800, BidirDecoder=True, CUDA=cuda_flag)
    MORAN = MORAN.cuda()
else:
    MORAN = MORAN(1, len(alphabet.split(':')), 256, 32, 800, BidirDecoder=True,
                  inputDataType='torch.FloatTensor', CUDA=cuda_flag)

image = Image.open(img_path).convert('L')
scale = image.size[1] * 1.0 / 32
w = int(image.size[0] / scale)

converter = utils.strLabelConverterForAttention(alphabet, ':')
transformer = dataset.resizeNormalize((w, 32))
image = transformer(image)

if cuda_flag:
    image = image.cuda()
image = image.view(1, *image.size())
image = Variable(image)
text = torch.LongTensor(1 * 50)
length = torch.IntTensor(1 * 5)
text = Variable(text)
length = Variable(length)

max_iter = 100
t, l = converter.encode('0' * max_iter)
utils.loadData(text, t)
utils.loadData(length, l)
output = MORAN(image, length, text, text, test=True, debug=True)
```
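For reference, the `scale`/`w` arithmetic in the snippet above just rescales the width so the height becomes 32 while preserving the aspect ratio. A minimal standalone sketch (the function name is mine, added only to make the computation testable):

```python
def keep_ratio_width(orig_w, orig_h, target_h=32):
    """Width after rescaling to target_h while preserving the aspect ratio,
    mirroring the scale/w computation in the modified demo.py."""
    scale = orig_h / float(target_h)
    return int(orig_w / scale)

print(keep_ratio_width(487, 32))  # 487: the 32*487 image keeps its width
print(keep_ratio_width(800, 64))  # 400: a 64-tall image is halved in width
```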
However, I still get bad output, such as:
Left to Right: ronaltherlyth
Could you tell me where I went wrong?
Can you give the rectified image?
Thanks for your great work, it does work. However, in practice, to handle very long text at inference time (the training set has no such long samples), we often train a model that keeps the aspect ratio and pads, rather than using a fixed input size (e.g. 32*100).
Without a rectification module this works well, but once a rectification module is added, keeping the ratio and padding becomes difficult. Are there any rectification methods that can process samples of arbitrary length in a batch, without a fixed input size (e.g. 32x100)? Thanks in advance.
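To make the keep-ratio-and-pad scheme concrete, here is a minimal sketch of the batch-width computation it implies: each image is rescaled to height 32 keeping its aspect ratio, and the batch is right-padded to the widest sample. The function name and the rounding of the batch width up to a stride multiple of 4 are my own assumptions for illustration, not part of MORAN:

```python
def collate_keep_ratio(sizes, target_h=32, pad_multiple=4):
    """Given original (w, h) sizes, return the keep-ratio widths at target_h
    and a single padded batch width (rounded up to a stride multiple)."""
    new_ws = [max(1, int(w * target_h / float(h))) for (w, h) in sizes]
    batch_w = max(new_ws)
    # round up so the padded width divides evenly through CNN strides
    batch_w = ((batch_w + pad_multiple - 1) // pad_multiple) * pad_multiple
    return new_ws, batch_w

sizes = [(487, 32), (100, 32), (250, 50)]
new_ws, batch_w = collate_keep_ratio(sizes)
print(new_ws, batch_w)  # [487, 100, 160] 488
```

The difficulty the comment points at is real: a sampler-based rectification network warps the whole padded canvas, so the padding region can be pulled into the rectified output, which is why keep-ratio batching composes awkwardly with rectification.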