GXYM / TextBPN-Plus-Plus

Arbitrary Shape Text Detection via Boundary Transformer. Paper: https://arxiv.org/abs/2205.05320, accepted by IEEE Transactions on Multimedia (T-MM 2023).

Hello author, a problem appears when reproducing your project TextPMs #11

Closed Zhang-Chunhu closed 1 year ago

Zhang-Chunhu commented 1 year ago

Hello author, first of all, thank you very much for your contribution. In your project TextPMs, I can reproduce the results in your paper very well with the model you provided, but when I train it myself (e.g. on CTW1500), the results are very different from those in your paper. And with pre-training (e.g. on ICDAR2015), the generated txt file is blank when evaluated.

GXYM commented 1 year ago


First, make sure you use the same pre-training as in the paper. Then try adjusting the test parameters and see how they affect the results. If that has no effect, check whether the training has actually converged. Our code contains enough visualization tools for you to analyze these problems. For the ICDAR2015 problem, visualize the output of the network. If the network output is correct but the txt file is empty, the cause is likely the post-processing (especially the OpenCV version) or a problem writing the file. If the PSE post-processing algorithm is used, please make sure it is compiled correctly; otherwise there will be no results.
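A minimal sketch of this kind of check, assuming a PyTorch model whose forward pass yields a per-pixel text score map (`load_trained_model` and `load_test_image` are hypothetical placeholders for the project's own loading code, not its actual API):

```python
import cv2
import numpy as np
import torch

# Hypothetical helpers standing in for the project's own loading code.
model = load_trained_model("textpms_ctw1500.pth")   # assumed helper, not the repo's API
tensor = load_test_image("demo.jpg")                # assumed helper returning a 4D tensor

model.eval()
with torch.no_grad():
    output = model(tensor)

# Assume the output (or its first channel) is a per-pixel text score map.
prob_map = torch.sigmoid(output)[0, 0].cpu().numpy()

# Step 1: dump the raw network output. If this map already looks wrong,
# the problem is in training, not in post-processing.
cv2.imwrite("prob_map.png", (prob_map * 255).astype(np.uint8))

# Step 2: a crude threshold + contour pass in place of the real post-processing.
# If regions are found here but the final txt file is still empty, suspect the
# post-processing (e.g. the PSE extension not compiled) or the file writing.
# Note: cv2.findContours returns 2 values in OpenCV 4.x but 3 in OpenCV 3.x,
# exactly the kind of version difference that can silently empty the results.
binary = (prob_map > 0.5).astype(np.uint8)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(f"found {len(contours)} candidate text regions")
```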

Note that we used the old version of CTW1500 and its evaluation scripts. If you use a newer version of the dataset, make sure the data format is correct; you can use the dataset loading script to check the format by visualizing the annotations, for example as sketched below. Also, the CTW1500 labels are fairly noisy, so using OHEM less during training (or not at all) gives better results.
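A rough illustration of checking the annotation format by visualization; the paths are hypothetical, and the parsing assumes one comma-separated polygon per line, which may differ between CTW1500 releases:

```python
import cv2
import numpy as np

# Hypothetical local paths; point these at one image and its label file.
image_path = "data/ctw1500/test/images/1001.jpg"
label_path = "data/ctw1500/test/labels/1001.txt"

image = cv2.imread(image_path)

# Assumes one polygon per line as comma-separated integer coordinates
# "x1,y1,x2,y2,...". The old and new CTW1500 releases use different layouts,
# so adapt the parsing to the version you actually downloaded.
with open(label_path) as f:
    for line in f:
        values = [int(v) for v in line.strip().split(",") if v.strip().lstrip("-").isdigit()]
        points = np.array(values, dtype=np.int32).reshape(-1, 2)
        cv2.polylines(image, [points], isClosed=True, color=(0, 255, 0), thickness=2)

cv2.imwrite("label_check.png", image)
# If the drawn polygons do not line up with the text in the image, the label
# format does not match what the dataset loading script expects.
```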

If none of these suggestions work, please let us know. We can take some time to check our code and reproduce the results on CTW1500, but we believe that our results can be reproduced. You can also refer to our reproduction details in TextBPN++. (The TextPMs project is quite old, from three years ago, and we have stopped updating it.)

I hope my reply can help you!

GXYM commented 1 year ago

In addition, it is not possible to reproduce the experimental results by directly running the given training scripts. I checked some of the training scripts: they contain only part of the training process and are for reference only. Please modify the training scripts to follow the training procedure described in the paper, and carry out training and testing step by step. If training does not converge, adjust the learning rate appropriately.
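As an illustrative sketch only (`build_model`, `build_dataloader`, `train_one_epoch` and the epoch counts are placeholders for the project's own training code, not its actual scripts), the step-by-step procedure could look like pre-training followed by fine-tuning with a lower learning rate, dropping the rate further when the loss plateaus:

```python
import torch

# Placeholder hooks for the project's own model, data and training loop.
model = build_model()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Halve the learning rate when the training loss stops improving.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=5)

pretrain_epochs, finetune_epochs = 300, 600   # assumed values, not from the paper

# Stage 1: pre-training (e.g. on a larger or synthetic dataset, as in the paper).
for epoch in range(pretrain_epochs):
    loss = train_one_epoch(model, build_dataloader("pretrain"), optimizer)
    scheduler.step(loss)

# Stage 2: fine-tuning on the target dataset (e.g. CTW1500) with a lower base lr.
for group in optimizer.param_groups:
    group["lr"] = 1e-4
for epoch in range(finetune_epochs):
    loss = train_one_epoch(model, build_dataloader("ctw1500"), optimizer)
    scheduler.step(loss)
```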