GXYM / TextBPN-Plus-Plus

Arbitrary Shape Text Detection via Boundary Transformer. Paper: https://arxiv.org/abs/2205.05320, accepted by IEEE Transactions on Multimedia (T-MM 2023).
about missing single digit and sign #35

Closed by tairen99 8 months ago

tairen99 commented 8 months ago

Thank you for your wonderful work!

I am trying to fine-tune your model to detect document text, but I notice that some single digits and single signs, such as "-", are missing. Do you know what the possible reason might be? Is there any preprocessing or postprocessing step that removes small text contours?

I tried reducing "dis_threshold", but it did not help. Thank you in advance!

GXYM commented 8 months ago

When the training data is created, small text instances are also removed by default: https://github.com/GXYM/TextBPN-Plus-Plus/blob/277029349de264099f9cf13f218ed2e6a893be36/dataset/dataload.py#L178
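For readers following along, here is a minimal sketch of how a size filter of this kind can drop a "-" or a single digit from the training targets. It is not the code at the linked line: the helper names, the bounding-box measure, and the `MIN_TEXT_SIZE` value are all assumptions made for illustration.

```python
# Hypothetical illustration only: the names and the threshold below are
# assumptions, not taken from the TextBPN-Plus-Plus code base.
MIN_TEXT_SIZE = 4  # assumed minimum short-side length, in pixels


def short_side(polygon):
    """Return the shorter side of the polygon's axis-aligned bounding box."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(max(xs) - min(xs), max(ys) - min(ys))


def split_by_size(polygons, min_size=MIN_TEXT_SIZE):
    """Separate polygons into those kept for training and those ignored."""
    kept, ignored = [], []
    for poly in polygons:
        if short_side(poly) >= min_size:
            kept.append(poly)
        else:
            ignored.append(poly)
    return kept, ignored


# An 80 x 20 px word box and an 8 x 2 px box around a "-" sign.
word = [(10, 10), (90, 10), (90, 30), (10, 30)]
dash = [(100, 19), (108, 19), (108, 21), (100, 21)]

kept, ignored = split_by_size([word, dash])
print(len(kept), "kept,", len(ignored), "ignored")
```

If the linked line applies a threshold along these lines, the thin "-" box would be excluded before the model ever sees it, which would explain why changing an inference-time setting such as "dis_threshold" has no effect. Lowering the training-time threshold and fine-tuning again may help.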

tairen99 commented 8 months ago

Thanks a lot for your quick reply!

Thank you very much for your quick reply. Let me fine-tune it again and see how it goes. Thank you for your suggestion!