fh2019ustc / DocTr

The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.

training code #14

Open an1018 opened 2 years ago

an1018 commented 2 years ago

Hi, thanks for your great work! When will you release the training code?

fh2019ustc commented 2 years ago

Hi, thanks for your attention to our work. We will release the training code after the acceptance of our work DocScanner.

an1018 commented 2 years ago

Thanks for your reply. Could you tell us your training environment (such as the number and model of GPUs) and the training time of the geometric unwarping transformer and the illumination correction transformer?

fh2019ustc commented 2 years ago

For geometric unwarping, we use 4 GPUs for training, which takes about 3 days. For illumination correction, we use 2 GPUs for training, which takes about 1 day. In fact, we did not conduct hyper-parameter tuning experiments on the batch size, learning rate, or number of GPUs.
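A minimal sketch of what such a multi-GPU setup could look like in PyTorch, assuming the GeoTr module from this repository and a simple L1 regression loss on the predicted flow; the learning rate, optimizer, and device list are illustrative placeholders, not the values used in the paper:

```python
import torch
import torch.nn as nn

from GeoTr import GeoTr  # geometric unwarping transformer defined in this repo

model = GeoTr(num_attn_layers=6)                                # as in the released inference code
model = nn.DataParallel(model, device_ids=[0, 1, 2, 3]).cuda()  # 4 GPUs for geometric unwarping

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)      # placeholder hyper-parameters
criterion = nn.L1Loss()                                         # placeholder flow-regression loss

# for images, flow_gt in train_loader:  # (background-free image, GT backward map) pairs
#     flow_pred = model(images.cuda())
#     loss = criterion(flow_pred, flow_gt.cuda())
#     optimizer.zero_grad()
#     loss.backward()
#     optimizer.step()
```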

an1018 commented 2 years ago

Thanks for your detailed explanation. DocScanner is trained on NVIDIA RTX 2080 Ti and NVIDIA GTX 1080 Ti GPUs; which one is used for DocTr?

fh2019ustc commented 2 years ago

Hi, for DocTr we use 1080 Ti GPUs. In our experience, the GPU model does not seem to affect the performance of our method.

an1018 commented 2 years ago

When writing the training code, I have some confusion:

1. Before training the GeoTr module, the background needs to be removed. Is this handled by the pre-trained model of the segmentation module?

2. After removing the background, should the result look like the image on the right?

3. In DocScanner, is the ground-truth mask the result of the document localization module? If so, why is it called ground truth?

fh2019ustc commented 2 years ago

Thanks for your attention to our work.

  1. To train the segmentation module, we remove the noisy backgrounds using the GT masks rather than the pre-trained segmentation module. This is the same for our DocTr and DocScanner.
  2. You can also upsample the mask to the original resolution of the input image and then multiply them at that resolution (see the sketch below).
  3. See the answer to 1.

Hope this helps.
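
A minimal sketch of point 2, assuming the mask is predicted at a lower resolution than the input photo; the function and variable names below are illustrative, not from the released code:

```python
import torch
import torch.nn.functional as F

def remove_background(image, mask):
    """Zero out the background of a document photo.

    image: (B, 3, H, W) float tensor, the distorted document photo
    mask:  (B, 1, h, w) float tensor in [0, 1], e.g. from the segmentation module
    """
    # Upsample the mask to the same spatial size as the input image.
    mask_up = F.interpolate(mask, size=image.shape[-2:], mode='bilinear', align_corners=False)
    # Optional hard binarization before masking.
    mask_up = (mask_up > 0.5).float()
    # Element-wise multiplication removes the noisy background.
    return image * mask_up
```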

an1018 commented 2 years ago

Is there any reference code? And what do the GT masks correspond to in the doc3d dataset?

fh2019ustc commented 2 years ago

In fact, it is easy to extract the GT mask of the document image from the other annotations. For example, in the UV map, the values of the background region are 0, as sketched below.
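
A minimal sketch of that extraction, assuming the doc3d UV annotation is loaded as an H x W x 3 array (doc3d ships these as .exr files); the file paths are placeholders:

```python
import os
os.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'  # recent OpenCV builds need this for .exr files

import cv2
import numpy as np

# Placeholder path to one doc3d UV annotation.
uv = cv2.imread('doc3d/uv/1/sample.exr', cv2.IMREAD_UNCHANGED)  # H x W x 3, float

# Background values in the UV map are 0, so any non-zero channel marks the document region.
mask = (np.abs(uv).sum(axis=2) > 0).astype(np.uint8)

cv2.imwrite('sample_mask.png', mask * 255)  # save as a binary GT mask
```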

an1018 commented 2 years ago

@fh2019ustc I've written the training code, but the model does not converge. I've sent the code to your email (haof@mail.ustc.edu.cn); could you take a look at it? Thanks very much.

Aiden0609 commented 1 year ago

@an1018 So, have you reproduced it successfully with your own training code?

minhduc01168 commented 7 months ago

@an1018 So have you successfully written your own training code?