Open an1018 opened 2 years ago
Hi, thanks for your attention to our work. We will release the training code after the acceptance of our work DocScanner.
Thanks for your reply. Could you tell us your training environment (for example, the number and model of GPUs) and the training time of the geometric unwarping transformer and the illumination correction transformer?
For geometric unwarping, we use 4 GPUs for training, which takes about 3 days. For illumination correction, we use 2 GPUs, which takes about 1 day. Note that we did not conduct hyper-parameter tuning experiments on the batch size, learning rate, or number of GPUs.
Thanks for your detailed explanation. The training GPUs of DocScanner are NVIDIA RTX 2080 Ti GPUs and an NVIDIA GTX 1080 Ti GPU. Which one is used in DocTr?
Hi, for DocTr we use 1080 Ti GPUs. In fact, based on our experience, the type of GPU does not seem to affect the performance of our method.
While writing the training code, I ran into some confusion.
1) Before training the GeoTr module, the background needs to be removed. Is this handled by the pre-trained model of the segmentation module?
2) After removing the background, does the result look like the image on the right?
3) In DocScanner, is the ground-truth mask the output of the document localization module? If so, why is it called ground truth?
Thanks for your attention to our work.
Hope this helps.
Is there any reference code? And what do the GT masks represent in the Doc3D dataset?
In fact, it is easy to extract the GT mask of the document image from the other annotations. For example, in the UV map, the values of the background region are 0.
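For anyone following along, here is a minimal sketch of that idea. It is not the authors' code. It assumes the UV map has already been loaded into a NumPy array of shape (H, W, C) in which background pixels are exactly 0 in every channel, as described above. The helper names `mask_from_uv` and `remove_background` and the `eps` threshold are hypothetical, and how the UV file is read from disk is left out.

```python
import numpy as np


def mask_from_uv(uv: np.ndarray, eps: float = 0.0) -> np.ndarray:
    """Build a binary document mask from a UV map (hypothetical helper).

    Assumption: background pixels are 0 in every channel of `uv`.
    Returns an (H, W) uint8 array with 1 for document pixels and 0 for background.
    """
    if uv.ndim == 2:
        # Treat a single-channel map as (H, W, 1) so the reduction below works.
        uv = uv[..., None]
    # A pixel counts as document if any channel is larger than eps in magnitude.
    return (np.abs(uv) > eps).any(axis=-1).astype(np.uint8)


def remove_background(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out background pixels of an (H, W, 3) image using an (H, W) mask."""
    return image * mask[..., None].astype(image.dtype)


if __name__ == "__main__":
    # Synthetic 4x4 example: a 2x2 document region in the centre.
    uv = np.zeros((4, 4, 2), dtype=np.float32)
    uv[1:3, 1:3] = 0.5
    image = np.full((4, 4, 3), 255, dtype=np.uint8)

    mask = mask_from_uv(uv)
    cleaned = remove_background(image, mask)
    print(mask)
    print(cleaned[..., 0])
```

One caveat with this sketch: a document pixel whose UV value happens to be exactly 0 in all channels would be labelled as background, so it may be worth checking the resulting masks visually on a few samples.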
@fh2019ustc I've written the training code, but the model does not converge. I've sent the code to your email (haof@mail.ustc.edu.cn). Could you take a look at it? Thanks very much.
@an1018 So, have you reproduced it successfully with your own training code?
@an1018 So have you successfully written your own training code?
Hi, thanks for your great work. When will you release the training code?