ZZZHANG-jx / DocRes

[CVPR 2024] DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks
MIT License

Different Image Task Input Issues #7

Closed lxy5513 closed 2 months ago

lxy5513 commented 2 months ago

@ZZZHANG-jx I highly appreciate your work on prompt-based document restoration, and the ideas in it are extremely beneficial. Thank you for your awesome job.

In the paper, I am a bit confused about the dewarping task. The input image for this task includes some padding outside the document, whereas for other tasks, the input is the document itself. Is this design specific to the characteristics of the dewarping task? Why is it done this way?

Why do we input an image with some padding rather than an image of the document alone?

ZZZHANG-jx commented 2 months ago

Thank you for your interest in our work.

The current dewarping task primarily focuses on data that includes the surrounding environmental margins (i.e., the padding you mentioned). This is consistent with the data used in major dewarping benchmarks such as DocUNet, DIR300, and WarpDoc (the visual results in our paper were derived from the DIR300 dataset). We follow these previous works and also primarily consider cases with environmental margins.

There are also some works that address more diverse environmental margins, such as Marior and DocTr++. You might find it helpful to refer to the corresponding papers for these methods.

lxy5513 commented 2 months ago

@ZZZHANG-jx I get it. Thanks for your kind reply.