cvlab-stonybrook / PaperEdge

The code and the DIW dataset for "Learning From Documents in the Wild to Improve Document Unwarping" (SIGGRAPH 2022)
MIT License

Question about train list in the code #5

Closed: KingRicardo closed this issue 2 years ago

KingRicardo commented 2 years ago

I've downloaded the doc3d and DIW datasets and tried to train. However, the training list in the code (doc3d_trn.txt) doesn't match the doc3d dataset: most of the listed entries can't be found in it.

KingRicardo commented 2 years ago

My downloaded copy of the doc3d dataset contains 102064 images. Am I missing some data?

wkema commented 2 years ago

Could you please let me know how many images are missing from your downloaded dataset, and what the mismatched image filenames look like? There might be some naming-convention issues. I ran some ablation studies on training-data scale in the supplementary material, and beyond 32K training images the improvement becomes marginal. The total number of images looks legit to me. As I pulled the data directly from our server, there might be subtle differences (though unlikely) from the copy you downloaded.
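To answer the two questions above (how many entries are missing, and what they look like), a quick check along these lines could help. It is a hypothetical sketch, not tooling from the PaperEdge repo: it assumes each line of doc3d_trn.txt is a path relative to the image root without its extension, and that the images are `.png` files. Both assumptions should be checked against the actual list and your copy of the data.

```python
from pathlib import Path


def read_list(list_file):
    """Read a train list: one entry per non-empty line, surrounding whitespace stripped."""
    return [ln.strip() for ln in Path(list_file).read_text().splitlines() if ln.strip()]


def find_missing(entries, root, suffix=".png"):
    """Return the entries that have no matching file under `root`.

    Hypothetical convention: each entry is a path relative to `root` without
    its extension, so the file checked is root/<entry><suffix>. If the list
    already includes extensions, pass suffix="".
    """
    root = Path(root)
    return [e for e in entries if not (root / (e + suffix)).exists()]
```

For example, `missing = find_missing(read_list("doc3d_trn.txt"), "/path/to/doc3d/img")` (the path is a placeholder); printing `len(missing)` and a handful of the missing entries next to a few real filenames should make a naming-convention mismatch easy to spot.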

KingRicardo commented 2 years ago

Thanks for your reply. I guess it's just a naming-convention problem. I've generated a new training list.
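Regenerating the list from the files actually on disk, as described in the comment above, might look like the following. This is a hypothetical sketch, not the script used in this thread or in the repo: it assumes the images sit one subfolder deep under an image root as `.png` files, and that the data loader expects entries of the form `<subfolder>/<stem>` without an extension. Compare a few lines of the original doc3d_trn.txt and the loader code before relying on that format.

```python
from pathlib import Path


def build_train_list(img_root, out_file, pattern="*/*.png"):
    """Write one entry per image matching `pattern` under `img_root`.

    Hypothetical entry format: '<subfolder>/<stem>' relative to img_root,
    with the file extension removed. Entries are sorted so the list is
    reproducible regardless of filesystem ordering.
    """
    img_root = Path(img_root)
    entries = sorted(
        p.relative_to(img_root).with_suffix("").as_posix()
        for p in img_root.glob(pattern)
    )
    Path(out_file).write_text("".join(e + "\n" for e in entries))
    return entries
```

Sorting the entries keeps the generated list stable across machines, which makes it easier to diff against the list shipped with the code.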

KingRicardo commented 2 years ago

I have another question, about bgtex.txt. I guess you use the same data augmentation as DewarpNet (replacing the background with a texture image). Could you tell me how much improvement this strategy brings?

wkema commented 2 years ago


Yep, the same augmentation.

I am not sure how much improvement it brings, as I have not ablated this part lol.

In my DocUNet project I used synthetic training images with black backgrounds, and I found that the trained model could not generalize at all to real-world images with non-black backgrounds, so I have used this technique in all my subsequent projects.
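The background-replacement augmentation discussed here (pasting the rendered document over a random texture image, such as one listed in bgtex.txt) can be sketched as below. This is a minimal NumPy sketch, not the repo's or DewarpNet's loader code: it assumes a float image in [0, 1], a binary document mask with the same height and width (how the mask is obtained is an assumption and is not shown), and a texture array already loaded from disk. `replace_background` is a hypothetical name.

```python
import numpy as np


def replace_background(img, mask, tex, rng=None):
    """Composite a document image over a background texture.

    img:  HxWx3 float array in [0, 1], the rendered document image.
    mask: HxW array, 1 on the document and 0 on the background (assumed given).
    tex:  hxwx3 float array in [0, 1], a background texture image.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = mask.shape
    th, tw = tex.shape[:2]
    # Tile the texture so it covers at least HxW, keeping its native scale.
    reps = (int(np.ceil(h / th)), int(np.ceil(w / tw)), 1)
    big = np.tile(tex, reps)
    # Take a random HxW crop so different samples get different background patches.
    y0 = rng.integers(0, big.shape[0] - h + 1)
    x0 = rng.integers(0, big.shape[1] - w + 1)
    bg = big[y0:y0 + h, x0:x0 + w]
    # Keep document pixels where mask == 1 and use the texture elsewhere.
    m = mask[..., None].astype(img.dtype)
    return m * img + (1.0 - m) * bg
```

Tiling plus a random crop keeps texture detail at its original scale; resizing each texture to the image size would be an equally reasonable design choice.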

KingRicardo commented 2 years ago

Thank you for sharing the experiment details!