My downloaded doc3d dataset contains 102064 images. Am I missing some data?
Could you let me know how many images are missing from your downloaded dataset? What do the mismatched image filenames look like? There might be a naming convention issue... I ran some ablation studies on training data scale in the supplementary material, and beyond 32K training images the improvement becomes marginal. The total number of images looks legit to me... Since I pulled the data directly from our server, there might be subtle differences (though unlikely) from the version you downloaded.
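A quick way to answer the "how many are missing" question is to check each train-list entry against the files on disk. Below is a minimal Python sketch. It assumes (hypothetically) that each line of the list is an image path relative to the image root, written without the `.png` extension. Adjust `ext` and the path format to match the actual list and download. `find_missing_entries` is an illustrative name, not a function from the repo.

```python
from pathlib import Path


def find_missing_entries(list_file, img_root, ext=".png"):
    """Return train-list entries whose image file does not exist on disk.

    Hypothetical format: one entry per line, a path relative to img_root
    without the file extension, e.g. "1/some_image".
    """
    img_root = Path(img_root)
    missing = []
    with open(list_file) as f:
        for line in f:
            entry = line.strip()
            # Skip blank lines; flag entries with no matching image file.
            if entry and not (img_root / (entry + ext)).is_file():
                missing.append(entry)
    return missing
```

Printing `len(missing)` and a few entries from the result gives both numbers asked for above: how many images are missing and what the mismatched names look like.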
Thanks for your reply. I guess it's just a naming convention problem. I've generated a new train list.
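Regenerating the list from whatever is actually on disk could look like the sketch below. It uses the same hypothetical layout as before: images under one root, and entries written as relative paths without the extension. `build_train_list` is an illustrative name, not part of the repo.

```python
from pathlib import Path


def build_train_list(img_root, out_file, ext=".png"):
    """Write one entry per image found under img_root; return the count.

    Entries are relative POSIX paths with the extension stripped
    (hypothetical format, e.g. "1/some_image").
    """
    img_root = Path(img_root)
    entries = sorted(
        p.relative_to(img_root).with_suffix("").as_posix()
        for p in img_root.rglob("*" + ext)
        if p.is_file()
    )
    with open(out_file, "w") as f:
        for entry in entries:
            f.write(entry + "\n")
    return len(entries)
```

Sorting the entries keeps the list deterministic across runs. The loader can still shuffle at training time.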
I have another question about bgtex.txt. I guess you use the same data augmentation as DewarpNet (replacing the background with a texture image). Could you tell me how much improvement this strategy brings?
Yep, the same augmentation.
I am not sure how much it improves things, as I have not ablated this part lol.
In my DocUNet project I was training on synthetic images with black backgrounds, and I found the trained model could not generalize at all to real-world images with non-black backgrounds... so I have used this technique in all my subsequent projects.
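For reference, the background replacement itself is simple compositing: keep the document pixels and take everything else from a texture. Below is a minimal NumPy sketch. It assumes a boolean foreground mask is already available for each image, and that the texture (e.g. one of the images listed in bgtex.txt) is at least as large as the training image. Obtaining the mask and loading the textures are left out. This is not the exact DewarpNet code, and `replace_background` is a hypothetical name.

```python
import numpy as np


def replace_background(img, fg_mask, tex, rng=None):
    """Composite a random texture crop behind the document.

    img:     HxWx3 image array.
    fg_mask: HxW bool array, True on document pixels (assumed given).
    tex:     texture array of shape at least HxWx3.
    """
    h, w = fg_mask.shape
    th, tw = tex.shape[:2]
    if th < h or tw < w:
        raise ValueError("texture must be at least as large as the image")
    rng = np.random.default_rng() if rng is None else rng
    # Random crop so repeated uses of one texture still vary.
    y = rng.integers(0, th - h + 1)
    x = rng.integers(0, tw - w + 1)
    patch = tex[y:y + h, x:x + w, :3]
    # Keep document pixels; fill background pixels from the texture crop.
    return np.where(fg_mask[..., None], img, patch).astype(img.dtype)
```

The random crop is a design choice for this sketch. Resizing the texture to the image size would work too, at the cost of stretching it.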
Thank you for sharing the experiment detail!
I've downloaded the doc3d and diw datasets and tried to train. But I found that the train list (doc3d_trn.txt in the code) doesn't match the doc3d dataset: most of the entries can't be found in it.