Open rahulhanotDTU opened 1 year ago
Hello, as mentioned in the guide in README.EN.md, please make sure that the width and height of the input image are both multiples of 8. If not, please pad the image. Hope this helps.
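For anyone hitting this, here is a minimal sketch of the padding step using NumPy. It is not taken from the DocDiff repo: the helper names `pad_to_multiple` and `crop_to_original` are hypothetical, and replicating the border pixels (`mode="edge"`) is an assumption; a constant white border may suit clean document scans better.

```python
import numpy as np


def pad_to_multiple(image, multiple=8):
    """Pad an H x W (x C) array on the bottom/right so H and W are multiples of `multiple`.

    Hypothetical helper, not part of the DocDiff code base.
    Returns the padded image and the original (height, width) for cropping later.
    """
    h, w = image.shape[:2]
    pad_h = (-h) % multiple  # rows to add; 0 if h is already a multiple
    pad_w = (-w) % multiple  # columns to add; 0 if w is already a multiple
    pad_width = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
    # Assumption: repeat the border pixels rather than padding with a constant.
    padded = np.pad(image, pad_width, mode="edge")
    return padded, (h, w)


def crop_to_original(image, original_size):
    """Undo pad_to_multiple on the model output."""
    h, w = original_size
    return image[:h, :w]
```

Pad before running inference, then crop the restored output back to the stored size so that it lines up with the ground truth.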
Can we set DPM_SOLVER to True at inference for better results? We are not getting results as good as those reported in your paper. Also, can you tell me how many steps the model was trained for, and which pre-trained model weights are provided in the Git repo?
Our DocDiff model was trained with 100 time steps. Using the DPM solver does not improve the results. Currently, we have not uploaded the jump-step sampling based on DDIM (but you can modify it based on the paper; just save x_0 for each step in the sampling process). Hence, please use 100 steps for inference; otherwise it may produce strange outputs due to the incompatibility between noise intensity and T. As for not getting the expected results described in the paper, we suspect that it might be caused by the input data. We will upload some demo images and inference demo notebooks soon for easy result reproduction. If you still cannot obtain the desired results, you can download the deblurring dataset from http://www.fit.vutbr.cz/~ihradis/CNN-Deblur/ and simply split the images into original and ground truth at 128*128 resolution with a batch size of 32 for training (or train on your own datasets). This requires only 12GB of GPU memory and takes only one and a half days to iterate for 1 million steps on a 3090 GPU. We hope this helps.
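The 128*128 training pairs mentioned above could be produced roughly as follows. This is only a sketch under assumptions: it assumes the degraded image and its ground truth are already loaded as aligned NumPy arrays of identical shape, and `paired_random_crop` is a hypothetical name, not a function from the repo (the repo's own data loader may crop differently).

```python
import numpy as np


def paired_random_crop(degraded, clean, size=128, rng=None):
    """Cut the same random size x size window out of a degraded/clean image pair.

    Hypothetical helper for illustration only.
    """
    if degraded.shape != clean.shape:
        raise ValueError("degraded and clean images must have the same shape")
    rng = np.random.default_rng() if rng is None else rng
    h, w = degraded.shape[:2]
    if h < size or w < size:
        raise ValueError("image is smaller than the requested crop size")
    # Draw one window position and use it for both images so the pair stays aligned.
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    window = (slice(top, top + size), slice(left, left + size))
    return degraded[window], clean[window]
```

Using one window position for both images is the important part: if the two crops were drawn independently, the network would be trained on misaligned pairs.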
Moreover, during training, you can track the current performance of your model with the images located in the "./Training" folder. Usually, at around 10,000 steps, the model starts producing reasonable outputs. After 100,000 steps, the performance becomes more stable, and the model converges at around 1 million steps.
I am using 100 steps for inference. Could you please upload the inference notebook and the demo images as soon as possible?
For sure
Do I need to change anything in the config file for better results? I am using exactly the same code as in the Git repo:
TIMESTEPS : 100
NATIVE_RESOLUTION : 'False' # if True, test with native resolution
DPM_SOLVER : 'False' # if True, test with DPM_solver
DPM_STEP : 20 # DPM_solver step
BATCH_SIZE_VAL : 1 # test batch size
TEST_PATH_GT : '/home/mepluser1/rahul_hanot/new_project_rahul/data/gt_png' # path of ground truth
TEST_PATH_IMG : '/home/mepluser1/rahul_hanot/new_project_rahul/data/input_data' # path of input
TEST_INITIAL_PREDICTOR_WEIGHT_PATH : '/home/mepluser1/rahul_hanot/DocDiff/model_weight/init_predictor_document_deblurring.pth' # path of initial predictor
TEST_DENOISER_WEIGHT_PATH : '/home/mepluser1/rahul_hanot/DocDiff/model_weight/denoiser_document_deblurring.pth' # path of denoiser
TEST_IMG_SAVE_PATH : './results_4' # path to save results
The inference notebook has been uploaded. If useful, please give it a star. Thank you.
No need to change anything.
When I test on your demo image it works fine, but when I test on my own image the results are very bad. Can you please help me understand what is wrong with the process and how I should correct it?
I am certain that the issue is due to the pre-trained weights I provided being trained on the deblurring dataset (http://www.fit.vutbr.cz/~ihradis/CNN-Deblur/). This dataset contains images with mostly pure white backgrounds and black text, which has a significant difference in pixel distribution compared to the test samples you provided (which have a grayish color). Furthermore, I did not perform any color data augmentation during training, leading to the results you provided.
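One way to reduce this mismatch when retraining would be a simple tone augmentation. The sketch below is an assumption, not something the DocDiff authors describe: it remaps the white background and black ink of a training pair to random gray levels, applies the same remapping to the degraded image and the ground truth, and expects float arrays scaled to [0, 1]. The name `random_tone_jitter` and the default ranges are hypothetical.

```python
import numpy as np


def random_tone_jitter(degraded, clean, rng=None, min_background=0.6, max_ink=0.3):
    """Remap black ink / white paper to random gray levels for a training pair.

    Hypothetical augmentation; inputs are float arrays in [0, 1] with equal shapes.
    """
    rng = np.random.default_rng() if rng is None else rng
    background = rng.uniform(min_background, 1.0)  # new value for pure white
    ink = rng.uniform(0.0, max_ink)                # new value for pure black

    def remap(x):
        # Affine map sending 0 -> ink and 1 -> background.
        return ink + (background - ink) * x.astype(np.float32)

    # Assumption: the same remapping is applied to both images, so the task
    # stays pure deblurring rather than deblurring plus background whitening.
    return remap(degraded), remap(clean)
```

Whether the ground truth should keep the gray tone or stay pure white depends on the output you want; that choice is left open here.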
I suggest two solutions:
You can refer to the method described on the ninth page of the paper "Convolutional Neural Networks for Direct Text Deblurring" (https://www.fit.vut.cz/research/publication-file/10922/hradis15CNNdeblurring.pdf) for real photo testing.
I want to resume training from the checkpoint you provided. Can you please tell me the learning rate and config details? I am currently using the config below:
IMAGE_SIZE : [304, 304] # load image size, if it's train mode, it will be randomly cropped to IMAGE_SIZE. If it's test mode, it will be resized to IMAGE_SIZE.
CHANNEL_X : 3 # input channel
CHANNEL_Y : 3 # output channel
TIMESTEPS : 100 # diffusion steps
SCHEDULE : 'linear' # linear or cosine
MODEL_CHANNELS : 32 # basic channels of Unet
NUM_RESBLOCKS : 1 # number of residual blocks
CHANNEL_MULT : [1,2,3,4] # channel multiplier of each layer
NUM_HEADS : 1
MODE : 1 # 1 Train, 0 Test
PRE_ORI : 'True' # if True, predict $x_0$, else predict $\epsilon$.
PATH_GT : '' # path of ground truth
PATH_IMG : '' # path of input
BATCH_SIZE : 4 # training batch size
NUM_WORKERS : 2 # number of workers
ITERATION_MAX : 1000000 # max training iteration
LR : 0.0001 # learning rate
LOSS : 'L2' # L1 or L2
EMA_EVERY : 100 # update EMA every EMA_EVERY iterations
START_EMA : 2000 # start EMA after START_EMA iterations
SAVE_MODEL_EVERY : 10000 # save model every SAVE_MODEL_EVERY iterations
EMA: 'True' # if True, use EMA
CONTINUE_TRAINING : 'True' # if True, continue training
CONTINUE_TRAINING_STEPS : 10000 # continue training from CONTINUE_TRAINING_STEPS
PRETRAINED_PATH_INITIAL_PREDICTOR : '/home/mepluser1/rahul_hanot/try_new_1/DocDiff/checksave/init.pth' # path of pretrained initial predictor
PRETRAINED_PATH_DENOISER : '/home/mepluser1/rahul_hanot/try_new_1/DocDiff/checksave/denoiser.pth' # path of pretrained denoiser
WEIGHT_SAVE_PATH : './checksave' # path to save model
TRAINING_PATH : './Training' # path of training data
BETA_LOSS : 50 # hyperparameter to balance the pixel loss and the diffusion loss
HIGH_LOW_FREQ : 'True' # if True, training with frequency separation
The default config is suitable for document scenarios. No changes are needed. Training will be very stable.
Should I change the LR from 0.0001 to 1e-5 or something lower?
1e-4 is OK. A lower learning rate means longer training time, and performance will not be better.
I resumed training with the above config on my custom data for 281218 steps, but I didn't get the required results. How long do I need to train the model?
Take a look at the output image of the training process
I trained the model for 7.5L (750,000) steps, and the results I am getting are uploaded below. The model weights are model_denoiser_450000.pth and model_init_450000.pth, and I am using your demo inference code. Can you please tell me why I am getting this type of result and how I can improve it?
From the image you provided, this is not a simple deblurring task. The degraded document image you provided is not only blurry, but the strokes of the characters are also very faint. Therefore, your task is more difficult. My suggestion is to enhance the contrast of the input image to increase the intensity of the faint strokes.
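As one possible reading of this suggestion, a percentile-based contrast stretch with an optional gamma step can be applied to the input before inference. This is a sketch rather than the authors' preprocessing: the name `stretch_contrast` and its parameter defaults are hypothetical, and it assumes 8-bit input images.

```python
import numpy as np


def stretch_contrast(image, low_pct=1.0, high_pct=99.0, gamma=1.0):
    """Stretch the intensities of a uint8 image to the full range.

    Hypothetical preprocessing helper. Values at or below the `low_pct`
    percentile become 0 and values at or above the `high_pct` percentile
    become 255. With gamma > 1, mid-gray strokes are pushed darker.
    """
    img = image.astype(np.float32)
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:
        # Flat image: nothing to stretch.
        return image.copy()
    out = np.clip((img - lo) / (hi - lo), 0.0, 1.0)
    out = out ** gamma
    # Round to the nearest integer level before converting back to uint8.
    return (out * 255.0 + 0.5).astype(np.uint8)
```

If training data is available for the same kind of documents, applying the same preprocessing to the training inputs keeps the training and test distributions consistent.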
It's best to train from scratch. If the results are still poor, you may need to design some additional modules to extract features.
I will try to train it from scratch, but can you please help me write a separate module or a list of augmentations for this type of image so that the model predicts them well?
I believe that it may be helpful for you to analyze the characteristics of your sample and consider designing additional modules to improve the overall outcome. This process typically involves a significant amount of trial and error in order to achieve the desired results.
I want to remove the noise created by WhatsApp compression, but I am unable to synthesize that type of noise. Can you please help me create this type of noise for text images?
This is salt-and-pepper noise. I might need to research how to synthesize it, but it's already on my to-do list.
I don't think so. This is some kind of compression that WhatsApp applies when sending an image. Can you please help urgently?
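If the degradation really is compression, a rough way to synthesize training inputs is to re-encode clean images as low-quality JPEG. The sketch below is an assumption throughout: WhatsApp's actual pipeline is not described anywhere in this thread, so JPEG re-encoding with an optional downscale is only a stand-in. It needs Pillow and NumPy, and `simulate_jpeg_compression` and its defaults are hypothetical.

```python
import io

import numpy as np
from PIL import Image


def simulate_jpeg_compression(image, quality=20, downscale=1.0):
    """Approximate messenger-style compression on a uint8 image array.

    Hypothetical helper: optionally shrinks and re-enlarges the image, then
    round-trips it through an in-memory JPEG at the given quality.
    """
    img = Image.fromarray(image)
    w, h = img.size
    if downscale < 1.0:
        small = (max(1, int(w * downscale)), max(1, int(h * downscale)))
        # Shrink and restore the size to lose fine stroke detail.
        img = img.resize(small).resize((w, h))
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    # Decode again and keep the original color mode (e.g. "L" or "RGB").
    return np.array(Image.open(buf).convert(img.mode))
```

Pairing each clean image with its compressed copy gives (input, ground truth) training pairs; varying `quality` and `downscale` per sample would cover a wider range of artifact strengths.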
Sure, I will do my best to assist you as soon as possible.
Any update regarding this?
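Why does the inference notebook you uploaded not include the denoising and watermark-removal processes? When I change TEST_PATH_IMG to the path of images with watermarks, this is the result I get. What should I do? ![Uploading PMC1635421_00004.jpg.png…]()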
This issue occurs during the inference phase of this model for every image:
File "/home/mepluser1/rahul_hanot/try_new/DocDiff/model/DocDiff.py", line 315, in forward
x = torch.cat((x, s), dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 76 but got size 75 for tensor number 1 in the list.