Royalvice / DocDiff

ACM Multimedia 2023: DocDiff: Document Enhancement via Residual Diffusion Models. Also contains 1597 red seals in Chinese scenes, along with their corresponding binary masks.
https://www.aibupt.com/
MIT License
234 stars, 24 forks

Inference Issue #4

Open · rahulhanotDTU opened 1 year ago

rahulhanotDTU commented 1 year ago

This error occurs during the inference phase of the model, for every image:

File "/home/mepluser1/rahul_hanot/try_new/DocDiff/model/DocDiff.py", line 315, in forward
    x = torch.cat((x, s), dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 76 but got size 75 for tensor number 1 in the list.

Royalvice commented 1 year ago

Hello. As mentioned in the Guide in README.EN.md, please make sure that the width and height of the input image are both multiples of 8. If they are not, please pad the image. Hope this helps.
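A minimal sketch of the padding arithmetic, not taken from the DocDiff repo. The helper name `pad_to_multiple` is hypothetical, and it only computes how many rows and columns to add on the bottom and right so that both sides become multiples of 8.

```python
def pad_to_multiple(height, width, multiple=8):
    """Return (padded_height, padded_width, pad_bottom, pad_right).

    Hypothetical helper, not part of DocDiff: it only does the arithmetic,
    the actual padding is left to whatever image library is in use.
    """
    pad_bottom = (-height) % multiple  # 0 when height is already a multiple
    pad_right = (-width) % multiple
    return height + pad_bottom, width + pad_right, pad_bottom, pad_right


# A 300 x 601 image needs 4 extra rows and 7 extra columns.
print(pad_to_multiple(300, 601))  # (304, 608, 4, 7)
```

The two pad amounts could then be passed to `numpy.pad` or `torch.nn.functional.pad`, and the same number of rows and columns cropped from the output after inference.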

rahulhanotDTU commented 1 year ago

Can we set DPM solver to True for inference to get better results? We are not getting results as good as those reported in your paper. Also, can you tell me how many steps the model was trained for, and which pre-trained model weights are provided in the Git repo?

Royalvice commented 1 year ago

Our DocDiff model was trained with 100 time steps. Using the DPM solver does not improve the results. We have not yet uploaded the jump-step sampling based on DDIM (but you can implement it based on the paper; just save x_0 for each step in the sampling process). Hence, please use 100 steps for inference; otherwise the outputs may look strange because the noise intensity is incompatible with T.

As for not getting the results described in the paper, we suspect the input data is the cause. We will upload some demo images and inference demo notebooks soon for easy reproduction. If you still cannot obtain the desired results, you can download the deblurring dataset from http://www.fit.vutbr.cz/~ihradis/CNN-Deblur/, split the images into original and ground truth at 128×128 resolution, and train with a batch size of 32 (or train on your own datasets). This requires only 12 GB of GPU memory and takes about one and a half days for 1 million steps on a 3090 GPU. We hope this helps.
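For readers who want to experiment with the jump-step idea mentioned above, here is a generic sketch of deterministic DDIM-style sampling over a subset of the 100 training timesteps. It is not DocDiff's sampler: `predict_x0` is a placeholder for a trained model that predicts x_0, and the linear schedule endpoints `beta_start` and `beta_end` are assumed values, not read from the repo.

```python
import math


def linear_alpha_bars(num_timesteps=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule.

    beta_start and beta_end are assumed values, not taken from DocDiff.
    """
    alpha_bars, running = [], 1.0
    for i in range(num_timesteps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_timesteps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def strided_timesteps(num_timesteps=100, num_steps=20):
    """Descending, roughly evenly spaced subset of 0..T-1 that keeps both ends."""
    if not 2 <= num_steps <= num_timesteps:
        raise ValueError("num_steps must be between 2 and num_timesteps")
    last = num_timesteps - 1
    picked = {round(i * last / (num_steps - 1)) for i in range(num_steps)}
    return sorted(picked, reverse=True)


def ddim_sample(predict_x0, x_t, alpha_bars, timesteps):
    """Deterministic (eta = 0) DDIM-style sampling with an x_0-predicting model.

    predict_x0(x, t) is a placeholder for the trained denoiser.
    """
    x = x_t
    for i, t in enumerate(timesteps):
        x0_hat = predict_x0(x, t)
        if i + 1 == len(timesteps):
            return x0_hat  # last step: the prediction itself is the output
        a_t, a_prev = alpha_bars[t], alpha_bars[timesteps[i + 1]]
        # Noise implied by the current sample and the predicted clean image.
        eps_hat = (x - math.sqrt(a_t) * x0_hat) / math.sqrt(1.0 - a_t)
        # Jump directly to the earlier timestep, skipping the ones in between.
        x = math.sqrt(a_prev) * x0_hat + math.sqrt(1.0 - a_prev) * eps_hat
    return x
```

Only Python floats multiply `x`, so the same loop should also accept NumPy arrays or PyTorch tensors. The schedule is still built for T = 100; the subset skips steps without changing the noise levels, which seems consistent with the advice above to keep T at 100.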

Royalvice commented 1 year ago

Moreover, during training, you can track the current performance of your model with the images located in the "./Training" folder. Usually, at around 10,000 steps, the model starts producing reasonable outputs. After 100,000 steps, the performance becomes more stable, and the model converges at around 1 million steps.

rahulhanotDTU commented 1 year ago

I am using 100 steps for inference. Can you please upload the inference notebook and the demo images ASAP?

Royalvice commented 1 year ago

For sure

rahulhanotDTU commented 1 year ago

Do I need to change anything in the config file to get better results? I am using exactly the same code as in the Git repo, with this config:

TIMESTEPS : 100
NATIVE_RESOLUTION : 'False' # if True, test with native resolution
DPM_SOLVER : 'False' # if True, test with DPM_solver
DPM_STEP : 20 # DPM_solver step
BATCH_SIZE_VAL : 1 # test batch size
TEST_PATH_GT : '/home/mepluser1/rahul_hanot/new_project_rahul/data/gt_png' # path of ground truth
TEST_PATH_IMG : '/home/mepluser1/rahul_hanot/new_project_rahul/data/input_data' # path of input
TEST_INITIAL_PREDICTOR_WEIGHT_PATH : '/home/mepluser1/rahul_hanot/DocDiff/model_weight/init_predictor_document_deblurring.pth' # path of initial predictor
TEST_DENOISER_WEIGHT_PATH : '/home/mepluser1/rahul_hanot/DocDiff/model_weight/denoiser_document_deblurring.pth' # path of denoiser
TEST_IMG_SAVE_PATH : './results_4' # path to save results

Royalvice commented 1 year ago

The inference notebook has been uploaded. If useful, please give it a star. Thank you.

Royalvice commented 1 year ago

No changes are needed.

rahulhanotDTU commented 1 year ago

When I test on your demo image it works fine, but when I test on my own image the results are very bad. Can you please help and tell me what is wrong with the process and how I should correct it? (attached image: res1_nonative)

Royalvice commented 1 year ago

I am certain that the issue is due to the pre-trained weights I provided being trained on the deblurring dataset (http://www.fit.vutbr.cz/~ihradis/CNN-Deblur/). This dataset contains images with mostly pure white backgrounds and black text, which has a significant difference in pixel distribution compared to the test samples you provided (which have a grayish color). Furthermore, I did not perform any color data augmentation during training, leading to the results you provided.

I suggest two solutions:

  1. If you have a large number of similar samples (over 500), fine-tune the pre-trained weights on them.
  2. Perform color data augmentation on the deblurring dataset (http://www.fit.vutbr.cz/~ihradis/CNN-Deblur/) to change the grayscale distribution of the training samples.
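One way the color augmentation in option 2 could look, as a hedged sketch rather than the author's method. It assumes pixel values are floats in [0, 1] with 0 as ink and 1 as paper; the helper `sample_gray_remap` and its ranges are illustrative guesses, and applying the same remap to the degraded image and its ground truth is also an assumption.

```python
import random


def sample_gray_remap(rng, ink_range=(0.0, 0.3), paper_range=(0.6, 1.0)):
    """Draw a random linear remap of [0, 1] pixel values.

    Black (0) moves to a random ink level and white (1) to a random paper
    level. The ranges are illustrative guesses, not values from the paper.
    """
    ink = rng.uniform(*ink_range)
    paper = rng.uniform(*paper_range)

    def remap(pixels):
        # Works on a float, or elementwise on a NumPy array / PyTorch tensor.
        return ink + (paper - ink) * pixels

    return remap, ink, paper


rng = random.Random(0)
remap, ink, paper = sample_gray_remap(rng)
# Apply the SAME remap to a degraded image and its ground truth (assumption).
print(remap(0.0) == ink, abs(remap(1.0) - paper) < 1e-9)  # True True
```

Drawing a fresh remap per training pair would spread the background and stroke gray levels over a range instead of the near-pure white and black of the original dataset.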
Royalvice commented 1 year ago

You can refer to the method mentioned in the ninth page of the paper "Convolutional Neural Networks for Direct Text Deblurring" (https://www.fit.vut.cz/research/publication-file/10922/hradis15CNNdeblurring.pdf) for real photo testing.

rahulhanotDTU commented 1 year ago

I want to resume training from your provided checkpoint. Can you please tell me the learning rate and config details? I am currently using the config below.

model

IMAGE_SIZE : [304, 304] # load image size, if it's train mode, it will be randomly cropped to IMAGE_SIZE. If it's test mode, it will be resized to IMAGE_SIZE.
CHANNEL_X : 3 # input channel
CHANNEL_Y : 3 # output channel
TIMESTEPS : 100 # diffusion steps
SCHEDULE : 'linear' # linear or cosine
MODEL_CHANNELS : 32 # basic channels of Unet
NUM_RESBLOCKS : 1 # number of residual blocks
CHANNEL_MULT : [1,2,3,4] # channel multiplier of each layer
NUM_HEADS : 1
MODE : 1 # 1 Train, 0 Test
PRE_ORI : 'True' # if True, predict $x_0$, else predict $\epsilon$.

train

PATH_GT : '' # path of ground truth
PATH_IMG : '' # path of input
BATCH_SIZE : 4 # training batch size
NUM_WORKERS : 2 # number of workers
ITERATION_MAX : 1000000 # max training iteration
LR : 0.0001 # learning rate
LOSS : 'L2' # L1 or L2
EMA_EVERY : 100 # update EMA every EMA_EVERY iterations
START_EMA : 2000 # start EMA after START_EMA iterations
SAVE_MODEL_EVERY : 10000 # save model every SAVE_MODEL_EVERY iterations
EMA: 'True' # if True, use EMA
CONTINUE_TRAINING : 'True' # if True, continue training
CONTINUE_TRAINING_STEPS : 10000 # continue training from CONTINUE_TRAINING_STEPS
PRETRAINED_PATH_INITIAL_PREDICTOR : '/home/mepluser1/rahul_hanot/try_new_1/DocDiff/checksave/init.pth' # path of pretrained initial predictor
PRETRAINED_PATH_DENOISER : '/home/mepluser1/rahul_hanot/try_new_1/DocDiff/checksave/denoiser.pth' # path of pretrained denoiser
WEIGHT_SAVE_PATH : './checksave' # path to save model
TRAINING_PATH : './Training' # path of training data
BETA_LOSS : 50 # hyperparameter to balance the pixel loss and the diffusion loss
HIGH_LOW_FREQ : 'True' # if True, training with frequency separation

Royalvice commented 1 year ago

The default config is suitable for document scenarios. No changes are needed. Training will be very stable.

rahulhanotDTU commented 1 year ago

Should I change the LR from 0.0001 to 1e-5 or something lower?

Royalvice commented 1 year ago

1e-4 is OK. A lower learning rate means longer training time, and performance will not be better.

rahulhanotDTU commented 1 year ago

I resumed training with the above config on my custom data for 281218 steps, but I did not get the required results. How long do I need to train the model?

Royalvice commented 1 year ago

Take a look at the output images saved during training (the "./Training" folder).

rahulhanotDTU commented 1 year ago

I trained the model for 7.5L, and the results I am getting are uploaded below. The model weights are model_denoiser_450000.pth and model_init_450000.pth, and I am using your demo inference code. Can you please tell me why I am getting this type of result and how I can improve it? (attached image: 1277_resul)

Royalvice commented 1 year ago

From the image you provided, this is not a simple deblurring task. The degraded document image you provided is not only blurry, but the strokes of the characters are also very faint. Therefore, your task is more difficult. My suggestion is to enhance the contrast of the input image to increase the intensity of the faint strokes.
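The contrast enhancement suggested above could be tried in several ways. The sketch below uses a percentile-based linear stretch, which is a choice made here and not something the reply specifies. It assumes NumPy and float pixel values; `stretch_contrast` and its percentile defaults are hypothetical.

```python
import numpy as np


def stretch_contrast(img, low_pct=1.0, high_pct=99.0):
    """Linearly map the [low_pct, high_pct] percentile range to [0, 1].

    Values outside that range are clipped. A flat image is returned unchanged.
    """
    img = np.asarray(img, dtype=np.float32)
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:
        return img.copy()
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)


# Faint gray strokes (0.7) on a light gray page (0.9).
page = np.full((10, 10), 0.9, dtype=np.float32)
page[2:8, 4:6] = 0.7
enhanced = stretch_contrast(page)
print(float(page.max() - page.min()), float(enhanced.max() - enhanced.min()))
```

After the stretch the strokes sit near 0 and the background near 1, so the difference between them is much larger than in the input. Whether this helps the model depends on how close the result is to the training distribution.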

Royalvice commented 1 year ago

It's best to train from scratch. If the results are still poor, you may need to design some additional modules to extract features.

rahulhanotDTU commented 1 year ago

I will try training from scratch, but can you please help with writing a separate module or a list of augmentations for this type of image so that the model predicts them well?

Royalvice commented 1 year ago

I believe that it may be helpful for you to analyze the characteristics of your sample and consider designing additional modules to improve the overall outcome. This process typically involves a significant amount of trial and error in order to achieve the desired results.

rahulhanotDTU commented 1 year ago

I want to remove the noise created by WhatsApp compression, but I am unable to synthesize that type of noise. Can you please help me create this type of noise for text images? (attached image: d1)

Royalvice commented 1 year ago

This is salt-and-pepper noise. I might need to research how to synthesize it, but it is already on my to-do list.
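If the degradation really is salt-and-pepper noise, a common way to synthesize it is to set a random fraction of pixels to pure white or pure black. The sketch below assumes NumPy and float images in [0, 1]; `add_salt_and_pepper` and its default `amount` are hypothetical and not part of DocDiff.

```python
import numpy as np


def add_salt_and_pepper(img, amount=0.05, salt_ratio=0.5, rng=None):
    """Return a copy of img with a fraction `amount` of pixels set to 0 or 1.

    img is H x W or H x W x C; for color images all channels of a chosen
    pixel are replaced together.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(img, dtype=np.float32, copy=True)
    u = rng.random(out.shape[:2])  # one uniform draw per pixel
    salt = u < amount * salt_ratio
    pepper = (u >= amount * salt_ratio) & (u < amount)
    out[salt] = 1.0    # white specks
    out[pepper] = 0.0  # black specks
    return out


clean = np.full((64, 64, 3), 0.5, dtype=np.float32)
noisy = add_salt_and_pepper(clean, amount=0.05, rng=np.random.default_rng(0))
print(noisy.shape)  # (64, 64, 3)
```

Pairs of `clean` and `noisy` images could then serve as ground truth and input for training. This would not reproduce compression artifacts, which have a different structure.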

rahulhanotDTU commented 1 year ago

I don't think so. This is some kind of compression that WhatsApp uses when sending images. Can you please help urgently?

Royalvice commented 1 year ago

Sure, I will do my best to assist you as soon as possible.

rahulhanotDTU commented 1 year ago

Any update on this?

22chenR commented 10 months ago

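Why does the inference notebook you uploaded not include the denoising and watermark-removal processes? When I change the TEST_PATH_IMG path to images with watermarks, this is the result I get. What should I do? (image upload incomplete: PMC1635421_00004.jpg.png)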