Royalvice / DocDiff

ACM Multimedia 2023: DocDiff: Document Enhancement via Residual Diffusion Models. Also contains 1597 red seals in Chinese scenes, along with their corresponding binary masks.
https://www.aibupt.com/
MIT License
196 stars, 21 forks

RuntimeError: Sizes of tensors #2

Closed: Ru-Van closed this issue 1 year ago

Ru-Van commented 1 year ago

I tried to run inference with your model in Colab, but I got this error:

  File "/content/DocDiff/model/DocDiff.py", line 315, in forward
    x = torch.cat((x, s), dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 440 but got size 439 for tensor number 1 in the list.

 315        print(x.shape, s.shape)
 316        x = torch.cat((x, s), dim=1)
------------------------------------------------------------------------------
torch.Size([1, 128, 220, 160]) torch.Size([1, 128, 220, 160])
torch.Size([1, 128, 220, 160]) torch.Size([1, 96, 220, 160])
torch.Size([1, 128, 440, 320]) torch.Size([1, 96, 439, 319])

My conf.yml

# model
IMAGE_SIZE : [128, 128]   # load image size, if it's train mode, it will be randomly cropped to IMAGE_SIZE. If it's test mode, it will be resized to IMAGE_SIZE.
CHANNEL_X : 3             # input channel
CHANNEL_Y : 3             # output channel
TIMESTEPS : 100           # diffusion steps
SCHEDULE : 'linear'       # linear or cosine
MODEL_CHANNELS : 32       # basic channels of Unet
NUM_RESBLOCKS : 1         # number of residual blocks
CHANNEL_MULT : [1,2,3,4]  # channel multiplier of each layer
NUM_HEADS : 1

MODE : 0                  # 1 Train, 0 Test
PRE_ORI : 'True'          # if True, predict $x_0$, else predict $\epsilon$.

# test
NATIVE_RESOLUTION : 'False'               # if True, test with native resolution
DPM_SOLVER : 'False'      # if True, test with DPM_solver
DPM_STEP : 20             # DPM_solver step
BATCH_SIZE_VAL : 1        # test batch size
TEST_PATH_GT : '/content/drive/MyDrive/wight/data/'         # path of ground truth
TEST_PATH_IMG : '/content/drive/MyDrive/wight/data/'        # path of input
TEST_INITIAL_PREDICTOR_WEIGHT_PATH : '/content/drive/MyDrive/wight/init_predictor_document_deblurring.pth'   # path of initial predictor
TEST_DENOISER_WEIGHT_PATH : '/content/drive/MyDrive/wight/denoiser_document_deblurring.pth'            # path of denoiser
TEST_IMG_SAVE_PATH : './results'  

The input image has size [1654, 2339, 3].

Could you help me solve the problem?

Royalvice commented 1 year ago

Hello, as mentioned in the guide in README.EN.md, please make sure that the width and height of the input image are both multiples of 8. So an image of size [1654, 2339, 3] needs to be padded to [1656, 2344, 3].
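
For anyone hitting the same mismatch (440 vs. 439 in the log above), here is a minimal sketch of the padding step. It is not code from the DocDiff repository: the helper names `round_up` and `pad_to_multiple` are hypothetical, the image is assumed to be a NumPy array with channels last, and padding on the bottom/right edges with `mode="edge"` is only one reasonable choice.

```python
def round_up(n: int, multiple: int = 8) -> int:
    """Return the smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


def pad_to_multiple(img, multiple: int = 8):
    """Pad a channels-last array so its first two dims are multiples of `multiple`.

    Hypothetical helper: pads only at the end of each spatial axis, so the
    original pixels stay at the top-left corner of the result.
    """
    import numpy as np  # imported here so round_up() works without NumPy

    d0, d1 = img.shape[:2]
    pad0 = round_up(d0, multiple) - d0
    pad1 = round_up(d1, multiple) - d1
    # mode="edge" repeats the border pixels (an assumption, not the author's choice)
    return np.pad(img, ((0, pad0), (0, pad1), (0, 0)), mode="edge")


# Sizes from this issue: 1654 -> 1656 and 2339 -> 2344
print(round_up(1654), round_up(2339))
```

After inference, the output can be cropped back to the original size (for example `out[:1654, :2339]` in the same axis order) to drop the padded border.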