Vchitect / SEINE

[ICLR 2024] SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
https://vchitect.github.io/SEINE-project/
Apache License 2.0

An error occurs when using a custom image #8

Open 823863429 opened 11 months ago

823863429 commented 11 months ago

Thank you for your outstanding work. When I use the image I uploaded and run `sample_i2v.yaml`, I get:

    loading video from input/i2v/rocket1.png
    loading the input image
    Traceback (most recent call last):
      File "/content/SEINE/sample_scripts/with_mask_sample.py", line 243, in <module>
        main(omega_conf)
      File "/content/SEINE/sample_scripts/with_mask_sample.py", line 226, in main
        video_input, researve_frames = get_input(args)  # f,c,h,w
      File "/content/SEINE/sample_scripts/with_mask_sample.py", line 105, in get_input
        video_frames = transform_video(video_frames)
      File "/usr/local/lib/python3.10/dist-packages/torchvision/transforms/transforms.py", line 95, in __call__
        img = t(img)
      File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/usr/local/lib/python3.10/dist-packages/torchvision/transforms/transforms.py", line 277, in forward
        return F.normalize(tensor, self.mean, self.std, self.inplace)
      File "/usr/local/lib/python3.10/dist-packages/torchvision/transforms/functional.py", line 363, in normalize
        return F_t.normalize(tensor, mean=mean, std=std, inplace=inplace)
      File "/usr/local/lib/python3.10/dist-packages/torchvision/transforms/_functional_tensor.py", line 928, in normalize
        return tensor.sub_(mean).div_(std)
    RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 1

Is there any way to fix this error?

ExponentialML commented 11 months ago

Hey. Your image has an alpha / transparency channel, so it loads with 4 channels instead of the 3 that the normalization expects. You can either re-save it as a JPG, or add a function to the main sample code:

    import numpy as np
    import torch
    from PIL import Image

    def get_img(img_path_or_tensor):
        # Load the image as uint8 and add a leading batch dimension
        img = torch.as_tensor(
            np.array(Image.open(img_path_or_tensor), dtype=np.uint8, copy=True)
        ).unsqueeze(0)
        # Keep only the first three (RGB) channels, dropping any alpha channel
        return img[:, :, :, :3]
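As a quick sanity check of that slicing (a minimal sketch on a synthetic RGBA array, not SEINE's actual input pipeline):

```python
import numpy as np

# Synthetic 2x2 RGBA image: 4 channels in the last dimension
rgba = np.zeros((2, 2, 4), dtype=np.uint8)
rgba[..., 3] = 255  # fully opaque alpha channel

# Slicing off the last channel leaves a plain RGB array,
# which is what get_img does after adding its batch dimension
rgb = rgba[:, :, :3]
print(rgb.shape)  # (2, 2, 3)
```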

Then replace all instances like the one from here with:

first_frame = get_img(first_frame_path)
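Alternatively (a sketch, not part of the SEINE code), you can drop the alpha channel at load time with Pillow's `convert`, which also handles palette images with transparency:

```python
from PIL import Image

# In-memory RGBA image standing in for rocket1.png (hypothetical stand-in)
rgba_img = Image.new("RGBA", (4, 4), (255, 0, 0, 128))

# convert("RGB") discards the alpha channel before any tensor transforms run
rgb_img = rgba_img.convert("RGB")
print(rgb_img.mode)  # RGB
```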
823863429 commented 11 months ago


It works, thanks!