I got an error when executing train.py and train_svd.py with a PNG image:
Traceback (most recent call last):
File "/root/animate-anything/train.py", line 1167, in <module>
main_eval(**args_dict)
File "/root/animate-anything/train.py", line 1154, in main_eval
batch_eval(unet, text_encoder, vae, vae_processor, lora_manager, pretrained_model_path,
File "/root/animate-anything/train.py", line 1114, in batch_eval
precision = eval(pipeline, vae_processor,
File "/root/animate-anything/train.py", line 1033, in eval
input_image_latents = tensor_to_vae_latent(input_image, vae)
File "/root/animate-anything/train.py", line 365, in tensor_to_vae_latent
latents = vae.encode(t).latent_dist.sample()
File "/root/animate-anything/venv/lib/python3.10/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
return method(self, *args, **kwargs)
File "/root/animate-anything/venv/lib/python3.10/site-packages/diffusers/models/autoencoder_kl.py", line 259, in encode
h = self.encoder(x)
File "/root/animate-anything/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/animate-anything/venv/lib/python3.10/site-packages/diffusers/models/vae.py", line 141, in forward
sample = self.conv_in(sample)
File "/root/animate-anything/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/animate-anything/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/root/animate-anything/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [128, 3, 3, 3], expected input[1, 4, 512, 512] to have 3 channels, but got 4 channels instead
This happens because a PNG image may contain 4 channels (RGBA), while the VAE encoder expects 3. I think converting the input image from RGBA to RGB would solve this error, just like SVD does: https://github.com/Stability-AI/generative-models/blob/main/scripts/sampling/simple_video_sample.py#L97.
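A minimal sketch of the proposed fix, assuming the image is loaded with Pillow (the helper name `load_rgb_image` is mine, not from the repo):

```python
from PIL import Image

def load_rgb_image(path: str) -> Image.Image:
    """Load an image and drop the alpha channel if present,
    so the VAE encoder receives exactly 3 channels."""
    image = Image.open(path)
    if image.mode != "RGB":
        # Handles RGBA, P, L, etc. -- e.g. RGBA (4 channels) -> RGB (3 channels)
        image = image.convert("RGB")
    return image
```

Applying this (or an equivalent `image.convert("RGB")`) before `tensor_to_vae_latent` should make the input match the expected `[B, 3, H, W]` shape.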