OpenTalker / video-retalking

[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
https://opentalker.github.io/video-retalking/
Apache License 2.0

ModuleNotFoundError: No module named 'torchvision.transforms.functional_tensor' #224

Open · vter00 opened 8 months ago

vter00 commented 8 months ago

```
ModuleNotFoundError: No module named 'torchvision.transforms.functional_tensor'
```

DFMlaozhu commented 8 months ago

I encountered the same problem as you. Have you solved it?

andradeofc commented 8 months ago

I have the same problem

arielweinberger commented 8 months ago

Go to `/usr/local/lib/python3.10/dist-packages/basicsr/data/degradations.py` and change line 8 to:

```python
from torchvision.transforms.functional import rgb_to_grayscale
```

Got the solution from https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/13985
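
For reference, a version-tolerant variant of that edit is a try/except import, so the file works on both old and new torchvision. A minimal sketch, assuming only that `rgb_to_grayscale` moved from `transforms.functional_tensor` to `transforms.functional`:

```python
# Version-tolerant replacement for line 8 of degradations.py:
# torchvision >= 0.17 removed transforms.functional_tensor, but the same
# rgb_to_grayscale function is available from transforms.functional.
try:
    from torchvision.transforms.functional import rgb_to_grayscale
except ImportError:
    # older torchvision that still ships functional_tensor
    from torchvision.transforms.functional_tensor import rgb_to_grayscale
```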

netxor17 commented 7 months ago

Try uninstalling and then reinstalling the latest version of torchvision (0.17.1):

```
pip install torchvision
```

Matrix-X commented 7 months ago

Which versions of torch, torchvision, and torchaudio are recommended and run successfully on a Mac M1?

Matrix-X commented 7 months ago

> Try uninstalling and then reinstalling the latest version of torchvision (0.17.1): `pip install torchvision`

```
❯ pip list | grep torch
torch        2.2.1
torchaudio   2.2.1
torchvision  0.17.1
❯ python inference.py --face examples/face/1.mp4 --audio examples/audio/1.wav --outfile results/1_1.mp4
Traceback (most recent call last):
  File "inference.py", line 16, in <module>
    from third_part.GPEN.gpen_face_enhancer import FaceEnhancement
  File "/Volumes/AISpace/Workspace/DigitalHuman/video-retalking/third_part/GPEN/gpen_face_enhancer.py", line 11, in <module>
    from utils.inference_utils import Laplacian_Pyramid_Blending_with_mask
  File "/Volumes/AISpace/Workspace/DigitalHuman/video-retalking/utils/inference_utils.py", line 5, in <module>
    from models import load_network, load_DNet
  File "/Volumes/AISpace/Workspace/DigitalHuman/video-retalking/models/__init__.py", line 2, in <module>
    from models.DNet import DNet
  File "/Volumes/AISpace/Workspace/DigitalHuman/video-retalking/models/DNet.py", line 10, in <module>
    from models.base_blocks import LayerNorm2d, ADAINHourglass, FineEncoder, FineDecoder
  File "/Volumes/AISpace/Workspace/DigitalHuman/video-retalking/models/base_blocks.py", line 9, in <module>
    from basicsr.archs.arch_util import default_init_weights
  File "/opt/anaconda3/envs/video_retalking/lib/python3.8/site-packages/basicsr/__init__.py", line 4, in <module>
    from .data import *
  File "/opt/anaconda3/envs/video_retalking/lib/python3.8/site-packages/basicsr/data/__init__.py", line 22, in <module>
    _dataset_modules = [importlib.import_module(f'basicsr.data.{file_name}') for file_name in dataset_filenames]
  File "/opt/anaconda3/envs/video_retalking/lib/python3.8/site-packages/basicsr/data/__init__.py", line 22, in <listcomp>
    _dataset_modules = [importlib.import_module(f'basicsr.data.{file_name}') for file_name in dataset_filenames]
  File "/opt/anaconda3/envs/video_retalking/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/opt/anaconda3/envs/video_retalking/lib/python3.8/site-packages/basicsr/data/realesrgan_dataset.py", line 11, in <module>
    from basicsr.data.degradations import circular_lowpass_kernel, random_mixed_kernels
  File "/opt/anaconda3/envs/video_retalking/lib/python3.8/site-packages/basicsr/data/degradations.py", line 8, in <module>
    from torchvision.transforms.functional_tensor import rgb_to_grayscale
ModuleNotFoundError: No module named 'torchvision.transforms.functional_tensor'
```
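
Worth noting: this traceback fails inside basicsr, not in video-retalking itself, so upgrading torchvision alone cannot fix it; the function still exists on 0.17.1, only its module moved. A quick illustrative check:

```python
# Quick check: rgb_to_grayscale still exists in torchvision 0.17.x,
# just under transforms.functional instead of transforms.functional_tensor.
import torchvision
from torchvision.transforms.functional import rgb_to_grayscale

print(torchvision.__version__)   # e.g. 0.17.1
print(rgb_to_grayscale)          # import succeeds from the new location
```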

Matrix-X commented 7 months ago

> Which versions of torch, torchvision, and torchaudio are recommended and run successfully on a Mac M1?

```
pip install torch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0
```

```
❯ python inference.py --face examples/face/1.mp4 --audio examples/audio/1.wav --outfile results/1_1.mp4
[Info] Using cpu for inference.
[Step 0] Number of frames available for inference: 135
[Step 1] Using saved landmarks.
[Step 2] 3DMM Extraction In Video:: 100%|██████████| 135/135 [00:05<00:00, 26.32it/s]
using expression center
Load checkpoint from: checkpoints/DNet.pt
Load checkpoint from: checkpoints/LNet.pth
Load checkpoint from: checkpoints/ENet.pth
[Step 3] Using saved stabilized video.
[Step 4] Load audio; Length of mel chunks: 109
[Step 5] Reference Enhancement: 100%|██████████| 109/109 [08:27<00:00, 4.65s/it]
landmark Det:: 100%|██████████| 109/109 [00:46<00:00, 2.32it/s]
100%|██████████| 109/109 [00:00<00:00, 41943.04it/s]
100%|██████████| 109/109 [00:00<00:00, 1026.23it/s]
FaceDet:: 100%|██████████| 28/28 [01:11<00:00, 2.55s/it]
[Step 6] Lip Synthesis:: 0%| | 0/7 [02:04<?, ?it/s]
Traceback (most recent call last):
  File "inference.py", line 345, in <module>
    main()
  File "inference.py", line 221, in main
    pred, low_res = model(mel_batch, img_batch, reference)
  File "/opt/anaconda3/envs/video_retalking/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Volumes/AISpace/Workspace/DigitalHuman/video-retalking/models/ENet.py", line 113, in forward
    low_res_img = self.low_res(audio_sequences, LNet_input)
  File "/opt/anaconda3/envs/video_retalking/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Volumes/AISpace/Workspace/DigitalHuman/video-retalking/models/LNet.py", line 132, in forward
    _outputs = self.decoder(vis_feat, audio_feat)
  File "/opt/anaconda3/envs/video_retalking/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Volumes/AISpace/Workspace/DigitalHuman/video-retalking/models/LNet.py", line 73, in forward
    out = res_model(out, z)
  File "/opt/anaconda3/envs/video_retalking/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Volumes/AISpace/Workspace/DigitalHuman/video-retalking/models/base_blocks.py", line 425, in forward
    x = model(x, z)
  File "/opt/anaconda3/envs/video_retalking/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Volumes/AISpace/Workspace/DigitalHuman/video-retalking/models/base_blocks.py", line 404, in forward
    x_l, x_g = self.conv1((x_l, x_g), z)
  File "/opt/anaconda3/envs/video_retalking/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Volumes/AISpace/Workspace/DigitalHuman/video-retalking/models/base_blocks.py", line 383, in forward
    x_l, x_g = self.ffc(x)
  File "/opt/anaconda3/envs/video_retalking/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Volumes/AISpace/Workspace/DigitalHuman/video-retalking/models/ffc.py", line 231, in forward
    out_xg = self.convl2g(x_l) * l2g_gate + self.convg2g(x_g)
  File "/opt/anaconda3/envs/video_retalking/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Volumes/AISpace/Workspace/DigitalHuman/video-retalking/models/ffc.py", line 157, in forward
    output = self.fu(x)
  File "/opt/anaconda3/envs/video_retalking/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Volumes/AISpace/Workspace/DigitalHuman/video-retalking/models/ffc.py", line 99, in forward
    ffted = torch.fft.rfftn(x, dim=fft_dim, norm=self.fft_norm)
RuntimeError: fft: ATen not compiled with MKL support
```
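
This second failure is unrelated to torchvision: the pinned torch==1.9.0 macOS wheel was built without MKL, which PyTorch of that era required for CPU `torch.fft` ops (used by this repo's `models/ffc.py`). A small illustrative check, not project code, for whether a given build supports them:

```python
# Illustrative check: torch.fft on CPU raises
# "fft: ATen not compiled with MKL support" on builds without an FFT backend,
# such as the old torch 1.9.0 macOS wheels pinned above.
import torch

print(torch.__version__)
x = torch.randn(1, 3, 8, 8)
try:
    torch.fft.rfftn(x, dim=(-2, -1))
    print("torch.fft works on this build")
except RuntimeError as exc:
    print(f"torch.fft unavailable: {exc}")
```

Recent torch builds for Apple Silicon no longer depend on MKL for `torch.fft`, so a current torch/torchvision plus the degradations.py import fix above is likely a cleaner route on an M1 than pinning 1.9.0.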

tomy128 commented 7 months ago

This is a bug in basicsr==1.4.2; see this PR for details: https://github.com/XPixelGroup/BasicSR/pull/650/files
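
For anyone who prefers not to edit site-packages by hand, a throwaway script can apply the same one-line change as that PR. A sketch, assuming basicsr 1.4.2 is installed in the active environment:

```python
# One-off patch: rewrite basicsr's degradations.py in place, applying the
# same import change as the linked BasicSR PR.
import importlib.util
from pathlib import Path

spec = importlib.util.find_spec("basicsr")  # locates the package without importing it
degradations = Path(spec.origin).parent / "data" / "degradations.py"
src = degradations.read_text()
degradations.write_text(src.replace(
    "from torchvision.transforms.functional_tensor import rgb_to_grayscale",
    "from torchvision.transforms.functional import rgb_to_grayscale",
))
print(f"patched {degradations}")
```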

vettorazi commented 7 months ago

wow, that was super sketchy... but changing the 2D thing in both files, plus changing how the degradations file imports torchvision (`from torchvision.transforms.functional import rgb_to_grayscale`), plus changing requirements.txt, worked for me! Basically, what I'm saying is: if the first and second fixes don't work, keep trying! Some hack will fix this thing haha