Open · nitinmukesh opened this issue 4 months ago
Here is the pip list:
(echomimic) C:\Users\nitin>pip list
Package Version
------------------------- ------------
absl-py 2.1.0
accelerate 0.32.1
aiofiles 23.2.1
aiohttp 3.9.5
aiosignal 1.3.1
albumentations 1.1.0
altair 5.3.0
annotated-types 0.7.0
antlr4-python3-runtime 4.9.3
anyio 4.4.0
asgiref 3.8.1
asttokens 2.4.1
async-timeout 4.0.3
attrs 23.2.0
av 11.0.0
backcall 0.2.0
backports.zoneinfo 0.2.1
blinker 1.8.2
blosc2 2.0.0
boto3 1.34.143
botocore 1.34.143
cachetools 5.3.3
certifi 2024.7.4
cffi 1.16.0
charset-normalizer 3.3.2
clean-fid 0.1.35
click 8.1.7
colorama 0.4.6
colorlog 6.8.2
configobj 5.0.8
contourpy 1.1.1
cycler 0.12.1
Cython 3.0.10
datasets 2.20.0
decorator 4.4.2
decord 0.6.0
deepdish 0.3.7
Deprecated 1.2.14
diffusers 0.24.0
dill 0.3.8
Django 4.2.14
dnspython 2.6.1
docker-pycreds 0.4.0
easydict 1.13
einops 0.4.1
email_validator 2.2.0
ete3 3.1.3
exceptiongroup 1.2.1
executing 2.0.1
facenet-pytorch 2.5.0
fastapi 0.111.0
fastapi-cli 0.0.4
ffmpeg-python 0.2.0
ffmpy 0.3.2
filelock 3.15.4
flatbuffers 24.3.25
fonttools 4.53.1
frozenlist 1.4.1
fsspec 2024.5.0
ftfy 6.0.3
future 1.0.0
gitdb 4.0.11
GitPython 3.1.43
google-auth 2.32.0
google-auth-oauthlib 1.0.0
gradio 4.37.2
gradio_client 1.0.2
grpcio 1.64.1
h11 0.14.0
h5py 3.11.0
httpcore 1.0.5
httptools 0.6.1
httpx 0.27.0
huggingface-hub 0.23.4
idna 3.7
imageio 2.14.1
imageio-ffmpeg 0.4.7
importlib_metadata 8.0.0
importlib_resources 6.4.0
intel-openmp 2021.4.0
invisible-watermark 0.2.0
ipdb 0.13.13
ipython 8.12.3
jax 0.4.13
jedi 0.19.1
Jinja2 3.1.4
jmespath 1.0.1
joblib 1.4.2
json-lines 0.5.0
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
kornia 0.6.0
lazy_loader 0.4
lpips 0.1.4
Markdown 3.6
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.7.5
matplotlib-inline 0.1.7
mdurl 0.1.2
mediapipe 0.10.11
mkl 2021.4.0
ml-dtypes 0.2.0
moviepy 1.0.3
mpmath 1.3.0
msgpack 1.0.8
multidict 6.0.5
multiprocess 0.70.16
networkx 3.1
nltk 3.8.1
numexpr 2.8.6
numpy 1.24.4
oauthlib 3.2.2
omegaconf 2.3.0
opencv-contrib-python 4.10.0.84
opencv-python 4.2.0.34
opencv-python-headless 4.10.0.84
opt-einsum 3.3.0
orderedset 2.0.3
orjson 3.10.6
packaging 24.1
pandas 2.0.3
parso 0.8.4
pickleshare 0.7.5
Pillow 9.0.1
pip 24.0
pkgutil_resolve_name 1.3.10
platformdirs 4.2.2
proglog 0.1.10
progressbar 2.5
prompt_toolkit 3.0.47
protobuf 3.20.3
psutil 6.0.0
pudb 2019.2
pure-eval 0.2.2
py-cpuinfo 9.0.0
pyarrow 16.1.0
pyarrow-hotfix 0.6
pyasn1 0.6.0
pyasn1_modules 0.4.0
pycparser 2.22
pydantic 2.8.2
pydantic_core 2.20.1
pydeck 0.9.1
pyDeprecate 0.3.1
pydub 0.25.1
Pygments 2.18.0
pymongo 4.8.0
pyparsing 3.1.2
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-magic 0.4.27
python-multipart 0.0.9
pytorch-fid 0.3.0
pytorch-lightning 1.5.9
pytz 2024.1
PyWavelets 1.4.1
PyYAML 6.0.1
qudida 0.0.4
referencing 0.35.1
regex 2024.5.15
requests 2.32.3
requests-oauthlib 2.0.0
rich 13.7.1
rouge_score 0.1.2
rpds-py 0.19.0
rsa 4.9
ruff 0.5.1
s3transfer 0.10.2
safetensors 0.4.3
scikit-image 0.20.0
scikit-learn 1.3.2
scipy 1.9.1
semantic-version 2.10.0
sentry-sdk 2.9.0
setproctitle 1.3.3
setuptools 59.5.0
shellingham 1.5.4
simplejson 3.19.2
six 1.16.0
smmap 5.0.1
sniffio 1.3.1
sounddevice 0.4.7
sqlparse 0.5.0
stack-data 0.6.3
starlette 0.37.2
streamlit 1.36.0
sympy 1.13.0
tables 3.8.0
tbb 2021.13.0
tenacity 8.5.0
tensorboard 2.14.0
tensorboard-data-server 0.7.2
tensorboardX 2.4.1
test_tube 0.7.5
threadpoolctl 3.5.0
tifffile 2023.7.10
timm 1.0.7
tokenizers 0.19.1
toml 0.10.2
tomli 2.0.1
tomlkit 0.12.0
toolz 0.12.1
torch 2.2.2+cu121
torch-fidelity 0.3.0
torchaudio 2.2.2
torchmetrics 0.6.0
torchtyping 0.1.4
torchvision 0.17.2+cu121
tornado 6.4.1
tqdm 4.66.4
traitlets 5.14.3
transformers 4.42.4
typeguard 4.3.0
typer 0.12.3
typing_extensions 4.12.2
tzdata 2024.1
ujson 5.10.0
urllib3 2.2.2
urwid 2.6.15
uvicorn 0.30.1
wandb 0.17.4
watchdog 4.0.1
watchfiles 0.22.0
wcwidth 0.2.13
websockets 11.0.3
Werkzeug 3.0.3
wheel 0.43.0
wrapt 1.16.0
xxhash 3.4.1
yacs 0.1.8
yarl 1.9.4
zipp 3.19.2
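One mismatch worth flagging in this list (an observation, not a confirmed cause of the failure): opencv-python is pinned at 4.2.0.34 while opencv-contrib-python and opencv-python-headless are both at 4.10.0.84. All three distributions install the same `cv2` package, so whichever was installed last silently wins. A quick sketch to check which versions are actually present (the helper name is mine):

```python
from importlib.metadata import version, PackageNotFoundError

# The list above shows three OpenCV distributions installed side by side:
#   opencv-python            4.2.0.34
#   opencv-contrib-python    4.10.0.84
#   opencv-python-headless   4.10.0.84
# All three ship the same `cv2` module, so whichever was installed last
# overwrites the others; mixed versions are a known source of subtle bugs.
def installed_opencv_dists():
    dists = {}
    for name in ("opencv-python", "opencv-contrib-python", "opencv-python-headless"):
        try:
            dists[name] = version(name)
        except PackageNotFoundError:
            pass  # distribution not installed in this environment
    return dists

print(installed_opencv_dists())
```

If more than one entry shows up at different versions, reinstalling a single matching version of each is usually the safest cleanup.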
Could you please add more logging to the code? It seems that the inference does not work at all, and the provided information is not sufficient to find the bug.
Okay, here you go. I used Claude to add the logging.
Log
(echomimic) C:\sd\EchoMimic> python -u infer_audio2vid_pose.py
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
2024-07-15 14:44:19,964 - INFO - 1. Starting main function
2024-07-15 14:44:19,964 - INFO - 2. Arguments parsed
2024-07-15 14:44:19,967 - INFO - 3. Config loaded
2024-07-15 14:44:19,967 - INFO - 4. Weight dtype set to torch.float16
2024-07-15 14:44:21,626 - INFO - 5. Device set to cuda
2024-07-15 14:44:21,637 - INFO - 6. Inference config loaded
2024-07-15 14:44:21,637 - INFO - 7. Starting model initialization
2024-07-15 14:44:21,637 - INFO - 8. Initializing VAE
2024-07-15 14:44:22,045 - INFO - 9. VAE initialized
2024-07-15 14:44:22,045 - INFO - 10. Initializing Reference UNet
2024-07-15 14:44:31,577 - INFO - 11. Reference UNet initialized
2024-07-15 14:44:31,577 - INFO - 12. Initializing Denoising UNet
2024-07-15 14:44:31,577 - INFO - loaded temporal unet's pretrained weights from pretrained_weights\sd-image-variations-diffusers\unet ...
2024-07-15 14:44:38,743 - INFO - Load motion module params from pretrained_weights\motion_module_pose.pth
2024-07-15 14:44:41,520 - INFO - Loaded 453.20928M-parameter motion module
2024-07-15 14:44:47,617 - INFO - 13. Denoising UNet initialized
2024-07-15 14:44:47,617 - INFO - 14. Initializing Face Locator
2024-07-15 14:44:47,695 - INFO - 15. Face Locator initialized
2024-07-15 14:44:47,695 - INFO - 16. Initializing Visualizer
2024-07-15 14:44:47,695 - INFO - 17. Visualizer initialized
2024-07-15 14:44:47,695 - INFO - 18. Loading Audio Processor
2024-07-15 14:44:48,140 - INFO - 19. Audio Processor loaded
2024-07-15 14:44:48,140 - INFO - 20. Initializing Face Detector
2024-07-15 14:44:48,160 - INFO - 21. Face Detector initialized
2024-07-15 14:44:48,160 - INFO - 23. Model initialization completed
2024-07-15 14:44:48,171 - INFO - 24. Scheduler initialized
2024-07-15 14:44:48,171 - INFO - 25. Creating pipeline
2024-07-15 14:44:48,181 - INFO - 26. Pipeline created
2024-07-15 14:44:48,181 - INFO - 28. Save directory created: output\20240715\1444--seed_420-512x512
2024-07-15 14:44:48,181 - INFO - 29. Processing reference image: ./assets/test_pose_demo/d.jpg
2024-07-15 14:44:48,181 - INFO - 30. Audio path: ./assets/test_pose_demo_audios/movie_0_clip_0.wav, Pose directory: ./assets/test_pose_demo_pose
2024-07-15 14:44:48,181 - INFO - 31. Generator seed set: 420
2024-07-15 14:44:48,181 - INFO - 32. Reference name: d, Audio name: movie_0_clip_0, FPS: 24
2024-07-15 14:44:48,191 - INFO - 33. Reference image loaded
2024-07-15 14:44:48,191 - INFO - 34. Starting face_locator process
2024-07-15 14:44:48,381 - INFO - 35. Face mask tensor created
2024-07-15 14:44:48,381 - INFO - 36. Starting pipeline processing
video in 24 FPS, audio idx in 50FPS
2024-07-15 14:44:48,952 - WARNING - C:\sd\EchoMimic\src\pipelines\pipeline_echo_mimic_pose.py:446: FutureWarning: Accessing config attribute `in_channels` directly via 'EchoUNet3DConditionModel' object attribute is deprecated. Please access 'in_channels' over 'EchoUNet3DConditionModel's config object instead, e.g. 'unet.config.in_channels'.
num_channels_latents = self.denoising_unet.in_channels
latents shape:torch.Size([1, 4, 160, 64, 64]), video_length:160
2024-07-15 14:44:49,333 - WARNING - C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\models\attention_processor.py:1231: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
hidden_states = F.scaled_dot_product_attention(
2024-07-15 14:44:49,375 - ERROR - 41. Error during video processing: [WinError 6] The handle is invalid
2024-07-15 14:44:49,380 - INFO - 42. Main function completed
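Note that the traceback itself is swallowed here: the except block logs only `str(e)`, so we cannot see whether `[WinError 6]` is raised inside the pipeline call, `save_videos_grid`, or the moviepy step. A minimal sketch (stand-alone, not EchoMimic code) of switching to `logging.exception`, which records the full stack trace:

```python
import io
import logging

# Route records to a buffer so the difference is easy to inspect.
buf = io.StringIO()
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s - %(levelname)s - %(message)s',
                    stream=buf, force=True)

def failing_step():
    # Stand-in for whatever raises inside the try block.
    raise OSError("The handle is invalid")

try:
    failing_step()
except Exception:
    # Unlike logging.error(f"... {str(e)}"), logging.exception appends
    # the full traceback to the record, pinpointing the failing frame.
    logging.exception("41. Error during video processing")

report = buf.getvalue()
print(report)
```

With that one-line change in the except block, the next run would show exactly which call raised WinError 6.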
Code
import argparse
import os
import random
from datetime import datetime
from pathlib import Path
from typing import List
import av
import cv2
import numpy as np
import torch
import torchvision
from diffusers import AutoencoderKL, DDIMScheduler
from diffusers.pipelines.stable_diffusion import StableDiffusionPipeline
from einops import repeat
from omegaconf import OmegaConf
from PIL import Image
from torchvision import transforms
from transformers import CLIPVisionModelWithProjection
from src.models.unet_2d_condition import UNet2DConditionModel
from src.models.unet_3d_echo import EchoUNet3DConditionModel
from src.models.whisper.audio2feature import load_audio_model
from src.pipelines.pipeline_echo_mimic_pose import AudioPose2VideoPipeline
from src.utils.util import get_fps, read_frames, save_videos_grid, crop_and_pad
import sys
from src.models.face_locator import FaceLocator
from moviepy.editor import VideoFileClip, AudioFileClip
from facenet_pytorch import MTCNN
from src.utils.draw_utils import FaceMeshVisualizer
import pickle
import logging
# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', stream=sys.stdout)
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument("--config", type=str, default="./configs/prompts/animation_pose.yaml")
parser.add_argument("-W", type=int, default=512)
parser.add_argument("-H", type=int, default=512)
parser.add_argument("-L", type=int, default=160)
parser.add_argument("--seed", type=int, default=420)
parser.add_argument("--facemusk_dilation_ratio", type=float, default=0.1)
parser.add_argument("--facecrop_dilation_ratio", type=float, default=0.5)
parser.add_argument("--context_frames", type=int, default=12)
parser.add_argument("--context_overlap", type=int, default=3)
parser.add_argument("--cfg", type=float, default=2.5)
parser.add_argument("--steps", type=int, default=30)
parser.add_argument("--sample_rate", type=int, default=16000)
parser.add_argument("--fps", type=int, default=24)
parser.add_argument("--device", type=str, default="cuda")
args = parser.parse_args()
return args
def select_face(det_bboxes, probs):
## max face from faces that the prob is above 0.8
## box: xyxy
filtered_bboxes = []
for bbox_i in range(len(det_bboxes)):
if probs[bbox_i] > 0.8:
filtered_bboxes.append(det_bboxes[bbox_i])
if len(filtered_bboxes) == 0:
return None
sorted_bboxes = sorted(filtered_bboxes, key=lambda x:(x[3]-x[1]) * (x[2] - x[0]), reverse=True)
return sorted_bboxes[0]
def main():
logging.info("1. Starting main function")
args = parse_args()
logging.info("2. Arguments parsed")
config = OmegaConf.load(args.config)
logging.info("3. Config loaded")
if config.weight_dtype == "fp16":
weight_dtype = torch.float16
else:
weight_dtype = torch.float32
logging.info(f"4. Weight dtype set to {weight_dtype}")
device = args.device
    if "cuda" in device and not torch.cuda.is_available():
        device = "cpu"
logging.info(f"5. Device set to {device}")
inference_config_path = config.inference_config
infer_config = OmegaConf.load(inference_config_path)
logging.info("6. Inference config loaded")
logging.info("7. Starting model initialization")
try:
logging.info("8. Initializing VAE")
        vae = AutoencoderKL.from_pretrained(
            config.pretrained_vae_path,
        ).to(device, dtype=weight_dtype)  # use the selected device, not a hardcoded "cuda"
logging.info("9. VAE initialized")
logging.info("10. Initializing Reference UNet")
reference_unet = UNet2DConditionModel.from_pretrained(
config.pretrained_base_model_path,
subfolder="unet",
).to(dtype=weight_dtype, device=device)
reference_unet.load_state_dict(
torch.load(config.reference_unet_path, map_location="cpu"),
)
logging.info("11. Reference UNet initialized")
logging.info("12. Initializing Denoising UNet")
if os.path.exists(config.motion_module_path):
### stage1 + stage2
denoising_unet = EchoUNet3DConditionModel.from_pretrained_2d(
config.pretrained_base_model_path,
config.motion_module_path,
subfolder="unet",
unet_additional_kwargs=infer_config.unet_additional_kwargs,
).to(dtype=weight_dtype, device=device)
else:
### only stage1
denoising_unet = EchoUNet3DConditionModel.from_pretrained_2d(
config.pretrained_base_model_path,
"",
subfolder="unet",
unet_additional_kwargs={
"use_motion_module": False,
"unet_use_temporal_attention": False,
"cross_attention_dim": infer_config.unet_additional_kwargs.cross_attention_dim
}
).to(dtype=weight_dtype, device=device)
denoising_unet.load_state_dict(
torch.load(config.denoising_unet_path, map_location="cpu"),
strict=False
)
logging.info("13. Denoising UNet initialized")
logging.info("14. Initializing Face Locator")
        face_locator = FaceLocator(320, conditioning_channels=3, block_out_channels=(16, 32, 96, 256)).to(
            dtype=weight_dtype, device=device  # use the selected device, not a hardcoded "cuda"
        )
face_locator.load_state_dict(torch.load(config.face_locator_path))
logging.info("15. Face Locator initialized")
logging.info("16. Initializing Visualizer")
visualizer = FaceMeshVisualizer(draw_iris=False, draw_mouse=False)
logging.info("17. Visualizer initialized")
logging.info("18. Loading Audio Processor")
audio_processor = load_audio_model(model_path=config.audio_model_path, device=device)
logging.info("19. Audio Processor loaded")
logging.info("20. Initializing Face Detector")
face_detector = MTCNN(image_size=320, margin=0, min_face_size=20, thresholds=[0.6, 0.7, 0.7], factor=0.709, post_process=True, device=device)
logging.info("21. Face Detector initialized")
except Exception as e:
logging.error(f"22. Error during model initialization: {str(e)}")
return
logging.info("23. Model initialization completed")
width, height = args.W, args.H
sched_kwargs = OmegaConf.to_container(infer_config.noise_scheduler_kwargs)
scheduler = DDIMScheduler(**sched_kwargs)
logging.info("24. Scheduler initialized")
try:
logging.info("25. Creating pipeline")
pipe = AudioPose2VideoPipeline(
vae=vae,
reference_unet=reference_unet,
denoising_unet=denoising_unet,
audio_guider=audio_processor,
face_locator=face_locator,
scheduler=scheduler,
)
        pipe = pipe.to(device, dtype=weight_dtype)
logging.info("26. Pipeline created")
except Exception as e:
logging.error(f"27. Error creating pipeline: {str(e)}")
return
date_str = datetime.now().strftime("%Y%m%d")
time_str = datetime.now().strftime("%H%M")
save_dir_name = f"{time_str}--seed_{args.seed}-{args.W}x{args.H}"
save_dir = Path(f"output/{date_str}/{save_dir_name}")
save_dir.mkdir(exist_ok=True, parents=True)
logging.info(f"28. Save directory created: {save_dir}")
for ref_image_path in config["test_cases"].keys():
logging.info(f"29. Processing reference image: {ref_image_path}")
for file_path in config["test_cases"][ref_image_path]:
if ".wav" in file_path:
audio_path = file_path
else:
pose_dir = file_path
logging.info(f"30. Audio path: {audio_path}, Pose directory: {pose_dir}")
if args.seed is not None and args.seed > -1:
generator = torch.manual_seed(args.seed)
else:
generator = torch.manual_seed(random.randint(100, 1000000))
logging.info(f"31. Generator seed set: {generator.initial_seed()}")
ref_name = Path(ref_image_path).stem
audio_name = Path(audio_path).stem
final_fps = args.fps
logging.info(f"32. Reference name: {ref_name}, Audio name: {audio_name}, FPS: {final_fps}")
ref_image_pil = Image.open(ref_image_path).convert("RGB")
logging.info("33. Reference image loaded")
logging.info("34. Starting face_locator process")
pose_list = []
for index in range(len(os.listdir(pose_dir))):
tgt_musk_path = os.path.join(pose_dir, f"{index}.pkl")
with open(tgt_musk_path, "rb") as f:
tgt_kpts = pickle.load(f)
tgt_musk = visualizer.draw_landmarks((args.W, args.H), tgt_kpts)
tgt_musk_pil = Image.fromarray(np.array(tgt_musk).astype(np.uint8)).convert('RGB')
            pose_list.append(torch.Tensor(np.array(tgt_musk_pil)).to(dtype=weight_dtype, device=device).permute(2,0,1) / 255.0)
face_mask_tensor = torch.stack(pose_list, dim=1).unsqueeze(0)
logging.info("35. Face mask tensor created")
try:
logging.info("36. Starting pipeline processing")
video = pipe(
ref_image_pil,
audio_path,
face_mask_tensor,
width,
height,
args.L,
args.steps,
args.cfg,
generator=generator,
audio_sample_rate=args.sample_rate,
context_frames=12,
fps=final_fps,
context_overlap=3
).videos
logging.info("37. Pipeline processing completed")
video = torch.cat([video[:, :, :args.L, :, :], face_mask_tensor[:, :, :args.L, :, :].detach().cpu()], dim=-1)
save_videos_grid(
video,
f"{save_dir}/{ref_name}_{audio_name}_{args.H}x{args.W}_{int(args.cfg)}_{time_str}.mp4",
n_rows=2,
fps=final_fps,
)
logging.info(f"38. Video saved: {save_dir}/{ref_name}_{audio_name}_{args.H}x{args.W}_{int(args.cfg)}_{time_str}.mp4")
logging.info("39. Adding audio to video")
video_clip = VideoFileClip(f"{save_dir}/{ref_name}_{audio_name}_{args.H}x{args.W}_{int(args.cfg)}_{time_str}.mp4")
audio_clip = AudioFileClip(audio_path)
video_clip = video_clip.set_audio(audio_clip)
video_clip.write_videofile(f"{save_dir}/{ref_name}_{audio_name}_{args.H}x{args.W}_{int(args.cfg)}_{time_str}_withaudio.mp4", codec="libx264", audio_codec="aac")
logging.info(f"40. Video with audio saved: {save_dir}/{ref_name}_{audio_name}_{args.H}x{args.W}_{int(args.cfg)}_{time_str}_withaudio.mp4")
except Exception as e:
logging.error(f"41. Error during video processing: {str(e)}")
logging.info("42. Main function completed")
if __name__ == "__main__":
main()
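For what it is worth: on Windows, "[WinError 6] The handle is invalid" frequently comes from a child process (for example the ffmpeg invocation behind a video writer) trying to inherit console handles that are closed or invalid. This is a guess pending a full traceback. A sketch of the workaround, with `run_tool` being a hypothetical wrapper, not an EchoMimic function:

```python
import subprocess
import sys

# Hypothetical wrapper, not part of EchoMimic: launch a child process with
# explicit std handles so it never inherits an invalid console handle,
# which on Windows can surface as "[WinError 6] The handle is invalid".
def run_tool(cmd):
    return subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,    # never inherit a possibly-closed stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )

result = run_tool([sys.executable, "-c", "print('ok')"])
print(result.returncode, result.stdout.decode().strip())
```

If the full traceback shows the error coming from the moviepy/ffmpeg step, patching that call site to pass explicit handles in this way would be the thing to try.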
I got similar errors; a new folder named d (containing numerous .pkl files) was created in the project root folder:
(echomimic) D:\ai\EchoMimic>python -u demo_motion_sync.py
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1721116243.688493 19664 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
W0000 00:00:1721116243.698317 2292 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1721116243.705942 2292 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1721116243.713692 6704 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
C:\Users\quyan.conda\envs\echomimic\lib\site-packages\google\protobuf\symbol_database.py:55: UserWarning: SymbolDatabase.GetPrototype() is deprecated. Please use message_factory.GetMessageClass() instead. SymbolDatabase.GetPrototype() will be removed soon.
warnings.warn('SymbolDatabase.GetPrototype() is deprecated. Please '
I followed the instructions and it is not working.

"Edit driver_video and ref_image to your path in demo_motion_sync.py, then run": I left them as they are, pointing at the sample files.

python -u demo_motion_sync.py
Output: https://youtu.be/1JsPRYPiQso

python -u infer_audio2vid_pose.py (with draw_mouse=True)
No output produced, no error in the console.