BadToBest / EchoMimic

Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
https://badtobest.github.io/echomimic.html
Apache License 2.0
2.25k stars · 262 forks

Motion Sync not working #59

Closed · A-2-H closed this 1 month ago

A-2-H commented 1 month ago

Windows 10, conda. The audio2video script worked for me before. After updating your program to try "motion sync", nothing happens. I have run the motion sync script multiple times:

(echomimic) S:\AIprograms\EchoMimic>  python -u demo_motion_sync.py
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1721230965.082872   31692 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
288

In my EchoMimic root folder, the script created a folder named "d" (the same name as the source image). I can see some ".pkl" files in it, but that's it. There are 288 pkl files in the "d" folder in total, which matches the number "288" printed in the console. But it still didn't render a video.
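For anyone hitting the same thing, a quick sanity check is to count the .pkl files and unpickle one to see what motion sync actually wrote. This is a minimal sketch; the dict keys below are stand-ins for illustration, since the thread does not document the real structure EchoMimic serializes — inspect one of your own files to see it:

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for one frame's landmark file; the keys here are assumptions,
# not the real EchoMimic format.
sample = {"frame_index": 0, "landmarks": [[0.1, 0.2], [0.3, 0.4]]}

out_dir = Path(tempfile.mkdtemp())  # stand-in for the "d" output folder
with (out_dir / "0.pkl").open("wb") as f:
    pickle.dump(sample, f)

# Count the files and peek at the contents of the first one.
pkl_files = sorted(out_dir.glob("*.pkl"))
print(f"{len(pkl_files)} pkl files found")
with pkl_files[0].open("rb") as f:
    data = pickle.load(f)
print(type(data), list(data) if isinstance(data, dict) else data)
```

If the files load cleanly and contain landmark data, the extraction step itself worked and the missing video is a downstream problem.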

This is my pip list:

Package                   Version
------------------------- ------------
absl-py                   2.1.0
accelerate                0.32.1
aiofiles                  23.2.1
altair                    5.3.0
annotated-types           0.7.0
antlr4-python3-runtime    4.9.3
anyio                     4.4.0
asttokens                 2.4.1
attrs                     23.2.0
av                        11.0.0
backcall                  0.2.0
Brotli                    1.0.9
certifi                   2024.7.4
cffi                      1.16.0
charset-normalizer        3.3.2
click                     8.1.7
colorama                  0.4.6
contourpy                 1.1.1
cycler                    0.12.1
decorator                 4.4.2
diffusers                 0.24.0
dnspython                 2.6.1
einops                    0.4.1
email_validator           2.2.0
exceptiongroup            1.2.2
executing                 2.0.1
facenet-pytorch           2.5.0
fastapi                   0.111.1
fastapi-cli               0.0.4
ffmpeg-python             0.2.0
ffmpy                     0.3.2
filelock                  3.13.1
flatbuffers               24.3.25
fonttools                 4.53.1
fsspec                    2024.6.1
future                    1.0.0
gmpy2                     2.1.2
gradio                    4.38.1
gradio_client             1.1.0
h11                       0.14.0
httpcore                  1.0.5
httptools                 0.6.1
httpx                     0.27.0
huggingface-hub           0.23.4
idna                      3.7
imageio                   2.34.2
imageio-ffmpeg            0.5.1
importlib_metadata        8.0.0
importlib_resources       6.4.0
intel-openmp              2021.4.0
ipython                   8.12.3
jax                       0.4.13
jedi                      0.19.1
Jinja2                    3.1.4
jsonschema                4.23.0
jsonschema-specifications 2023.12.1
kiwisolver                1.4.5
lazy_loader               0.4
lightning-utilities       0.11.3.post0
markdown-it-py            3.0.0
MarkupSafe                2.1.3
matplotlib                3.7.5
matplotlib-inline         0.1.7
mdurl                     0.1.2
mediapipe                 0.10.11
mkl                       2021.4.0
mkl-fft                   1.3.8
mkl-random                1.2.4
mkl-service               2.4.0
ml-dtypes                 0.2.0
moviepy                   1.0.3
mpmath                    1.3.0
networkx                  3.1
numpy                     1.24.3
omegaconf                 2.3.0
opencv-contrib-python     4.10.0.84
opencv-python             4.10.0.84
opt-einsum                3.3.0
orjson                    3.10.6
packaging                 24.1
pandas                    2.0.3
parso                     0.8.4
pickleshare               0.7.5
pillow                    10.3.0
pip                       24.0
pkgutil_resolve_name      1.3.10
proglog                   0.1.10
prompt_toolkit            3.0.47
protobuf                  3.20.3
psutil                    6.0.0
pure-eval                 0.2.2
pycparser                 2.22
pydantic                  2.8.2
pydantic_core             2.20.1
pydub                     0.25.1
Pygments                  2.18.0
pyparsing                 3.1.2
PySocks                   1.7.1
python-dateutil           2.9.0.post0
python-dotenv             1.0.1
python-multipart          0.0.9
pytz                      2024.1
PyWavelets                1.4.1
PyYAML                    6.0.1
referencing               0.35.1
regex                     2024.5.15
requests                  2.32.2
rich                      13.7.1
rpds-py                   0.19.0
ruff                      0.5.2
safetensors               0.4.3
scikit-image              0.21.0
scipy                     1.10.1
semantic-version          2.10.0
setuptools                69.5.1
shellingham               1.5.4
six                       1.16.0
sniffio                   1.3.1
sounddevice               0.4.7
stack-data                0.6.3
starlette                 0.37.2
sympy                     1.12
tbb                       2021.13.0
tifffile                  2023.7.10
tokenizers                0.19.1
tomlkit                   0.12.0
toolz                     0.12.1
torch                     2.0.1
torchaudio                2.0.2
torchmetrics              1.4.0.post0
torchtyping               0.1.4
torchvision               0.15.2
tqdm                      4.66.4
traitlets                 5.14.3
transformers              4.42.3
typeguard                 4.3.0
typer                     0.12.3
typing_extensions         4.11.0
tzdata                    2024.1
urllib3                   2.2.2
uvicorn                   0.30.1
watchfiles                0.22.0
wcwidth                   0.2.13
websockets                11.0.3
wheel                     0.43.0
win-inet-pton             1.1.0
zipp                      3.19.2
nitinmukesh commented 1 month ago

It seems the problem is in EchoMimic\src\pipelines\pipeline_echo_mimic_pose_acc.py:

@torch.no_grad()
    def __call__(

..................

        print("23: with self.progress_bar")
        with self.progress_bar(total=num_inference_steps) as progress_bar:
            # note: `total` is not defined in this scope, so the original
            # f-string referencing it would raise a NameError if reached
            print(f"23.1: num_inference_steps:{num_inference_steps}")

Output

(echomimic) C:\tut\EchoMimic>python -u infer_audio2vid_pose_acc.py
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(

23: with self.progress_bar

(echomimic) C:\tut\EchoMimic>

Execution never enters the with statement.

If the developers can tell me how to debug this, I am ready to help.
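A process that stops without a traceback right before a statement (as in the log above) can mean it is dying inside native code rather than raising a Python exception. One way to debug this is Python's standard faulthandler module plus an explicit try/except around the call. A sketch, with `run_pipeline` as a hypothetical stand-in for the actual pipeline call:

```python
import faulthandler
import sys
import traceback

# Dump a Python-level traceback even if the process dies inside a native
# extension (e.g. a CUDA kernel or TensorFlow Lite delegate).
faulthandler.enable()

def run_pipeline():
    # stand-in for the real pipeline __call__; replace with the actual call
    raise RuntimeError("simulated failure inside the denoising loop")

try:
    run_pipeline()
except Exception:
    # print the full traceback instead of letting the process exit silently
    traceback.print_exc(file=sys.stderr)
    print("pipeline raised; see traceback above")
```

If faulthandler prints a stack at the crash point, the offending native frame is usually visible there; if nothing prints and the process still vanishes, checking for out-of-memory kills is the next step.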

nitinmukesh commented 1 month ago

Some more logs; it is still not entering the with statement:


(echomimic) C:\tut\EchoMimic>python -u infer_audio2vid_pose_acc.py
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
1: Initializing pipeline
2: Pipeline initialized
7: Getting execution device
video in 24 FPS, audio idx in 50FPS
15: Preparing latents
17: latents shape:torch.Size([1, 4, 160, 64, 64]), video_length:160
20: face_locator_tensor
10: Preparing extra step kwargs
11: Extra step kwargs prepared
21: extra_step_kwargs
22: denoising loop
23: with self.progress_bar
ref_image_latents shape: torch.Size([1, 4, 64, 64])
face_mask_tensor shape: torch.Size([1, 3, 240, 512, 512])
face_locator_tensor shape: torch.Size([1, 320, 240, 64, 64])
self.progress_bar: <bound method DiffusionPipeline.progress_bar of AudioPose2VideoPipeline {
  "_class_name": "AudioPose2VideoPipeline",
  "_diffusers_version": "0.24.0",
  "audio_guider": [
    "src.models.whisper.audio2feature",
    "Audio2Feature"
  ],
  "denoising_unet": [
    "src.models.unet_3d_echo",
    "EchoUNet3DConditionModel"
  ],
  "face_locator": [
    "src.models.face_locator",
    "FaceLocator"
  ],
  "image_proj_model": [
    null,
    null
  ],
  "reference_unet": [
    "src.models.unet_2d_condition",
    "UNet2DConditionModel"
  ],
  "scheduler": [
    "diffusers",
    "DDIMScheduler"
  ],
  "text_encoder": [
    null,
    null
  ],
  "tokenizer": [
    null,
    null
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
>

(echomimic) C:\tut\EchoMimic>

Code


        print("23: with self.progress_bar")
        print("ref_image_latents shape:", ref_image_latents.shape)
        print("face_mask_tensor shape:", face_mask_tensor.shape)
        print("face_locator_tensor shape:", face_locator_tensor.shape)
        print("self.progress_bar:", self.progress_bar)
        with self.progress_bar(total=num_inference_steps) as progress_bar:
            print("Inside with statement")
DoItEric commented 1 month ago

Maybe Motion Sync is just meant to help people extract motion pkl files from a video? Running this script gives you a new directory where the pkl files are saved, and then you can run audio2vid_pose to sync with them, right?

nitinmukesh commented 1 month ago

Maybe Motion Sync is just meant to help people extract motion pkl files from a video? Running this script gives you a new directory where the pkl files are saved, and then you can run audio2vid_pose to sync with them, right?

We are just trying to run inference on the sample provided in this repo, which is not working.

Creating pickles from our own video works fine.

JoeFannie commented 1 month ago

Motion sync only produces pkl files, one per frame. It is a pre-processing step for the driving video (if you have your own driving video and reference image, you should run it before calling infer to generate the video). Now, try the newly released script: motion sync is done online during the infer process, so there is no need to run it separately.
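This description implies the driving video is reduced to one pkl per frame, which the infer step then consumes in frame order. A hedged sketch of reading such a sequence back, assuming the files are named by frame index (as the 288 files in the "d" folder suggest) — the `{"frame": i}` payload is a stand-in, not the real format:

```python
import pickle
import tempfile
from pathlib import Path

# Build a small stand-in directory of per-frame pkl files (0.pkl .. 11.pkl),
# mimicking what demo_motion_sync.py appears to write.
frames_dir = Path(tempfile.mkdtemp())
for i in range(12):
    with (frames_dir / f"{i}.pkl").open("wb") as f:
        pickle.dump({"frame": i}, f)

# Plain lexicographic sort would give 0, 1, 10, 11, 2, ... so sort
# numerically on the filename stem to preserve frame order.
pkl_files = sorted(frames_dir.glob("*.pkl"), key=lambda p: int(p.stem))
motion_sequence = []
for p in pkl_files:
    with p.open("rb") as f:
        motion_sequence.append(pickle.load(f))

print([m["frame"] for m in motion_sequence])  # frames 0..11 in order
```

The numeric sort matters once a clip exceeds 10 frames; a lexicographic sort would feed the motion frames to the pipeline out of order.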