Hanbo-Cheng / DAWN-pytorch

Official implementation of "Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation"

using 3DDFA to extract the initial state of the portrait #8

Closed nitinmukesh closed 2 days ago

nitinmukesh commented 3 days ago

@Hanbo-Cheng

Thank you for your support in helping solve the issues.

Plus, it seems that you missed the first step (using 3DDFA to extract the initial state of the portrait). Although I use a default value in the code when the initial state is missing, it will usually cause worse results. If you have any problems when extracting the initial states, please let me know.

You mentioned this in the other thread. I'm sorry, I'm not a developer, just trying to run this locally. I'm not sure what is missing, as mentioned in the comment above.

I did see a lot of missing files and incorrect folder paths in the code, which I fixed. The code is running.

Here is the complete log and output file. Yeah, the output doesn't look good.

https://github.com/user-attachments/assets/eb2f8cfd-5b72-4d39-b0ad-e2b763dc9b40

(DAWN) C:\ai\DAWN-pytorch>run_ood_test\run_DM_v0_df_test_128_both_pose_blink.bat

(DAWN) C:\ai\DAWN-pytorch>REM Set variables

(DAWN) C:\ai\DAWN-pytorch>set test_name=ood_test_1009

(DAWN) C:\ai\DAWN-pytorch>set time_tag=tmp1009

(DAWN) C:\ai\DAWN-pytorch>set audio_path=WRA_MarcoRubio_000.wav

(DAWN) C:\ai\DAWN-pytorch>set image_path=real_female_1.jpeg

(DAWN) C:\ai\DAWN-pytorch>set cache_path=cache\tmp1009

(DAWN) C:\ai\DAWN-pytorch>set audio_emb_path=cache\target_audio.npy

(DAWN) C:\ai\DAWN-pytorch>set video_output_path=cache\

(DAWN) C:\ai\DAWN-pytorch>REM Activate the 3DDFA Conda environment and run the first script

(DAWN) C:\ai\DAWN-pytorch>call conda activate 3DDFA
">>>>>>>>>>>>>>1"
C:\ai\DAWN-pytorch\extract_init_states
C:\Users\nitin\miniconda3\envs\3DDFA\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py:69: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'AzureExecutionProvider, CPUExecutionProvider'
  warnings.warn(
">>>>>>>>>>>>>>2"
C:\ai\DAWN-pytorch
Loading the Wav2Vec2 Processor...
Ignored unknown kwarg option normalize
Ignored unknown kwarg option normalize
Ignored unknown kwarg option normalize
Ignored unknown kwarg option normalize
Loading the HuBERT Model...
2024-11-10 21:46:17.150817: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE SSE2 SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

TFHubertModel has backpropagation operations that are NOT supported on CPU. If you wish to train/fine-tune this model, you need a GPU or a TPU
2024-11-10 21:46:17.565863: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x24cbf54e800 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2024-11-10 21:46:17.566105: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2024-11-10 21:46:17.578035: I .\tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
All TF 2.0 model weights were used when initializing HubertModel.

All the weights of HubertModel were initialized from the TF 2.0 model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use HubertModel for predictions without further training.
fnum525,hubersize1050
">>>>>>>>>>>>>>3"
C:\ai\DAWN-pytorch\PBnet
C:\Users\nitin\miniconda3\envs\DAWN\lib\site-packages\torch\nn\modules\transformer.py:282: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance)
  warnings.warn(f"enable_nested_tensor is True, but self.use_nested_tensor is False because {why_not_sparsity_fast_path}")
Restore weights..
eval!
eval!
">>>>>>>>>>>>>>4"
C:\ai\DAWN-pytorch
-j-of-tr-ddim0020_1.00
RESTORE_FROM: .\pretrain_models\DAWN_128.pth
cond scale: 1.0
sampling step: 20
C:\Users\nitin\miniconda3\envs\DAWN\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3527.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
=> loading checkpoint '.\pretrain_models\DAWN_128.pth'
=> loaded checkpoint '.\pretrain_models\DAWN_128.pth'
torch.Size([1, 6])
torch.Size([1, 8])
sampling loop time step: 100%|████████████████████████████████████████████████████████| 20/20 [00:22<00:00,  1.11s/it]
DDIM time 22.137852430343628
C:\Users\nitin\miniconda3\envs\DAWN\lib\site-packages\torch\nn\functional.py:4296: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
  warnings.warn(
generation time 24.479230165481567
ffmpeg version 6.1-full_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 12.2.0 (Rev10, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --pkg-config=pkgconf --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libaribb24 --enable-libaribcaption --enable-libdav1d --enable-libdavs2 --enable-libuavs3d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libaom --enable-libjxl --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-liblensfun --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-dxva2 --enable-d3d11va --enable-libvpl --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libcodec2 --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
  libavutil      58. 29.100 / 58. 29.100
  libavcodec     60. 31.102 / 60. 31.102
  libavformat    60. 16.100 / 60. 16.100
  libavdevice    60.  3.100 / 60.  3.100
  libavfilter     9. 12.100 /  9. 12.100
  libswscale      7.  5.100 /  7.  5.100
  libswresample   4. 12.100 /  4. 12.100
  libpostproc    57.  3.100 / 57.  3.100
Trailing option(s) found in the command: may be ignored.
[aist#0:0/pcm_s16le @ 00000161a23079c0] Guessed Channel Layout: mono
Input #0, wav, from 'C:\ai\DAWN-pytorch\tmpsd848vg0.wav':
  Duration: 00:00:08.00, bitrate: 256 kb/s
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s
Input #1, mov,mp4,m4a,3gp,3g2,mj2, from 'C:\ai\DAWN-pytorch\tmp45qrh_53.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2mp41
    encoder         : Lavf58.76.100
  Duration: 00:00:08.00, start: 0.000000, bitrate: 131 kb/s
  Stream #1:0[0x1](und): Video: mpeg4 (Simple Profile) (mp4v / 0x7634706D), yuv420p, 128x128 [SAR 1:1 DAR 1:1], 129 kb/s, 25 fps, 25 tbr, 12800 tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
Stream mapping:
  Stream #1:0 -> #0:0 (copy)
  Stream #0:0 -> #0:1 (pcm_s16le (native) -> aac (native))
Press [q] to stop, [?] for help
Output #0, mp4, to 'cache\\ood_test_1009\real_female_1\video\cache\target_audio.mp4':
  Metadata:
    encoder         : Lavf60.16.100
  Stream #0:0(und): Video: mpeg4 (Simple Profile) (mp4v / 0x7634706D), yuv420p, 128x128 [SAR 1:1 DAR 1:1], q=2-31, 129 kb/s, 25 fps, 25 tbr, 12800 tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
  Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 16000 Hz, stereo, fltp, 128 kb/s
    Metadata:
      encoder         : Lavc60.31.102 aac
[out#0/mp4 @ 00000161a231c940] video:127kB audio:80kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.370388%
size=     212kB time=00:00:07.96 bitrate= 218.2kbits/s speed=26.2x
[aac @ 00000161a2339180] Qavg: 63553.680
Permission denied: Unable to delete C:\ai\DAWN-pytorch\tmpsd848vg0.wav.
Permission denied: Unable to delete C:\ai\DAWN-pytorch\tmp45qrh_53.mp4.
29.9090152 seconds
cache\\ood_test_1009\real_female_1\video
Hanbo-Cheng commented 3 days ago

Please modify the .bat file according to the following code. This error is caused by relative paths. I will update the file containing the bug as soon as possible. Apologies.

conda activate 3DDFA
cd extract_init_states
python demo_pose_extract_2d_lmk_img.py \
    --input ../$image_path \
    --output ../$cache_path
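
For a Windows .bat file like the one in the log above, the snippet needs batch syntax: cmd.exe does not expand `$var` or honor `\` line continuations. A minimal sketch, assuming the variables (`image_path`, `cache_path`) are already set earlier in the .bat file as shown in the log:

```bat
REM Activate the 3DDFA environment and run the initial-state extraction
call conda activate 3DDFA
cd extract_init_states
python demo_pose_extract_2d_lmk_img.py --input ..\%image_path% --output ..\%cache_path%
cd ..
```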
nitinmukesh commented 3 days ago

I did make the change as suggested. On the left is the earlier output; on the right, the output after the above change. Am I missing something?

https://github.com/user-attachments/assets/5fc6e215-52c9-4f55-a92e-a3a1ce5f063a

CleberPeter commented 3 days ago

Hi @Hanbo-Cheng, first, congratulations on your awesome work here, and even more for sharing it with the open-source community.

I have questions about the extraction as well. I was able to run build.sh inside extract_init_states and everything went fine, but the files below are still missing:

bfm_noneck_v3.onnx, bfm_noneck_v3.pkl, mb1_120x120.yml, param_mean_std_62d_120x120.pkl

As a workaround, I copied these files from the 3DDFA_V2 repository. Is that the recommended approach?

My doubt comes from the poor performance shown below:

https://github.com/user-attachments/assets/b4c91867-bf02-4f64-8755-54b1ca93931b

Am I missing something else?

Thanks in advance.

Hanbo-Cheng commented 3 days ago

I did make the change as suggested. On the left is the earlier output; on the right, the output after the above change. Am I missing something?

Project.1.mp4

No, I think this is close to the real performance at the 128 setting.

Hanbo-Cheng commented 3 days ago

Hi @Hanbo-Cheng, first, congratulations on your awesome work here, and even more for sharing it with the open-source community.

I have questions about the extraction as well. I was able to run build.sh inside extract_init_states and everything went fine, but the files below are still missing:

bfm_noneck_v3.onnx, bfm_noneck_v3.pkl, mb1_120x120.yml, param_mean_std_62d_120x120.pkl

As a workaround, I copied these files from the 3DDFA_V2 repository. Is that the recommended approach?

My doubt comes from the poor performance shown below:

final_video.mp4

Am I missing something else?

Thanks in advance.

Thank you for your attention! Yes, downloading these files is correct. I found that the '.pkl' and '.onnx' files were blocked by .gitignore, so they failed to upload to the repo. The poor performance is due to the failure to extract the initial states, as mentioned above. You can try this.