fudan-generative-vision / hallo

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
https://fudan-generative-vision.github.io/hallo/
MIT License

Web-UI Request #19

Open nustato opened 3 weeks ago

nustato commented 3 weeks ago

Thank you very much for the great work!

It would be super nice if you could add a simple web UI with the following functionality to enhance the user experience, especially as a tool used for remote education in developing countries:

  1. Audio input field with the option of using the microphone for a maximum recording of 5 minutes. The audio is recorded as a wav file and automatically used as the input audio file.
  2. Option for integrating a locally hosted LLM with the ability to generate text which is then converted to a wav audio file via a locally hosted TTS model, for a maximum of 5 minutes worth of text, and automatically used as an input audio wav file.
  3. For the image input field, add an option for an animated "default-idle-state" that displays after submitting the image file giving the user the feeling of immersion with a "live" character.

Thank you!
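
The 5-minute cap in items 1 and 2 could be enforced before a recording is handed to inference. The following is a minimal sketch using only Python's standard `wave` module; the function names and the `MAX_SECONDS` constant are hypothetical and not part of hallo, and the check assumes an uncompressed PCM WAV file.

```python
import wave

MAX_SECONDS = 5 * 60  # hypothetical constant: the 5-minute limit from the request


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        # Duration is the number of frames divided by frames per second.
        return wav_file.getnframes() / wav_file.getframerate()


def is_within_limit(path: str, max_seconds: float = MAX_SECONDS) -> bool:
    """Return True if the recording is no longer than max_seconds."""
    return wav_duration_seconds(path) <= max_seconds
```

A web UI could call `is_within_limit(path)` on a recorded or TTS-generated file and reject it with a clear message instead of starting a long inference run.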

daswer123 commented 3 weeks ago

Hi, I made a web UI that exposes all the available settings, including those that were originally configurable only in the code.

https://github.com/daswer123/hallo-webui

anstonjie commented 3 weeks ago

To create a public link, set share=True in launch().
Traceback (most recent call last):
  File "D:\ai\hallo-webui\scripts\inference.py", line 40, in <module>
    from hallo.animate.face_animate import FaceAnimatePipeline
ModuleNotFoundError: No module named 'hallo'

Running your project produces the error above.

anstonjie commented 3 weeks ago

That hallo is the project's own internal package, so why can't it be found?

daswer123 commented 3 weeks ago

It looks like the project's internal package was not installed.

To fix it:

1. Open a console in the root folder and activate the virtual environment.
2. Run the command pip install -e .
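
To confirm whether the fix took effect, one can ask Python whether it can locate the package without running the full pipeline. This is a minimal sketch using the standard `importlib.util.find_spec`; the helper name `is_importable` is hypothetical, and the script should be run with the same virtual environment activated.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate a top-level module, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_importable("hallo"):
        print("hallo is importable")
    else:
        print("hallo not found; run `pip install -e .` from the repository root")
```
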

nustato commented 3 weeks ago

Looks great as a start! I would love to see the other features that were requested, like integrating with an LLM to generate text then integrating that with a TTS model to convert the text into a wav audio file to use as input for generating the animated talking face video.

It would also be nice to be able to play the generated video inside the WebUI.

Thanks for your work!

JPW0080 commented 3 weeks ago

The compressed portable build seems larger than it needs to be: hallo-portable/webui/pretrained_models contains a .git folder with 10.5 GB of unused files.

iovart commented 3 weeks ago

Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
ModuleNotFoundError: No module named 'encodings'

daswer123 commented 3 weeks ago

@JPW0080 Thanks for letting me know. I updated the current version and removed the extra files.

@anstonjie I think I've solved that problem too.

@iovart Please make sure you have Python 3.10 on your system.
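
The "No module named 'encodings'" failure generally means the interpreter could not find its own standard library. A stale `PYTHONHOME` or `PYTHONPATH` pointing at another installation is one common cause, though whether that applies here is an assumption. The sketch below reports the interpreter version and those variables; the helper names are hypothetical, and it should be run with a Python that does start (for example, the system one).

```python
import os
import sys

REQUIRED = (3, 10)  # the version named in the reply above


def is_required_version(version_info=sys.version_info, required=REQUIRED) -> bool:
    """Return True if the interpreter's major.minor pair matches the required one."""
    return tuple(version_info[:2]) == required


def interpreter_report() -> dict:
    """Collect the values that usually matter when Python cannot find its stdlib."""
    return {
        "executable": sys.executable,
        "version": ".".join(map(str, sys.version_info[:3])),
        "PYTHONHOME": os.environ.get("PYTHONHOME"),  # None when unset
        "PYTHONPATH": os.environ.get("PYTHONPATH"),  # None when unset
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
    print("matches required version:", is_required_version())
```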

JPW0080 commented 3 weeks ago

On Windows 11, I also got the "No module named 'hallo'" error. Copying the hallo folder into scripts fixed it.
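
Copying the folder works because Python puts the directory of the script being run at the front of `sys.path`, so a package sitting next to the script is always found. An alternative that avoids the copy is to add the repository root to `sys.path` before the import. This is a sketch only: the helper name is hypothetical, and it assumes the layout `<repo>/scripts/inference.py` with the package at `<repo>/hallo`.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(script_file: str) -> str:
    """Put the parent of the script's folder at the front of sys.path.

    Assumes the layout <repo>/scripts/<script>.py with the package in <repo>.
    """
    repo_root = str(Path(script_file).resolve().parent.parent)
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    return repo_root
```

Under that assumption, calling `add_repo_root_to_path(__file__)` at the top of the script, before `from hallo.animate.face_animate import FaceAnimatePipeline`, would let the import resolve without duplicating files.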

iovart commented 3 weeks ago

@daswer123 Python 3.12.4 is installed

iovart commented 3 weeks ago

@daswer123 I installed version 3.10. Now I get this error:

ModuleNotFoundError: No module named 'triton'
Traceback (most recent call last):
  File "...\hallo-portable\webui\scripts\inference.py", line 40, in <module>
    from hallo.animate.face_animate import FaceAnimatePipeline
ModuleNotFoundError: No module named 'hallo'

daswer123 commented 3 weeks ago

@iovart You can ignore the triton message; it is harmless on Windows.

As for hallo, copy the hallo folder to venv\Lib\site-packages.

iovart commented 3 weeks ago

@daswer123 Thanks

iovart commented 3 weeks ago

@daswer123 RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
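
This error usually points to a truncated or corrupted checkpoint, for example from an interrupted download. Since PyTorch 1.6, `torch.save` writes a zip archive by default, so a quick integrity check is possible with the standard `zipfile` module alone. The helper name below is hypothetical, and the check is only a heuristic: a checkpoint saved in the older non-zip format would also return False.

```python
import zipfile


def looks_like_complete_checkpoint(path: str) -> bool:
    """Heuristic: True if the file has a readable zip end-of-central-directory record.

    A zip-format checkpoint that was cut off mid-download lacks this record,
    which is what "failed finding central directory" refers to.
    """
    return zipfile.is_zipfile(path)
```

If this returns False for a file under pretrained_models, downloading that file again is the likely remedy.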

Kingbadger3d commented 3 weeks ago

I'm getting the same issue. I have Python 3.10 installed, and I've also tried using Miniconda, but I just get this error:

Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
ModuleNotFoundError: No module named 'encodings'

Current thread 0x000010fc (most recent call first):

Any chance you could make a version that uses Miniconda? Then environment variables and the like shouldn't be an issue.

nitinmukesh commented 3 weeks ago

@daswer123

Getting the following error with Windows portable version (Windows 11)

Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
C:\tut\hallo-portable-2\webui\venv\lib\site-packages\transformers\utils\hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
A matching Triton is not available, some optimizations will not be enabled
Traceback (most recent call last):
  File "C:\tut\hallo-portable-2\webui\venv\lib\site-packages\xformers\__init__.py", line 55, in _is_triton_available
    from xformers.triton.softmax import softmax as triton_softmax  # noqa
  File "C:\tut\hallo-portable-2\webui\venv\lib\site-packages\xformers\triton\softmax.py", line 11, in <module>
    import triton
ModuleNotFoundError: No module named 'triton'
WARNING:py.warnings:C:\tut\hallo-portable-2\webui\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py:69: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'AzureExecutionProvider, CPUExecutionProvider'
  warnings.warn(

Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./pretrained_models/face_analysis\models\1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./pretrained_models/face_analysis\models\2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./pretrained_models/face_analysis\models\genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./pretrained_models/face_analysis\models\glintr100.onnx recognition ['None', 3, 112, 112] 127.5 127.5
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./pretrained_models/face_analysis\models\scrfd_10g_bnkps.onnx detection [1, 3, '?', '?'] 127.5 128.0
set det-size: (640, 640)
WARNING:py.warnings:C:\tut\hallo-portable-2\webui\venv\lib\site-packages\insightface\utils\transform.py:68: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
  P = np.linalg.lstsq(X_homo, Y)[0].T # Affine matrix. 3 x 4

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1718732157.838881    3768 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
W0000 00:00:1718732157.857614   19012 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1718732157.867077   15400 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
WARNING:py.warnings:C:\tut\hallo-portable-2\webui\venv\lib\site-packages\google\protobuf\symbol_database.py:55: UserWarning: SymbolDatabase.GetPrototype() is deprecated. Please use message_factory.GetMessageClass() instead. SymbolDatabase.GetPrototype() will be removed soon.
  warnings.warn('SymbolDatabase.GetPrototype() is deprecated. Please '

Processed and saved: ./.cache\FACE_sep_background.png
Processed and saved: ./.cache\FACE_sep_face.png
Some weights of Wav2VecModel were not initialized from the model checkpoint at ./pretrained_models/wav2vec/wav2vec2-base-960h and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
INFO:audio_separator.separator.separator:Separator version 0.17.2 instantiating with output_dir: ./.cache\audio_preprocess, output_format: WAV
INFO:audio_separator.separator.separator:Operating System: Windows 10.0.22631
INFO:audio_separator.separator.separator:System: Windows Node: nits Release: 10 Machine: AMD64 Proc: Intel64 Family 6 Model 186 Stepping 2, GenuineIntel
INFO:audio_separator.separator.separator:Python Version: 3.10.9
INFO:audio_separator.separator.separator:PyTorch Version: 2.2.2+cu121
INFO:audio_separator.separator.separator:FFmpeg installed: ffmpeg version 2023-07-06-git-f00222e81f-essentials_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
INFO:audio_separator.separator.separator:ONNX Runtime CPU package installed with version: 1.18.0
INFO:audio_separator.separator.separator:CUDA is available in Torch, setting Torch device to CUDA
WARNING:audio_separator.separator.separator:CUDAExecutionProvider not available in ONNXruntime, so acceleration will NOT be enabled
INFO:audio_separator.separator.separator:Loading model Kim_Vocal_2.onnx...
INFO:audio_separator.separator.separator:Load model duration: 00:00:00
INFO:audio_separator.separator.separator:Starting separation process for audio_file_path: C:\tut\hallo-portable-2\tmp\gradio\7a15c4f12b9f23e6560e8b1180bfbc1222820c46\how are you.wav
100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.28s/it]
100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  9.81it/s]
INFO:audio_separator.separator.separator:Saving Vocals stem to how are you_(Vocals)_Kim_Vocal_2.wav...
INFO:audio_separator.separator.separator:Clearing input audio file paths, sources and stems...
INFO:audio_separator.separator.separator:Separation duration: 00:00:14
The config attributes {'center_input_sample': False, 'out_channels': 4} were passed to UNet2DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
 ['conv_norm_out.bias, conv_norm_out.weight, conv_out.bias, conv_out.weight']
INFO:hallo.models.unet_3d:loaded temporal unet's pretrained weights from pretrained_models\stable-diffusion-v1-5\unet ...
The config attributes {'center_input_sample': False} were passed to UNet3DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Load motion module params from pretrained_models\motion_module\mm_sd_v15_v2.ckpt
INFO:hallo.models.unet_3d:Loaded 453.20928M-parameter motion module
loaded weight from  ./pretrained_models/hallo\net.pth
Traceback (most recent call last):
  File "C:\tut\hallo-portable-2\webui\scripts\inference.py", line 424, in <module>
    inference_process(
  File "C:\tut\hallo-portable-2\webui\scripts\inference.py", line 383, in inference_process
    tensor_result = torch.cat(tensor_result, dim=2)
RuntimeError: torch.cat(): expected a non-empty list of Tensors

nitinmukesh commented 3 weeks ago

I also tried a manual install using install.bat (https://github.com/daswer123/hallo-webui).

When generating a video, I get this error:

A matching Triton is not available, some optimizations will not be enabled
Traceback (most recent call last):
  File "C:\tut\hallo-webui\venv\lib\site-packages\xformers\__init__.py", line 55, in _is_triton_available
    from xformers.triton.softmax import softmax as triton_softmax  # noqa
  File "C:\tut\hallo-webui\venv\lib\site-packages\xformers\triton\softmax.py", line 11, in <module>
    import triton
ModuleNotFoundError: No module named 'triton'
WARNING:py.warnings:C:\tut\hallo-webui\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py:69: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'AzureExecutionProvider, CPUExecutionProvider'
  warnings.warn(

Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./pretrained_models/face_analysis\models\1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./pretrained_models/face_analysis\models\2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./pretrained_models/face_analysis\models\genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./pretrained_models/face_analysis\models\glintr100.onnx recognition ['None', 3, 112, 112] 127.5 127.5
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./pretrained_models/face_analysis\models\scrfd_10g_bnkps.onnx detection [1, 3, '?', '?'] 127.5 128.0
set det-size: (640, 640)
WARNING:py.warnings:C:\tut\hallo-webui\venv\lib\site-packages\insightface\utils\transform.py:68: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
  P = np.linalg.lstsq(X_homo, Y)[0].T # Affine matrix. 3 x 4

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1718734650.085367    5976 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
W0000 00:00:1718734650.104611   13836 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1718734650.112907   13836 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
WARNING:py.warnings:C:\tut\hallo-webui\venv\lib\site-packages\google\protobuf\symbol_database.py:55: UserWarning: SymbolDatabase.GetPrototype() is deprecated. Please use message_factory.GetMessageClass() instead. SymbolDatabase.GetPrototype() will be removed soon.
  warnings.warn('SymbolDatabase.GetPrototype() is deprecated. Please '

Processed and saved: ./.cache\FACE_sep_background.png
Processed and saved: ./.cache\FACE_sep_face.png
Some weights of Wav2VecModel were not initialized from the model checkpoint at ./pretrained_models/wav2vec/wav2vec2-base-960h and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
INFO:audio_separator.separator.separator:Separator version 0.17.2 instantiating with output_dir: ./.cache\audio_preprocess, output_format: WAV
INFO:audio_separator.separator.separator:Operating System: Windows 10.0.22631
INFO:audio_separator.separator.separator:System: Windows Node: nits Release: 10 Machine: AMD64 Proc: Intel64 Family 6 Model 186 Stepping 2, GenuineIntel
INFO:audio_separator.separator.separator:Python Version: 3.10.6
INFO:audio_separator.separator.separator:PyTorch Version: 2.2.2+cu121
INFO:audio_separator.separator.separator:FFmpeg installed: ffmpeg version 6.1-full_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
INFO:audio_separator.separator.separator:ONNX Runtime CPU package installed with version: 1.18.0
INFO:audio_separator.separator.separator:CUDA is available in Torch, setting Torch device to CUDA
WARNING:audio_separator.separator.separator:CUDAExecutionProvider not available in ONNXruntime, so acceleration will NOT be enabled
INFO:audio_separator.separator.separator:Loading model Kim_Vocal_2.onnx...
INFO:audio_separator.separator.separator:Load model duration: 00:00:00
INFO:audio_separator.separator.separator:Starting separation process for audio_file_path: C:\Users\nitin\AppData\Local\Temp\gradio\ce8e7e8828611150284cd3f595a43cd2ba717a88\audio.wav
100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.19s/it]
100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  9.23it/s]
INFO:audio_separator.separator.separator:Saving Vocals stem to audio_(Vocals)_Kim_Vocal_2.wav...
INFO:audio_separator.separator.separator:Clearing input audio file paths, sources and stems...
INFO:audio_separator.separator.separator:Separation duration: 00:00:16
The config attributes {'center_input_sample': False, 'out_channels': 4} were passed to UNet2DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
 ['conv_norm_out.bias, conv_norm_out.weight, conv_out.bias, conv_out.weight']
INFO:hallo.models.unet_3d:loaded temporal unet's pretrained weights from pretrained_models\stable-diffusion-v1-5\unet ...
The config attributes {'center_input_sample': False} were passed to UNet3DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Load motion module params from pretrained_models\motion_module\mm_sd_v15_v2.ckpt
INFO:hallo.models.unet_3d:Loaded 453.20928M-parameter motion module
Traceback (most recent call last):
  File "C:\tut\hallo-webui\scripts\inference.py", line 424, in <module>
    inference_process(
  File "C:\tut\hallo-webui\scripts\inference.py", line 281, in inference_process
    torch.load(
  File "C:\tut\hallo-webui\venv\lib\site-packages\torch\serialization.py", line 1005, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "C:\tut\hallo-webui\venv\lib\site-packages\torch\serialization.py", line 457, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

AricGamma commented 2 weeks ago

You can try this one: https://huggingface.co/spaces/fudan-generative-ai/hallo

Finally, thanks to all the contributors from the community. 💕💕💕