BrasD99 / HeyGenClone

A simple and open-source analogue of the HeyGen system

HeyGenClone

The project is no longer supported.

Welcome to HeyGenClone, an open-source analogue of the HeyGen system.

I am a developer from Moscow 🇷🇺 who devotes his free time to studying new technologies. The project is in an active development phase, but I hope it will help you achieve your goals!

Currently, translation is supported only from English 🇬🇧!

Installation 🥸

Configurations (config.json) 🧙‍♂️

| Key | Description |
| --- | --- |
| DET_TRESH | Face detection threshold, in the range [0.0, 1.0] |
| DIST_TRESH | Face embedding distance threshold, in the range [0.0, 1.0] |
| HF_TOKEN | Your HuggingFace token (see Installation) |
| USE_ENHANCER | Whether to enhance faces with GFPGAN |
| ADD_SUBTITLES | Whether to add subtitles to the output video |
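
As a rough illustration of the keys above, here is a minimal sketch of a config.json and a sanity check for it. The values shown are hypothetical placeholders (this README does not state the defaults), and `validate_config` is a helper invented for this example, not part of the project.

```python
import json

# Hypothetical example values; the real defaults are not given in this README.
EXAMPLE_CONFIG = """
{
  "DET_TRESH": 0.3,
  "DIST_TRESH": 0.2,
  "HF_TOKEN": "hf_your_token_here",
  "USE_ENHANCER": false,
  "ADD_SUBTITLES": true
}
"""

# Expected type for each documented key (thresholds are numbers in [0.0, 1.0]).
EXPECTED_TYPES = {
    "DET_TRESH": float,
    "DIST_TRESH": float,
    "HF_TOKEN": str,
    "USE_ENHANCER": bool,
    "ADD_SUBTITLES": bool,
}


def validate_config(config: dict) -> list:
    """Return a list of problems; an empty list means the config looks fine."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            problems.append(f"missing key: {key}")
            continue
        value = config[key]
        if expected is float:
            # Accept ints such as 0 or 1 for thresholds, but never booleans.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{key} must be a number")
            elif not 0.0 <= value <= 1.0:
                problems.append(f"{key} must be within [0.0, 1.0]")
        elif not isinstance(value, expected):
            problems.append(f"{key} must be of type {expected.__name__}")
    return problems


config = json.loads(EXAMPLE_CONFIG)
print(validate_config(config))  # prints []
```

Running a check like this before starting a long job may catch a misspelled key or an out-of-range threshold early.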

Supported languages 🙂

English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu) and Korean (ko)
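
If you pass a language code programmatically, it may help to validate it against the list above first. The sketch below only encodes the codes listed in this README; `normalize_language` is a hypothetical helper, not a function of the project.

```python
# Language codes exactly as listed in this README.
SUPPORTED_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "pl": "Polish", "tr": "Turkish",
    "ru": "Russian", "nl": "Dutch", "cs": "Czech", "ar": "Arabic",
    "zh-cn": "Chinese", "ja": "Japanese", "hu": "Hungarian", "ko": "Korean",
}


def normalize_language(code: str) -> str:
    """Normalize case and separators, then check the code against the list."""
    key = code.strip().lower().replace("_", "-")
    if key not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"unsupported language {code!r}; expected one of: {supported}")
    return key


print(normalize_language("RU"))     # prints ru
print(normalize_language("zh_CN"))  # prints zh-cn
```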

Usage 🤩

I also added a script that overlays a voice track on a video with lip sync, so you can create a video of a person speaking your text. Currently it works only for videos with one person.

How it works 😱

  1. Detecting scenes (PySceneDetect)
  2. Face detection (yolov8-face)
  3. Reidentification (deepface)
  4. Speech enhancement (MDXNet)
  5. Speakers transcriptions and diarization (whisperX)
  6. Text translation (googletrans)
  7. Voice cloning (TTS)
  8. Lip sync (lipsync)
  9. Face restoration (GFPGAN)
  10. [Need to fix] Search for talking faces, determining what this person is saying
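
To make steps 2 and 3 more concrete, the sketch below shows one way the DET_TRESH and DIST_TRESH settings could be applied: detections below the confidence threshold are dropped, and each remaining face embedding is matched to the closest known person if it is near enough, otherwise it is registered as a new person. This is an assumption for illustration only. The README does not say which distance metric is used (cosine distance is assumed here), and the function names and the detection dictionary layout are hypothetical rather than taken from yolov8-face or deepface.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors (0.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def filter_detections(detections, det_thresh):
    """Keep only detections whose confidence reaches the threshold."""
    return [d for d in detections if d["confidence"] >= det_thresh]


def assign_identity(embedding, known, dist_thresh):
    """Return the id of the closest known person, or register a new one."""
    best_id, best_dist = None, float("inf")
    for person_id, reference in known.items():
        dist = cosine_distance(embedding, reference)
        if dist < best_dist:
            best_id, best_dist = person_id, dist
    if best_id is not None and best_dist <= dist_thresh:
        return best_id
    new_id = len(known)  # ids are assigned sequentially: 0, 1, 2, ...
    known[new_id] = embedding
    return new_id


# Toy data: two-dimensional "embeddings" stand in for real face embeddings.
detections = [
    {"confidence": 0.9, "embedding": [1.0, 0.0]},
    {"confidence": 0.1, "embedding": [0.0, 1.0]},    # dropped: low confidence
    {"confidence": 0.8, "embedding": [0.99, 0.05]},  # close to the first face
    {"confidence": 0.7, "embedding": [0.0, 1.0]},    # far from it: new person
]

known = {}
ids = [
    assign_identity(d["embedding"], known, dist_thresh=0.2)
    for d in filter_detections(detections, det_thresh=0.3)
]
print(ids)  # prints [0, 0, 1]
```

Under these assumptions, a lower DIST_TRESH splits people into more identities, while a higher one risks merging different people into one.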

Translation results 🥺

Note that this example was created without GFPGAN!

| Destination language | Source video | Output video |
| --- | --- | --- |
| 🇷🇺 (Russian) | Watch the video | Watch the video |

Contributors 🫵🏻

To-Do List 🤷🏼‍♂️

Other 🤘🏻