Closed · zhouzhenneng closed this issue 2 months ago
I've never seen sync this poor; the audio and video don't look like they belong together at all. Parts of your training procedure also look odd, e.g. bash scripts/train_xx.sh /root/share/talkingGaussian/train/data/ao_head/ /root/share/talkingGaussian/train/trial/ao_head/ 2 --audio_extractor hubert
If you are using the script shipped in this repo, specifying --audio_extractor hubert on the bash command line has no effect: training actually still uses deepspeech, while inference uses hubert, which by rights should raise an error. Are you sure the whole pipeline ran correctly?
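A minimal sketch (not the repository's actual script) of why the extra flag is silently ignored: train_xx.sh reads only three positional parameters and hardcodes the extractor, so anything after the third argument never reaches the training code.

```python
import subprocess
import textwrap

# Toy stand-in for train_xx.sh: three positional parameters are read,
# the extractor is hardcoded, and any further arguments are unused.
script = textwrap.dedent("""\
    dataset=$1
    workspace=$2
    gpu_id=$3
    audio_extractor='deepspeech'   # hardcoded; the flag passed below has no effect
    echo "extractor=$audio_extractor"
""")

# bash -c <script> <name> <args...>: the args fill $1, $2, $3, ... so the
# trailing "--audio_extractor hubert" lands in $4/$5 and is never read.
result = subprocess.run(
    ["bash", "-c", script, "train_xx.sh",
     "data/ao_head/", "trial/ao_head/", "2",
     "--audio_extractor", "hubert"],
    capture_output=True, text=True,
)
print(result.stdout.strip())  # extractor=deepspeech
```

This is why editing the variable inside the script (as discussed below) is the way the repo expects the extractor to be selected.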
Thanks for the reply. The whole training and inference run produced no errors last time; I'll retrain and check again. Does the project currently support HuBERT? How do I train and run inference on Chinese audio with HuBERT?
According to the readme: "Similar to ER-NeRF, HuBERT is also available. Recommended for situations if the audio is not in English. Specify --audio_extractor hubert when training and testing." How exactly do I use HuBERT during training and inference?
The details are in train_xx.sh; take a look there.
Thanks for the reply. I checked train_xx.sh and found it had already been modified:
dataset=$1
workspace=$2
gpu_id=$3
audio_extractor='hubert' # deepspeech, esperanto, hubert
export CUDA_VISIBLE_DEVICES=$gpu_id
So even though the bash arguments were passed incorrectly during training, HuBERT was still used, and the inference command should be correct: python synthesize_fuse.py -S /root/share/talkingGaussian/train/data/ao_head/ -M /root/share/talkingGaussian/train/trial/ao_head/ --use_train --audio /root/audio/cosyVoice_fish_faster_hu.npy --dilate --audio_extractor hubert
As a next step I'll increase the iterations to 100k and watch for error messages. Could you share an email address? I'll send you the video and audio materials so you can help check the lip-sync problem.
Send them over and I'll take a look; my email is on my GitHub profile page.
Follow-up: the problem turned out to be that lip motion was being fitted onto unrelated facial-expression parameters. Increasing the penalty coefficient at train_face.py line 194 to 1e-3 fixed it.
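A hypothetical sketch of the kind of fix described above: an L1-style penalty that discourages lip motion from leaking into unrelated expression parameters. The names here (recon_loss, exp_params, lambda_reg) are illustrative only, not the identifiers actually used in train_face.py.

```python
import torch

def penalized_loss(recon_loss: torch.Tensor,
                   exp_params: torch.Tensor,
                   lambda_reg: float = 1e-3) -> torch.Tensor:
    # Raising lambda_reg (here to 1e-3) pushes the unrelated expression
    # parameters toward zero more strongly, so mouth movement has to be
    # explained by the audio-driven branch instead of by expression drift.
    return recon_loss + lambda_reg * exp_params.abs().mean()

# With the penalized parameters at zero, the penalty contributes nothing.
loss = penalized_loss(torch.tensor(0.5), torch.zeros(4, 8))
print(float(loss))  # 0.5
```

The trade-off is the usual one for regularization: too small a coefficient lets unrelated parameters absorb lip motion, too large a one can suppress legitimate expression changes.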
Hi, I have previously trained and run inference with ER-NeRF, and the Chinese lip-sync accuracy was acceptable. Now I'm trying TalkingGaussian and the Chinese lip shapes don't match the audio. Below is my training pipeline; the iteration count is the default, and the source material is a 5-minute green-screen video at 25 fps:
Preprocessing
python data_utils/process.py /root/share/talkingGaussian/train/data/ao_head/ao_head.mp4
Teeth mask
export PYTHONPATH=./data_utils/easyportrait
python ./data_utils/easyportrait/create_teeth_mask.py /root/share/talkingGaussian/train/data/ao_head/
Extract audio features with HuBERT
python data_utils/hubert.py --wav /root/audio/cosyVoice_fish_faster.wav
Train
bash scripts/train_xx.sh /root/share/talkingGaussian/train/data/ao_head/ /root/share/talkingGaussian/train/trial/ao_head/ 2 --audio_extractor hubert
Inference
python synthesize_fuse.py -S /root/share/talkingGaussian/train/data/ao_head/ -M /root/share/talkingGaussian/train/trial/ao_head/ --use_train --audio /root/audio/cosyVoice_fish_faster_hu.npy --dilate --audio_extractor hubert
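Before inference, a rough sanity check on the --audio feature file can rule out a trivial cause of bad lip sync. A dummy array stands in for cosyVoice_fish_faster_hu.npy below; the (frames, feature_dim) layout is an assumption about the extractor's output, not something verified against this repository.

```python
import numpy as np

# Write a placeholder feature file standing in for the real .npy output
# of data_utils/hubert.py, then load it back and inspect its shape.
np.save("/tmp/demo_hu.npy", np.zeros((250, 1024), dtype=np.float32))
feats = np.load("/tmp/demo_hu.npy")
assert feats.ndim == 2  # expected: one feature vector per audio frame
print(feats.shape)  # (250, 1024)
```

If the frame count is wildly inconsistent with the audio duration and the 25 fps video, a feature/frame-rate mismatch is a plausible cause of the lip-sync offset.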
A clip of the trained result:
https://github.com/user-attachments/assets/92370606-9650-4ef3-af29-8518f6b44367
Remaining issues: