Open zhanqigithub opened 1 week ago
Check server log please
Thanks, here is the log:
  File "talking_face2\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "\talking_face2\lib\site-packages\transformers\models\wav2vec2\modeling_wav2vec2.py", line 857, in forward
    position_embeddings = self.pos_conv_embed(hidden_states)
  File "\talking_face2\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "talking_face2\lib\site-packages\transformers\models\wav2vec2\modeling_wav2vec2.py", line 385, in forward
    hidden_states = hidden_states.transpose(1, 2)
AttributeError: 'tuple' object has no attribute 'transpose'
Try changing line 110 of wav2vec.py from
hidden_states = self.feature_projection(hidden_states)
to
hidden_states = self.feature_projection(hidden_states)[0]
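Judging from the traceback, `self.feature_projection(...)` seems to return a tuple in the installed transformers version, while the following code expects a tensor, which is why taking element `[0]` helps. If the same wav2vec.py has to run against versions that return a plain tensor as well, a tolerant unwrap could look roughly like this. `unwrap_first` is a hypothetical helper name, not something from this project or from transformers:

```python
def unwrap_first(output):
    """Return the first element if `output` is a tuple, otherwise return it unchanged."""
    if isinstance(output, tuple):
        return output[0]
    return output


# Hypothetical usage at the call site in wav2vec.py (assumption, not the author's code):
# hidden_states = unwrap_first(self.feature_projection(hidden_states))
```

This keeps the one-line fix above working without raising if the projection layer ever returns a bare tensor.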
Thanks
Thanks, now I can see the lip movements, but there is no audio data.
That is outside the scope of this project; it is a question about the UE engine.
Thanks
Sorry to reopen this issue. I tried a 30 s audio clip, but the prediction NumPy array has size 92736, i.e. 92736 / 32 = 2898 frames. That is much longer than the audio, roughly 3 times as long, so the two can't be matched up. Any clue what causes this?
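To make the mismatch concrete, here is a small sketch of the arithmetic. It assumes, as in the comment above, 32 values per frame and a 30 s clip; `implied_frame_rate` is a hypothetical helper, and the frame rate the playback side expects is not stated in this thread:

```python
def implied_frame_rate(num_values, values_per_frame, audio_seconds):
    """Return (frame_count, frames_per_second) implied by a flat prediction array."""
    frames, remainder = divmod(num_values, values_per_frame)
    if remainder:
        raise ValueError("array size is not a multiple of values_per_frame")
    return frames, frames / audio_seconds


frames, fps = implied_frame_rate(92736, 32, 30.0)
print(frames, round(fps, 1))  # 2898 96.6
```

Comparing the implied rate with the rate the animation is played back at shows how far the lip movements will drift from the audio.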
Can you check the number of channels in the audio file?
Compare the basic information (channels, sample rate, duration) of the demo audio and your audio with a tool such as sox.
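If sox is not at hand, the same basic information can be read from a WAV file with Python's standard `wave` module. This is only a sketch and assumes uncompressed PCM WAV input; `wav_info` and the file names are hypothetical:

```python
import wave


def wav_info(path):
    """Read channel count, sample rate, sample width and duration from a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wf.getsampwidth(),
            "duration_s": frames / rate,
        }


# Hypothetical usage: compare the demo clip with your own clip.
# print(wav_info("demo.wav"))
# print(wav_info("my_audio.wav"))
```

A difference in channel count or sample rate between the two files would be a first thing to rule out.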
I tried with the sample audio and got the same result: the number of frames is far too large, so the audio and lip movements can't be matched.
Please send me your audio data.
I followed the README, but got an error like this:
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
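This error usually means the response body was empty or was not JSON at all (for example an HTML error page), so the server log is the place to look. A rough sketch for inspecting the raw body before decoding it, using only the standard `json` module; `describe_json_body` is a hypothetical helper, not part of this project or of requests:

```python
import json


def describe_json_body(body_text):
    """Return (data, problem): parsed JSON and None, or None and a short diagnosis."""
    if not body_text.strip():
        return None, "empty response body"
    try:
        return json.loads(body_text), None
    except json.JSONDecodeError as exc:
        preview = body_text[:80]
        return None, f"not JSON ({exc.msg} at char {exc.pos}); body starts with: {preview!r}"


# Hypothetical usage with a requests response object named `resp`:
# data, problem = describe_json_body(resp.text)
# if problem:
#     print(resp.status_code, problem)
```

Printing the status code together with the start of the body usually shows whether the server returned an error page instead of a prediction.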