choijeongsoo / av2av

[CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
MIT License

Question about some file for inference #7

Open Peter-SungwooCho opened 3 weeks ago

Peter-SungwooCho commented 3 weeks ago

Hello authors,

Thank you for providing the wonderful work AV2AV.

I am running the inference code, and if my understanding is correct, three additional files are needed:

    temp_audio_path = os.path.splitext(args.in_vid_path)[0]+".temp.wav"
    lip_video_path = os.path.splitext(args.in_vid_path)[0]+".lip.mp4"
    bbox_path = os.path.splitext(args.in_vid_path)[0]+".bbox.pkl"

May I know how to obtain these files for a video of my own that is not among the provided samples?

Best Regards, Sungwoo Cho

choijeongsoo commented 2 weeks ago

Hello,

Thank you for your interest in our work!

temp_audio_path is used to save the audio extracted from the video file. As for lip_video_path and bbox_path, you can find more information in issue #6.
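
For readers landing here, the following is a minimal sketch of how the three paths from the question relate to the input video, and of one common way to extract an audio track with ffmpeg. It is not the repository's own extraction code: the helper names `derive_paths`, `build_ffmpeg_cmd`, and `extract_audio` are hypothetical, and the mono 16 kHz settings are an assumption that should be checked against the repository. It only covers temp_audio_path; lip_video_path and bbox_path are discussed in issue #6.

```python
import os
import subprocess


def derive_paths(in_vid_path):
    """Build the three side-file paths the same way the snippet in the question does."""
    stem = os.path.splitext(in_vid_path)[0]
    return {
        "temp_audio_path": stem + ".temp.wav",
        "lip_video_path": stem + ".lip.mp4",
        "bbox_path": stem + ".bbox.pkl",
    }


def build_ffmpeg_cmd(in_vid_path, out_wav_path, sample_rate=16000):
    """Return an ffmpeg command that writes the video's audio track to a WAV file.

    Assumption: mono audio at 16 kHz. These values are illustrative, not taken
    from the repository.
    """
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", in_vid_path,        # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # one audio channel (assumed)
        "-ar", str(sample_rate),  # audio sample rate in Hz (assumed)
        out_wav_path,
    ]


def extract_audio(in_vid_path, sample_rate=16000):
    """Run ffmpeg (must be installed and on PATH) and return the WAV path."""
    paths = derive_paths(in_vid_path)
    cmd = build_ffmpeg_cmd(in_vid_path, paths["temp_audio_path"], sample_rate)
    subprocess.run(cmd, check=True)
    return paths["temp_audio_path"]
```

Calling `extract_audio("my_video.mp4")` would write `my_video.temp.wav` next to the input video, which matches the naming pattern shown in the question.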