YudongGuo / AD-NeRF

This repository contains a PyTorch implementation of "AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis".
MIT License
1.02k stars 172 forks

Video renders without audio. Is this expected? #98

Open rohana96 opened 2 years ago

rohana96 commented 2 years ago

Hi Yudong! Thanks for the code release!

I noticed the video rendering code only renders the video frames without audio. Here's the piece of code that handles the rendering. Please let me know if I am missing something. If not, how can I render audio-video merged output?

    vid_out = cv2.VideoWriter(os.path.join(testsavedir, 'result.avi'),
                              cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), 25, (W, H))
    for j in range(poses.shape[0]):
        rgbs, disps, last_weights, rgb_fgs = \
            render_path(adjust_poses[j:j+1], auds_val[j:j+1],
                        bc_img, hwfcxy, args.chunk, render_kwargs_test)
        rgbs_torso, disps_torso, last_weights_torso, rgb_fgs_torso = \
            render_path(torso_pose.unsqueeze(0), signal[j:j+1],
                        bc_img.to(device_torso), hwfcxy, args.chunk,
                        render_kwargs_test_torso)
        rgbs_com = rgbs*last_weights_torso[..., None] + rgb_fgs_torso
        rgb8 = to8b(rgbs_com[0])
        vid_out.write(rgb8[:, :, ::-1])
        filename = os.path.join(testsavedir, str(aud_ids[j]) + '.jpg')
        imageio.imwrite(filename, rgb8)
        print('finished render', j)
    print('finished render in', time.time()-t_start)
    vid_out.release()
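Since `cv2.VideoWriter` only writes image frames, one common workaround is to mux the rendered video with the driving audio afterwards using ffmpeg. This is just a hedged sketch, not code from the repository; the file names `result.avi` and `aud.wav` are placeholders for the rendered output and the original audio track:

```python
import subprocess

def build_mux_cmd(video_path, audio_path, out_path):
    """Build an ffmpeg command that combines a silent video with an
    audio track, trimming the output to the shorter of the two streams."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,   # rendered frames (no audio)
        "-i", audio_path,   # original driving audio
        "-c:v", "copy",     # keep the video stream as-is
        "-c:a", "aac",      # re-encode audio for the output container
        "-shortest",        # stop at the end of the shorter stream
        out_path,
    ]

cmd = build_mux_cmd("result.avi", "aud.wav", "result_with_audio.mp4")
# subprocess.run(cmd, check=True)  # uncomment to actually invoke ffmpeg
print(" ".join(cmd))
```

The `-shortest` flag also sidesteps small length mismatches between the rendered frames and the audio, since the output is cut to whichever stream ends first.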
exceedzhang commented 1 year ago

After the video is rendered, it is merged with the original audio. But I found that the lengths of the generated video and the audio differ: one is a few milliseconds shorter than the other. Has anyone encountered a similar problem? @YudongGuo
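A small mismatch like this is expected, because the video duration is quantized to whole frames (frame count / fps) while the audio duration is determined independently by its sample count and sample rate. A hedged sketch of the arithmetic, with hypothetical example numbers (250 frames at the 25 fps used above, audio at 16 kHz):

```python
def duration_mismatch_ms(num_frames, fps, num_audio_samples, sample_rate):
    """Return (video duration - audio duration) in milliseconds."""
    video_s = num_frames / fps                 # video length from frame count
    audio_s = num_audio_samples / sample_rate  # audio length from sample count
    return (video_s - audio_s) * 1000.0

# 250 frames at 25 fps = 10.00 s of video, but 159,680 samples
# at 16 kHz = 9.98 s of audio, so the audio ends 20 ms early.
print(duration_mismatch_ms(250, 25, 159_680, 16_000))
```

Passing `-shortest` to ffmpeg when muxing (or padding/trimming the audio to the video length) is the usual way to hide a discrepancy of this size.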