Open anujsinha72094 opened 6 months ago
Hi Anuj,
By "inference with new audio" do you mean inferencing directly with audio_novel.wav, which is kept separate during dataset preparation, or inferencing with completely different audio? If it's the latter case and you don't mind, could you tell me what code changes you made for inferencing with new audio?
Thanks
@ankit-gahlawat-007 I am inferencing with total new audio. I just added audio_novel.wav, audio_train.wav and aud.npy generated from new audio, and placed it in the folder in which model is trained , Then I ran python render.py -s data/__ --model_path __ --configs arguments/64_dim_1_transformer.py --iteration 10000 --batch 1. But there is flickering in the neck region . Could you state the possible solution to this?
@anujsinha72094 Did you try with the Obama video? Did the flickering happen only when you used new audio?
@anujsinha72094 @nikhilchh I tested for Obama and May with new audio.
In the train rendering, I get good lip sync for new audio with no artifacts in the neck.
But in the test rendering, the mouth does not move at all (and there are no artifacts either). I guess it's an issue with the code rather than the model, since it produces good outputs for the training set anyway.
https://github.com/KU-CVLAB/GaussianTalker/assets/139246379/2e6f73cc-cad2-401b-9ca8-1a0b89edcb35
https://github.com/KU-CVLAB/GaussianTalker/assets/139246379/d81e02b3-2aea-489d-9a0b-19a502cf1a87
@anujsinha72094
Can you please tell me how to get/create the following files:
track_params.pt
transforms_train.json
transforms_val.json
When I try to train the model, I get an error:
[Errno 2] No such file or directory: 'GaussianTalker/data/obama/transforms_train.json'
@nikhilchh
Did you run process.py? The functions there generate these files.
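For reference, the preprocessing is typically invoked as a single script over the subject video. A hedged sketch, assuming the repo follows the ER-NeRF-style data layout (the `data/obama/obama.mp4` path is illustrative, not taken from this thread):

```shell
# Illustrative: run the full preprocessing pipeline on the subject video.
# On success, this should produce track_params.pt, transforms_train.json,
# and transforms_val.json inside data/obama/.
python data_utils/process.py data/obama/obama.mp4
```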
So I found the issue.
process.py fails at the face_tracking step due to torch.cuda.OutOfMemoryError.
I have a 10 GB GPU.
How can I make it work? Is there any provision to reduce its memory consumption?
I am not sure, but you could try reducing the batch_size.
In data_utils/face_tracking/face_tracker.py, the batch_size is set to 32. You can try 16 or even lower.
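The same idea generalizes beyond that one setting: any per-frame computation can be run in smaller chunks, so peak GPU memory scales with the chunk size rather than the full clip. A minimal sketch of the pattern (numpy stands in for the torch tensors; `process_in_batches` is a hypothetical helper, not part of the repo):

```python
import numpy as np

def process_in_batches(frames, fn, batch_size=16):
    """Apply fn to `frames` in chunks of `batch_size` and concatenate.

    Peak memory is bounded by one chunk instead of the whole clip,
    which is the same idea as lowering batch_size in face_tracker.py.
    """
    outs = []
    for i in range(0, len(frames), batch_size):
        outs.append(fn(frames[i:i + batch_size]))
    return np.concatenate(outs, axis=0)

# Usage: the result is identical to processing everything at once,
# only the per-step working set shrinks.
frames = np.arange(64, dtype=np.float32).reshape(64, 1)
full = frames * 2.0
chunked = process_in_batches(frames, lambda b: b * 2.0, batch_size=16)
assert np.allclose(full, chunked)
```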
Thanks for the suggestion. That worked for me.
I added all the audio files from a new video into obama folder.
Ran the render code and ran into the same problem as @anujsinha72094.
PS: I cannot run train_rendering as it crashes due to insufficient RAM; my 32 GB seems not to be enough.
Observations:
1. The duration of the rendered video is longer than the audio. It somehow still assumes that a 30-second video should be generated (the length of the original Obama novel audio).
2. The mouth doesn't move at all. (Only tried test rendering.)
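On observation 1 (rendered video longer than the audio): the symptom suggests the render loop iterates over the original training frame list rather than the length of the new audio. A hedged sketch of computing the correct frame count from the audio features, assuming DeepSpeech-style features at 50 windows per second and 25 fps video (`frames_for_audio` is a hypothetical helper and those rates are assumptions, not taken from the repo):

```python
import numpy as np

def frames_for_audio(aud_feats, fps=25, feats_per_sec=50):
    """Number of video frames the new audio actually covers.

    aud_feats: (T, C) audio feature array, e.g. loaded from aud.npy.
    Assumes feats_per_sec feature windows per second of audio.
    """
    seconds = aud_feats.shape[0] / feats_per_sec
    return int(round(seconds * fps))

# e.g. cap the render loop at the audio length instead of the
# original 30-second training clip:
aud = np.zeros((500, 29), dtype=np.float32)  # 10 s of dummy features
n_frames = frames_for_audio(aud)             # 10 s * 25 fps = 250 frames
assert n_frames == 250
```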
https://github.com/KU-CVLAB/GaussianTalker/assets/26286970/6ad39625-e2a8-40e9-845b-f89845eaf104
Thank you very much for using our work. Since we generate faces from audio and use the GT background and torso, there might be slight mismatches in the jaw and neck areas when generating with custom audio. To resolve this, we believe that generating the torso using the 2D neural field method from the original ER-NeRF approach, and then generating our face on top of it with the GT background, will easily address the issue. Additionally, we have added a command to easily run novel (custom) audio! We would appreciate it if you could check it out. Thanks! :)
Recently, it has become possible to generate the torso as well using Gaussian splatting, in the same space as the face. As a result, flickering at the neck region has significantly decreased with out-of-distribution (OOD) audio!
https://github.com/KU-CVLAB/GaussianTalker/assets/87278950/634284cd-679e-42be-bbe1-9ba6bc2935cf
Hi @joungbinlee Do you have plans to release the code for this?
This is a very good repo, but I am facing some issues. Since the Gaussian rasterizer setting takes torso + background image as input, and the torso comes from the training set, artifacts appear in the neck region when performing inference with new audio. What could be the solution for this?