YoungSeng / DiffuseStyleGesture

DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models (IJCAI 2023) | The DiffuseStyleGesture+ entry to the GENEA Challenge 2023 (ICMI 2023, Reproducibility Award)
MIT License

Regarding codebook #8

Closed. SaiChandra3030 closed this issue 1 year ago.

SaiChandra3030 commented 1 year ago

Hey, this is a fantastic repo I found during my research over the last few weeks. I am trying to understand some parts of your code. Could you please help with the questions below?

  1. The codebook is missing. Will I get it after training the model? I have looked through the code, but it is not there.
  2. Can I use the same codebook that is present here: CODEBOOK?
  3. After getting the BVH, is there any way to convert it into a human avatar image?

Waiting for the solution :)

Thanks Sai

YoungSeng commented 1 year ago

Dear Sai,

Sorry for the confusing code. You should use sample.py rather than inference.py; I have deleted main/mydiffusion_zeggs/inference.py. Also, this work does not use a codebook.

Best wishes.

SaiChandra3030 commented 1 year ago

Hi YoungSeng, thanks for your reply.

Taking your reply into consideration, I started experimenting with sample.py:

  1. First, it worked fine with a file named in the format 015_Happy_4_x_1_0.wav.
  2. When I tried a plain name like `1.wav`, sample.py throws the error below:
Traceback (most recent call last):
  File "/content/drive/MyDrive/DiffuseStyleGesture/main/mydiffusion_zeggs/sample.py", line 418, in <module>
    main(config, save_dir, config.model_path, audio_path=None, mfcc_path=None, audiowavlm_path=config.audiowavlm_path, max_len=config.max_len)
  File "/content/drive/MyDrive/DiffuseStyleGesture/main/mydiffusion_zeggs/sample.py", line 378, in main
    style = style2onehot[audiowavlm_path.split('/')[-1].split('_')[1]]
IndexError: list index out of range

Is there a particular format required for the input file name? Could you please help me with this?
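For context, the failing line parses the style key out of the second underscore-separated token of the file name, which is why `1.wav` fails. A minimal reproduction of that lookup (the file names are just examples):

```python
# Reproduction of the file-name parsing in sample.py (line 378):
# the style key is the token at index 1 after splitting on '_'.
path = "015_Happy_4_x_1_0.wav"
print(path.split('/')[-1].split('_')[1])   # 'Happy' -> valid style2onehot key

path = "1.wav"
print(path.split('/')[-1].split('_')[1])   # IndexError: list index out of range
                                           # (split('_') yields only ['1.wav'])
```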

Regarding Input Format

Also, in what format should the input be provided, and what size and shape should the input file have?

Thanks Sai

YoungSeng commented 1 year ago

Dear Sai,

The code is a hard-coded demo. If you want to use your own audio, you can comment out

https://github.com/YoungSeng/DiffuseStyleGesture/blob/85f4096c1784fbc1ffe52cfa9cf3ca653fcef9c9/main/mydiffusion_zeggs/sample.py#L378

and uncomment any of the following lines

https://github.com/YoungSeng/DiffuseStyleGesture/blob/85f4096c1784fbc1ffe52cfa9cf3ca653fcef9c9/main/mydiffusion_zeggs/sample.py#L379-L380

to choose your own style and intensity, as listed here:

https://github.com/YoungSeng/DiffuseStyleGesture/blob/85f4096c1784fbc1ffe52cfa9cf3ca653fcef9c9/main/mydiffusion_zeggs/sample.py#L20-L27
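In other words, the edit replaces the file-name lookup with a hard-coded choice. A minimal sketch of the result ('Happy' is just one example; the actual keys of style2onehot are defined at sample.py L20-L27):

```python
# In main() of sample.py, replace the file-name lookup with a fixed choice:
# style = style2onehot[audiowavlm_path.split('/')[-1].split('_')[1]]   # comment out (L378)
style = style2onehot['Happy']   # uncomment one of L379-L380 and pick any key
```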

Hope this will help you!

SaiChandra3030 commented 1 year ago

Hi YoungSeng, thanks for your time and reply.

I am facing a shape error. Could you please tell me the required shape and size of the input file?

Traceback (most recent call last):
  File "/content/drive/MyDrive/DiffuseStyleGesture/main/mydiffusion_zeggs/sample.py", line 420, in <module>
    main(config, save_dir, config.model_path, audio_path=None, mfcc_path=None, audiowavlm_path=config.audiowavlm_path, max_len=config.max_len)
  File "/content/drive/MyDrive/DiffuseStyleGesture/main/mydiffusion_zeggs/sample.py", line 384, in main
    inference(args, wavlm_model, mfcc, sample_fn, model, n_frames=max_len, smoothing=True, SG_filter=True, minibatch=True, skip_timesteps=0, style=style, seed=123456)      # style2onehot['Happy']
  File "/content/drive/MyDrive/DiffuseStyleGesture/main/mydiffusion_zeggs/sample.py", line 233, in inference
    audio_reshape = torch.from_numpy(audio).to(torch.float32).reshape(num_subdivision, int(stride_poses * 16000 / 20)).to(mydevice).transpose(0, 1)       # mfcc[:, :-2]
RuntimeError: shape '[4, 64000]' is invalid for input of size 237867

Looking forward :)

Model file: './model000450000.pt'

Thanks Sai

YoungSeng commented 1 year ago

It seems to be a problem with the shape of the audio. Did you set a max_len greater than the length of the real audio? You may try setting max_len to 0. If you still have this problem, please upload the audio file and I will check it.
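For reference, the failing reshape needs the sample count to be exactly num_subdivision × window size, and the numbers from the traceback make the mismatch concrete (a sketch; stride_poses = 80 is implied by 64000 = stride_poses × 16000 / 20):

```python
# Numbers from the traceback: 16 kHz audio, 20 fps motion.
stride_poses = 80                         # implied: 64000 * 20 / 16000
window = int(stride_poses * 16000 / 20)   # 64000 samples = 4 s of audio
num_subdivision = 4
print(num_subdivision * window)           # 256000 samples expected by reshape()
print(237867 / 16000)                     # ~14.87 s of actual audio -> too short
# With max_len=0, num_subdivision is derived from the real audio length
# instead, so the reshape target should match the input.
```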

SaiChandra3030 commented 1 year ago

Hi YoungSeng, thanks for your time.

  1. I ran the code and the BVH file was generated in "./sample_dir". Is there any way to convert it into an mkv?
  2. I want to convert the BVH directly into a rendered mp4 video of a person. Is this possible, and could you outline the process? I will work on it.

Thanks Sai

YoungSeng commented 1 year ago

Hey Sai,

In practice, I highly recommend using Blender to visualize the BVH. Similar software includes Maya and MotionBuilder; I have tried them and found Blender friendlier. You can easily import audio, render video, or even drive it with a script, as Trimodal does.

You can also get a video of the skeleton directly in Python; please refer to this issue.

There are also some visualization repositories you can try, such as PyMO, npybvh, and Python_BVH_viewer, although I don't really recommend them; see the PyMO sketch below.
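For instance, a minimal PyMO sketch for a quick skeleton plot (hedged: this follows PyMO's public tutorial; the BVH path is a placeholder for whatever sample.py wrote to ./sample_dir):

```python
from pymo.parsers import BVHParser
from pymo.preprocessing import MocapParameterizer
from pymo.viz_tools import draw_stickfigure
import matplotlib.pyplot as plt

parsed = BVHParser().parse('sample_dir/result.bvh')   # placeholder path
# Convert joint rotations to world positions before drawing.
positions = MocapParameterizer('position').fit_transform([parsed])
draw_stickfigure(positions[0], frame=0)               # draw a single frame
plt.show()
```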

Good luck!

SaiChandra3030 commented 1 year ago

Hi YoungSeng,

I have tried a lot, but I cannot figure out how to convert this BVH file into a 3D video with audio, and I need a little help. Is there any repo, model, or code for what I need?

Thanks Sai

YoungSeng commented 1 year ago

I recommend the method I use (a generic sketch follows this list):

* For rendering, set some parameters:
* Then render:
* To add audio:
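The author's exact snippets are not shown above; as a stand-in, a generic headless render-and-mux sketch (assumed file names; Blender's -b/-o/-a flags and a standard ffmpeg mux):

```python
import subprocess

# Render the animation headlessly from a .blend that already has the BVH
# (plus camera/lights) set up; output format comes from the scene settings.
subprocess.run(['blender', '-b', 'scene.blend', '-o', '//render_', '-a'],
               check=True)

# Mux the speech audio onto the rendered video.
subprocess.run(['ffmpeg', '-i', 'render_0001-0250.mp4', '-i', 'speech.wav',
                '-c:v', 'copy', '-c:a', 'aac', '-shortest', 'out.mp4'],
               check=True)
```
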
sysu19351118 commented 2 months ago

I also encountered this problem. My audio is about 2 seconds and I set max_len=0, but I still get this error:

Traceback (most recent call last):
  File "sample.py", line 442, in <module>
    main(config, save_dir, config.model_path, audio_path=None, mfcc_path=None, audiowavlm_path=config.audiowavlm_path, max_len=config.max_len)
  File "sample.py", line 406, in main
    inference(args, wavlm_model, mfcc, sample_fn, model, n_frames=max_len, smoothing=True, SG_filter=True, minibatch=True, skip_timesteps=0, style=style, seed=123456)      # style2onehot['Happy']
  File "sample.py", line 237, in inference
    audio_reshape = torch.from_numpy(audio).to(torch.float32).reshape(num_subdivision, int(stride_poses * 16000 / 20)).to(mydevice).transpose(0, 1)       # mfcc[:, :-2]
RuntimeError: shape '[4, 64000]' is invalid for input of size 36480
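The arithmetic here: 36480 samples ≈ 2.28 s at 16 kHz, which is shorter than a single 64000-sample (4 s) window, so even max_len=0 leaves the reshape target larger than the audio. One possible workaround (an assumption, not an official fix) is to zero-pad the clip to a whole window before running sample.py:

```python
import numpy as np

window = 64000                             # samples per subdivision (4 s @ 16 kHz)
audio = np.zeros(36480, dtype=np.float32)  # stand-in for the real 2 s clip
pad = (-len(audio)) % window
audio = np.pad(audio, (0, pad))            # pad up to a whole window
print(audio.shape)                         # (64000,)
```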