Status: Closed (infusion-zero-edit closed this issue 8 months ago)
Hi, thanks for posting this! This is most likely because we only support recording up to 20 seconds of audio for now. The conditioning size for 20 seconds results in a max embedding sequence length of 1998, which unfortunately is hardcoded here: https://github.com/facebookresearch/audio2photoreal/blob/548aeeb2057465045ca1568d65ea059cea633d80/model/diffusion.py#L136 I believe the above error results if you end up with an audio embedding longer than that sequence length. But please let me know if you're still having issues with recordings shorter than 20 seconds.
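One way to stay under that hardcoded cap is to clip the input audio before it ever reaches the model. This is a minimal sketch, not code from the repo: the `MAX_SECONDS` constant and `trim_audio` helper are hypothetical names, and the 20-second limit is taken from the reply above.

```python
# Sketch: guard input audio against the hardcoded 1998-token conditioning cap.
# MAX_SECONDS and trim_audio are illustrative names, not part of audio2photoreal.
MAX_SECONDS = 20.0

def trim_audio(samples, sample_rate):
    """Clip a 1-D sequence of audio samples to at most MAX_SECONDS."""
    max_samples = int(MAX_SECONDS * sample_rate)
    return samples[:max_samples] if len(samples) > max_samples else samples
```

Clipping (or rejecting) over-long audio up front gives a clear error at upload time instead of a tensor-shape mismatch deep inside the diffusion model.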
Thanks for replying. After I posted this, I found that fixed number by searching through the git history, but even after uploading a 6-second audio clip I am getting the following error:
This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run gradio deploy
from Terminal to deploy to Spaces (https://huggingface.co/spaces)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:25<00:00, 3.98it/s]
created 3 samples
100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:11<00:00, 9.01it/s]
created 3 samples
0%| | 0/120 [00:03<?, ?it/s]
Traceback (most recent call last):
File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/queueing.py", line 489, in call_prediction
output = await route_utils.call_process_api(
File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/route_utils.py", line 232, in call_process_api
output = await app.get_blocks().process_api(
File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/blocks.py", line 1561, in process_api
result = await self.call_function(
File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/blocks.py", line 1179, in call_function
prediction = await anyio.to_thread.run_sync(
File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
return await future
File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 851, in run
result = context.run(func, *args)
File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/utils.py", line 678, in wrapper
response = f(*args, **kwargs)
File "/home/jupyter/audio2photoreal/demo/demo.py", line 232, in audio_to_avatar
gradio_model.body_renderer.render_full_video(
File "/home/jupyter/audio2photoreal/visualize/render_codes.py", line 153, in render_full_video
self._write_video_stream(
File "/home/jupyter/audio2photoreal/visualize/render_codes.py", line 94, in _write_video_stream
out = self._render_loop(motion, face)
File "/home/jupyter/audio2photoreal/visualize/render_codes.py", line 121, in _render_loop
preds = self.model(default_inputs_copy)
File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/jupyter/audio2photoreal/visualize/ca_body/models/mesh_vae_drivable.py", line 301, in forward
enc_preds = self.encode(geom, lbs_motion, face_embs)
File "/home/jupyter/audio2photoreal/visualize/ca_body/models/mesh_vae_drivable.py", line 266, in encode
face_dec_preds = self.decoder_face(face_embs_hqlp)
File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/jupyter/audio2photoreal/visualize/ca_body/nn/face.py", line 80, in forward
texout = self.texmod(self.texmod2(encview).view(-1, 256, 4, 4))
File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/jupyter/audio2photoreal/visualize/ca_body/nn/layers.py", line 335, in forward
output = thf.conv_transpose2d(
RuntimeError: GET was unable to find an engine to execute this computation
All the torch and CUDA versions (11.7) are installed as per the requirements file.
Glad to hear the other issue is solved! Regarding the GET issue, could you please check whether your PyTorch and CUDA versions are compatible? Specifically, is your torch build compiled with CUDA 11.7, and is your system actually using CUDA 11.7? The torch version can be installed with:
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
That mismatch seems to be the core reason you may be getting this error. Here are some links that are hopefully helpful:
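A quick way to compare the installed torch build against the CUDA 11.7 toolkit the requirements file expects is to print both from Python. This is a generic sanity-check sketch (the "11.7" expectation comes from the reply above, not from any torch API):

```python
# Sanity check: does the local torch build match the expected CUDA 11.7 toolkit?
try:
    import torch
    print("torch:", torch.__version__)
    print("built against CUDA:", torch.version.cuda)   # should read "11.7"
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("torch is not installed in this environment")
```

If `torch.version.cuda` reports something other than the toolkit version `nvcc --version` shows, a mismatched wheel is the usual cause of "unable to find an engine" errors from cuDNN.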
These issues have been solved; I fixed it by reinstalling CUDA 11.7. Thanks for the reply. Just another question: if I want to replace the avatar with a new person without having any training data for them, how can I do that?
Glad to hear the issue is resolved. Unfortunately, you can't replace the avatar with a new person without having training data. See my reply here: https://github.com/facebookresearch/audio2photoreal/issues/33
Traceback (most recent call last):
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/queueing.py", line 489, in call_prediction
    output = await route_utils.call_process_api(
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/blocks.py", line 1561, in process_api
    result = await self.call_function(
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/blocks.py", line 1179, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/utils.py", line 678, in wrapper
    response = f(*args, **kwargs)
  File "/home/jupyter/audio2photoreal/demo/demo.py", line 220, in audio_to_avatar
    face_results, pose_results, audio = generate_results(audio, num_repetitions, top_p)
  File "/home/jupyter/audio2photoreal/demo/demo.py", line 188, in generate_results
    gradio_model.generate_sequences(
  File "/home/jupyter/audio2photoreal/demo/demo.py", line 148, in generate_sequences
    sample = self._run_single_diffusion(
  File "/home/jupyter/audio2photoreal/demo/demo.py", line 100, in _run_single_diffusion
    sample = sample_fn(
  File "/home/jupyter/audio2photoreal/diffusion/gaussian_diffusion.py", line 845, in ddim_sample_loop
    for sample in self.ddim_sample_loop_progressive(
  File "/home/jupyter/audio2photoreal/diffusion/gaussian_diffusion.py", line 925, in ddim_sample_loop_progressive
    out = sample_fn(
  File "/home/jupyter/audio2photoreal/diffusion/gaussian_diffusion.py", line 683, in ddim_sample
    out_orig = self.p_mean_variance(
  File "/home/jupyter/audio2photoreal/diffusion/respace.py", line 105, in p_mean_variance
    return super().p_mean_variance(self._wrap_model(model), *args, **kwargs)
  File "/home/jupyter/audio2photoreal/diffusion/gaussian_diffusion.py", line 287, in p_mean_variance
    model_output = model(x, self._scale_timesteps(t), **model_kwargs)
  File "/home/jupyter/audio2photoreal/diffusion/respace.py", line 145, in __call__
    return self.model(x, new_ts, **kwargs)
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jupyter/audio2photoreal/model/cfg_sampler.py", line 35, in forward
    out = self.model(x, timesteps, y)
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jupyter/audio2photoreal/model/diffusion.py", line 388, in forward
    cond_tokens = torch.where(
RuntimeError: The size of tensor a (7998) must match the size of tensor b (1998) at non-singleton dimension 1
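For reference, the 7998 in this original traceback is consistent with the 20-second cap discussed above. A back-of-envelope sketch, assuming the embedding length scales linearly with audio duration (an assumption based on the 20 s / 1998-token figures quoted earlier, not on the model internals):

```python
# Back-of-envelope: how much audio does a 7998-token embedding imply,
# given that 20 s of audio yields the maximum 1998 conditioning tokens?
tokens_per_second = 1998 / 20                 # ~99.9 conditioning tokens per second
audio_seconds = 7998 / tokens_per_second      # duration implied by tensor a
print(round(audio_seconds, 1))
```

So an embedding of 7998 tokens corresponds to roughly 80 seconds of audio, about four times the hardcoded cap, which is exactly the mismatch `torch.where` reports at dimension 1.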