facebookresearch / audio2photoreal

Code and dataset for photorealistic Codec Avatars driven from audio

RuntimeError: The size of tensor a (7998) must match the size of tensor b (1998) at non-singleton dimension 1 #31

Closed: infusion-zero-edit closed this issue 8 months ago

infusion-zero-edit commented 8 months ago

```
Traceback (most recent call last):
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/queueing.py", line 489, in call_prediction
    output = await route_utils.call_process_api(
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/blocks.py", line 1561, in process_api
    result = await self.call_function(
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/blocks.py", line 1179, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/utils.py", line 678, in wrapper
    response = f(*args, **kwargs)
  File "/home/jupyter/audio2photoreal/demo/demo.py", line 220, in audio_to_avatar
    face_results, pose_results, audio = generate_results(audio, num_repetitions, top_p)
  File "/home/jupyter/audio2photoreal/demo/demo.py", line 188, in generate_results
    gradio_model.generate_sequences(
  File "/home/jupyter/audio2photoreal/demo/demo.py", line 148, in generate_sequences
    sample = self._run_single_diffusion(
  File "/home/jupyter/audio2photoreal/demo/demo.py", line 100, in _run_single_diffusion
    sample = sample_fn(
  File "/home/jupyter/audio2photoreal/diffusion/gaussian_diffusion.py", line 845, in ddim_sample_loop
    for sample in self.ddim_sample_loop_progressive(
  File "/home/jupyter/audio2photoreal/diffusion/gaussian_diffusion.py", line 925, in ddim_sample_loop_progressive
    out = sample_fn(
  File "/home/jupyter/audio2photoreal/diffusion/gaussian_diffusion.py", line 683, in ddim_sample
    out_orig = self.p_mean_variance(
  File "/home/jupyter/audio2photoreal/diffusion/respace.py", line 105, in p_mean_variance
    return super().p_mean_variance(self._wrap_model(model), *args, **kwargs)
  File "/home/jupyter/audio2photoreal/diffusion/gaussian_diffusion.py", line 287, in p_mean_variance
    model_output = model(x, self._scale_timesteps(t), **model_kwargs)
  File "/home/jupyter/audio2photoreal/diffusion/respace.py", line 145, in __call__
    return self.model(x, new_ts, **kwargs)
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jupyter/audio2photoreal/model/cfg_sampler.py", line 35, in forward
    out = self.model(x, timesteps, **y)
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jupyter/audio2photoreal/model/diffusion.py", line 388, in forward
    cond_tokens = torch.where(
RuntimeError: The size of tensor a (7998) must match the size of tensor b (1998) at non-singleton dimension 1
```

evonneng commented 8 months ago

Hi, thanks for posting this! This should be due to the fact that we only support recording up to 20 seconds of audio for now. The conditioning for 20 seconds results in a max embedding sequence length of 1998, which is unfortunately hardcoded here: https://github.com/facebookresearch/audio2photoreal/blob/548aeeb2057465045ca1568d65ea059cea633d80/model/diffusion.py#L136 The error above should occur whenever the audio embedding ends up longer than that sequence length. But please let me know if you're still having issues with recordings shorter than 20 seconds.
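One simple client-side guard is to trim the waveform to 20 seconds before handing it to the demo. This is just a sketch; the function name and sample-rate handling are illustrative, not part of the repo:

```python
import numpy as np

MAX_SECONDS = 20  # the demo's supported recording length

def clip_to_supported_length(audio: np.ndarray, sample_rate: int) -> np.ndarray:
    """Truncate a mono waveform so the resulting audio embedding
    stays within the hardcoded max sequence length."""
    max_samples = sample_rate * MAX_SECONDS
    return audio[:max_samples] if len(audio) > max_samples else audio

# 30 seconds of 48 kHz audio gets trimmed down to 20 seconds:
trimmed = clip_to_supported_length(np.zeros(48_000 * 30, dtype=np.float32), 48_000)
print(len(trimmed) / 48_000)  # 20.0
```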

infusion-zero-edit commented 8 months ago

Thanks for replying. After I posted this, I found that hardcoded number by searching the repo. But even after uploading a 6-second audio clip, I'm getting the following error:

```
This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
100%|███████████████████████████████████████████| 100/100 [00:25<00:00,  3.98it/s]
created 3 samples
100%|███████████████████████████████████████████| 100/100 [00:11<00:00,  9.01it/s]
created 3 samples
  0%|                                            | 0/120 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/queueing.py", line 489, in call_prediction
    output = await route_utils.call_process_api(
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/blocks.py", line 1561, in process_api
    result = await self.call_function(
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/blocks.py", line 1179, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/gradio/utils.py", line 678, in wrapper
    response = f(*args, **kwargs)
  File "/home/jupyter/audio2photoreal/demo/demo.py", line 232, in audio_to_avatar
    gradio_model.body_renderer.render_full_video(
  File "/home/jupyter/audio2photoreal/visualize/render_codes.py", line 153, in render_full_video
    self._write_video_stream(
  File "/home/jupyter/audio2photoreal/visualize/render_codes.py", line 94, in _write_video_stream
    out = self._render_loop(motion, face)
  File "/home/jupyter/audio2photoreal/visualize/render_codes.py", line 121, in _render_loop
    preds = self.model(**default_inputs_copy)
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jupyter/audio2photoreal/visualize/ca_body/models/mesh_vae_drivable.py", line 301, in forward
    enc_preds = self.encode(geom, lbs_motion, face_embs)
  File "/home/jupyter/audio2photoreal/visualize/ca_body/models/mesh_vae_drivable.py", line 266, in encode
    face_dec_preds = self.decoder_face(face_embs_hqlp)
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jupyter/audio2photoreal/visualize/ca_body/nn/face.py", line 80, in forward
    texout = self.texmod(self.texmod2(encview).view(-1, 256, 4, 4))
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/opt/conda/envs/a2p_env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/jupyter/audio2photoreal/visualize/ca_body/nn/layers.py", line 335, in forward
    output = thf.conv_transpose2d(
RuntimeError: GET was unable to find an engine to execute this computation
```

infusion-zero-edit commented 8 months ago

All the torch and CUDA versions (11.7) are installed as specified in the requirements file.

evonneng commented 8 months ago

Glad to hear the other issue is solved! Regarding the GET issue, could you check whether your PyTorch and CUDA versions are compatible? Specifically, is your torch build compiled against CUDA 11.7, and is your system actually running CUDA 11.7? That torch version can be installed with:

```
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
```

This mismatch seems to be the most likely cause of the error you're getting. Here are some links that will hopefully help:
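One quick way to sanity-check the match: in Python, `torch.version.cuda` reports the CUDA toolkit torch was compiled against, and it should agree at the major.minor level with what `nvcc --version` reports for the system install. A minimal comparison helper (the function name is illustrative, not part of the repo; torch itself is only referenced in the comment so the sketch runs anywhere):

```python
def cuda_versions_match(torch_cuda: str, system_cuda: str) -> bool:
    """Compare CUDA versions at major.minor granularity,
    e.g. '11.7' vs '11.7.1' counts as a match."""
    major_minor = lambda v: tuple(v.split(".")[:2])
    return major_minor(torch_cuda) == major_minor(system_cuda)

# In a live session one would pass real values, e.g.:
#   import torch
#   cuda_versions_match(torch.version.cuda, "11.7")
print(cuda_versions_match("11.7", "11.7.1"))  # True: same major.minor
print(cuda_versions_match("11.7", "11.8"))    # False: minor versions differ
```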

infusion-zero-edit commented 8 months ago

These issues have been solved; I fixed them by reinstalling CUDA 11.7. Thanks for the reply. Just one more question: if I want to replace the avatar with a new person without having any training data for them, how can I do that?

alexanderrichard commented 8 months ago

Glad to hear the issue is resolved. Unfortunately, you can't replace the avatar with a new person without having training data. See my reply here: https://github.com/facebookresearch/audio2photoreal/issues/33