ashawkey / RAD-NeRF

Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition
MIT License

using another audio feature extraction #21

Open pegahs1993 opened 1 year ago

pegahs1993 commented 1 year ago

During testing, I plan to use a different audio feature extractor whose features have shape (x, 16, 80), but this is incompatible with the convolution model:

RuntimeError: Given groups=1, weight of size [32, 44, 3], expected input[8, 80, 16] to have 44 channels, but got 80 channels instead

I changed self.audio_in_dim in ./nerf/network.py, but that did not resolve the problem:

if 'esperanto' in self.opt.asr_model:
      self.audio_in_dim = 44

Could you tell me which part I need to change?

ashawkey commented 1 year ago

@tylersky1993 You could just fix audio_in_dim to 80 and remove the if condition? (assuming your asr_model's name doesn't contain 'esperanto').
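To make the mismatch concrete: the error message shows the first convolution receiving input laid out as [batch, channels, time], so features of shape (x, 16, 80) arrive with 80 channels while the weight [32, 44, 3] expects 44. The following is a minimal plain-Python sketch of that check; `conv1d_channel_check` is a hypothetical helper for illustration, not part of RAD-NeRF.

```python
def conv1d_channel_check(feature_shape, weight_shape):
    """Compare a (batch, time, channels) feature shape with a Conv1d weight shape.

    weight_shape is (out_channels, in_channels, kernel_size), as printed in
    the RuntimeError. Returns (shape_seen_by_conv, channels_match).
    """
    batch, time, channels = feature_shape
    # The convolution sees channels-first input: (batch, channels, time).
    conv_input = (batch, channels, time)
    expected_channels = weight_shape[1]
    return conv_input, channels == expected_channels


# Shapes taken from the error message in this thread.
print(conv1d_channel_check((8, 16, 80), (32, 44, 3)))  # ((8, 80, 16), False)
# After setting audio_in_dim to 80, the first conv weight would be [32, 80, 3].
print(conv1d_channel_check((8, 16, 80), (32, 80, 3)))  # ((8, 80, 16), True)
```

The check only covers the input layer; it says nothing about whether a pretrained checkpoint can still be loaded afterwards.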

pegahs1993 commented 1 year ago

Thank you very much for responding so quickly. I did that, but I get an error again:

RuntimeError: Error(s) in loading state_dict for NeRFNetwork:
    size mismatch for audio_net.encoder_conv.0.weight: copying a param with shape torch.Size([32, 44, 3]) from checkpoint, the shape in current model is torch.Size([32, 80, 3]).
ashawkey commented 1 year ago

You'll have to train from scratch, instead of loading a pretrained model. You could delete the workspace and try again.
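The second error arises because the checkpoint stores tensors sized for 44-channel audio while the model is now built for 80. One rough way to see which parameters are affected is to compare shapes before loading. This sketch works on plain dicts mapping parameter names to shapes; `find_shape_mismatches` is a hypothetical helper, and the bias entry is an illustrative assumption (only the weight name appears in the error above). With PyTorch, such dicts could be built from a `state_dict()` as `{k: tuple(v.shape) for k, v in sd.items()}`.

```python
def find_shape_mismatches(ckpt_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters that differ."""
    mismatches = {}
    for name, ckpt_shape in ckpt_shapes.items():
        model_shape = model_shapes.get(name)
        # Only compare parameters that exist on both sides.
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes for the weight are copied from the error message; the bias is illustrative.
ckpt = {
    "audio_net.encoder_conv.0.weight": (32, 44, 3),
    "audio_net.encoder_conv.0.bias": (32,),
}
model = {
    "audio_net.encoder_conv.0.weight": (32, 80, 3),
    "audio_net.encoder_conv.0.bias": (32,),
}

for name, (old, new) in find_shape_mismatches(ckpt, model).items():
    print(f"{name}: checkpoint {old} vs model {new}")
```

Any mismatch reported here means the pretrained weights for that layer cannot be reused, which is why training from scratch is suggested.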

pegahs1993 commented 1 year ago

Both the wav2vec and deepspeech methods are used in the files required for training, but only the wav2vec method is used during testing. What is the reason for this?

ashawkey commented 1 year ago

You can also use deepspeech in testing? Just specify --asr_model deepspeech and use the corresponding audio features.

pegahs1993 commented 1 year ago

Thanks a lot @ashawkey !

pegahs1993 commented 1 year ago

If deepspeech is to be used in testing, what changes need to be made?

I changed the default name and used the corresponding audio features, but it did not work:

parser.add_argument('--asr_model', type=str, default='deepspeech ')

The README says: if the model is <ID>.pth, it uses deepspeech features.

I used obama.pth, but I got this error: TypeError: object of type 'NoneType' has no len()

ashawkey commented 1 year ago

Could you provide the full error log?

pegahs1993 commented 1 year ago

Traceback (most recent call last):
  File "C:\Users---\anaconda3\envs\rad-nerf\lib\argparse.py", line 2484, in _get_value
    result = type_func(arg_string)
ValueError: invalid literal for int() with base 10: '{Pose_start}'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users---\anaconda3\envs\rad-nerf\lib\argparse.py", line 1859, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "C:\Users---\anaconda3\envs\rad-nerf\lib\argparse.py", line 2068, in _parse_known_args
    start_index = consume_optional(start_index)
  File "C:\Users---\anaconda3\envs\rad-nerf\lib\argparse.py", line 2008, in consume_optional
    take_action(action, args, option_string)
  File "C:\Users---\anaconda3\envs\rad-nerf\lib\argparse.py", line 1920, in take_action
    argument_values = self._get_values(action, argument_strings)
  File "C:\Users---\anaconda3\envs\rad-nerf\lib\argparse.py", line 2469, in _get_values
    value = [self._get_value(action, v) for v in arg_strings]
  File "C:\Users---\anaconda3\envs\rad-nerf\lib\argparse.py", line 2469, in <listcomp>
    value = [self._get_value(action, v) for v in arg_strings]
  File "C:\Users---\anaconda3\envs\rad-nerf\lib\argparse.py", line 2497, in _get_value
    raise ArgumentError(action, msg % args)
argparse.ArgumentError: argument --data_range: invalid int value: '{Pose_start}'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users---\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\interactiveshell.py", line 2777, in safe_execfile
    py3compat.execfile(
  File "C:\Users---\anaconda3\envs\rad-nerf\lib\site-packages\IPython\utils\py3compat.py", line 168, in execfile
    exec(compiler(f.read(), fname, 'exec'), glob, loc)
  File "D:\PhD\Imp\RAD-NeRF\test.py", line 110, in <module>
    opt = parser.parse_args()
  File "C:\Users---\anaconda3\envs\rad-nerf\lib\argparse.py", line 1826, in parse_args
    args, argv = self.parse_known_args(args, namespace)
  File "C:\Users---\anaconda3\envs\rad-nerf\lib\argparse.py", line 1862, in parse_known_args
    self.error(str(err))
  File "C:\Users---\anaconda3\envs\rad-nerf\lib\argparse.py", line 2583, in error
    self.exit(2, _('%(prog)s: error: %(message)s\n') % args)
  File "C:\Users---\anaconda3\envs\rad-nerf\lib\argparse.py", line 2570, in exit
    _sys.exit(status)
SystemExit: 2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users---\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py", line 1101, in get_records
    return _fixed_getinnerframes(etb, number_of_lines_of_context, tb_offset)
  File "C:\Users---\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py", line 248, in wrapped
    return f(*args, **kwargs)
  File "C:\Users---\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py", line 281, in _fixed_getinnerframes
    records = fix_frame_records_filenames(inspect.getinnerframes(etb, context))
  File "C:\Users---\anaconda3\envs\rad-nerf\lib\inspect.py", line 1670, in getinnerframes
    frameinfo = (tb.tb_frame,) + getframeinfo(tb, context)
AttributeError: 'tuple' object has no attribute 'tb_frame'

ValueError Traceback (most recent call last) ~\anaconda3\envs\rad-nerf\lib\argparse.py in _get_value(self, action, arg_string) 2483 try: -> 2484 result = type_func(arg_string) 2485

ValueError: invalid literal for int() with base 10: '{Pose_start}'

During handling of the above exception, another exception occurred:

ArgumentError Traceback (most recent call last) ~\anaconda3\envs\rad-nerf\lib\argparse.py in parse_known_args(self, args, namespace) 1858 try: -> 1859 namespace, args = self._parse_known_args(args, namespace) 1860 except ArgumentError:

~\anaconda3\envs\rad-nerf\lib\argparse.py in _parse_known_args(self, arg_strings, namespace) 2067 # consume the next optional and any arguments for it -> 2068 start_index = consume_optional(start_index) 2069

~\anaconda3\envs\rad-nerf\lib\argparse.py in consume_optional(start_index) 2007 for action, args, option_string in action_tuples: -> 2008 take_action(action, args, option_string) 2009 return stop

~\anaconda3\envs\rad-nerf\lib\argparse.py in take_action(action, argument_strings, option_string) 1919 seen_actions.add(action) -> 1920 argument_values = self._get_values(action, argument_strings) 1921

~\anaconda3\envs\rad-nerf\lib\argparse.py in _get_values(self, action, arg_strings) 2468 else: -> 2469 value = [self._get_value(action, v) for v in arg_strings] 2470 for v in value:

~\anaconda3\envs\rad-nerf\lib\argparse.py in <listcomp>(.0) 2468 else: -> 2469 value = [self._get_value(action, v) for v in arg_strings] 2470 for v in value:

~\anaconda3\envs\rad-nerf\lib\argparse.py in _get_value(self, action, arg_string) 2496 msg = _('invalid %(type)s value: %(value)r') -> 2497 raise ArgumentError(action, msg % args) 2498

ArgumentError: argument --data_range: invalid int value: '{Pose_start}'

During handling of the above exception, another exception occurred:

SystemExit Traceback (most recent call last) ~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\interactiveshell.py in safe_execfile(self, fname, exit_ignore, raise_exceptions, shell_futures, *where) 2776 glob, loc = (where + (None, ))[:2] -> 2777 py3compat.execfile( 2778 fname, glob, loc,

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\utils\py3compat.py in execfile(fname, glob, loc, compiler) 167 compiler = compiler or compile --> 168 exec(compiler(f.read(), fname, 'exec'), glob, loc) 169

D:\PhD\Imp\RAD-NeRF\test.py in <module> 109 --> 110 opt = parser.parse_args() 111

~\anaconda3\envs\rad-nerf\lib\argparse.py in parse_args(self, args, namespace) 1825 def parse_args(self, args=None, namespace=None): -> 1826 args, argv = self.parse_known_args(args, namespace) 1827 if argv:

~\anaconda3\envs\rad-nerf\lib\argparse.py in parse_known_args(self, args, namespace) 1861 err = _sys.exc_info()[1] -> 1862 self.error(str(err)) 1863 else:

~\anaconda3\envs\rad-nerf\lib\argparse.py in error(self, message) 2582 args = {'prog': self.prog, 'message': message} -> 2583 self.exit(2, _('%(prog)s: error: %(message)s\n') % args)

~\anaconda3\envs\rad-nerf\lib\argparse.py in exit(self, status, message) 2569 self._print_message(message, _sys.stderr) -> 2570 _sys.exit(status) 2571

SystemExit: 2

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_10432\2001812536.py in <module>
      3
      4 #@title Run Inference
----> 5 get_ipython().run_line_magic('run', 'test.py -O --torso --pose data/pose.json --data_range {Pose_start} {Pose_end} --ckpt pretrained/model.pth --aud data/sff.npy --bg_img data/{BG} --workspace trial')
      6
      7 Video = get_latest_file(os.path.join('trial', 'results', '*.mp4'))

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\interactiveshell.py in run_line_magic(self, magic_name, line, _stack_depth) 2362 kwargs['local_ns'] = self.get_local_scope(stack_depth) 2363 with self.builtin_trap: -> 2364 result = fn(*args, **kwargs) 2365 return result 2366

~\anaconda3\envs\rad-nerf\lib\site-packages\decorator.py in fun(*args, **kw) 230 if not kwsyntax: 231 args, kw = fix(args, kw, sig) --> 232 return caller(func, *(extras + args), **kw) 233 fun.__name__ = func.__name__ 234 fun.__doc__ = func.__doc__

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\magic.py in <lambda>(f, *a, **k) 185 # but it's overkill for just that one bit of state. 186 def magic_deco(arg): --> 187 call = lambda f, *a, **k: f(*a, **k) 188 189 if callable(arg):

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\magics\execution.py in run(self, parameter_s, runner, file_finder) 845 else: 846 # regular execution --> 847 run() 848 849 if 'i' in opts:

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\magics\execution.py in run() 830 831 def run(): --> 832 runner(filename, prog_ns, prog_ns, 833 exit_ignore=exit_ignore) 834

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\interactiveshell.py in safe_execfile(self, fname, exit_ignore, raise_exceptions, shell_futures, *where) 2792 raise 2793 if not exit_ignore: -> 2794 self.showtraceback(exception_only=True) 2795 except: 2796 if raise_exceptions:

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\interactiveshell.py in showtraceback(self, exc_tuple, filename, tb_offset, exception_only, running_compiled_code) 2068 stb = ['An exception has occurred, use %tb to see ' 2069 'the full traceback.\n'] -> 2070 stb.extend(self.InteractiveTB.get_exception_only(etype, 2071 value)) 2072 else:

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py in get_exception_only(self, etype, value) 752 value : exception value 753 """ --> 754 return ListTB.structured_traceback(self, etype, value) 755 756 def show_exception_only(self, etype, evalue):

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py in structured_traceback(self, etype, evalue, etb, tb_offset, context) 627 chained_exceptions_tb_offset = 0 628 out_list = ( --> 629 self.structured_traceback( 630 etype, evalue, (etb, chained_exc_ids), 631 chained_exceptions_tb_offset, context)

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py in structured_traceback(self, etype, value, tb, tb_offset, number_of_lines_of_context) 1365 else: 1366 self.tb = tb -> 1367 return FormattedTB.structured_traceback( 1368 self, etype, value, tb, tb_offset, number_of_lines_of_context) 1369

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py in structured_traceback(self, etype, value, tb, tb_offset, number_of_lines_of_context) 1265 if mode in self.verbose_modes: 1266 # Verbose modes need a full traceback -> 1267 return VerboseTB.structured_traceback( 1268 self, etype, value, tb, tb_offset, number_of_lines_of_context 1269 )

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py in structured_traceback(self, etype, evalue, etb, tb_offset, number_of_lines_of_context) 1122 """Return a nice text document describing the traceback.""" 1123 -> 1124 formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context, 1125 tb_offset) 1126

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py in format_exception_as_a_whole(self, etype, evalue, etb, number_of_lines_of_context, tb_offset) 1080 1081 -> 1082 last_unique, recursion_repeat = find_recursion(orig_etype, evalue, records) 1083 1084 frames = self.format_records(records, last_unique, recursion_repeat)

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py in find_recursion(etype, value, records) 380 # first frame (from in to out) that looks different. 381 if not is_recursion_error(etype, value, records): --> 382 return len(records), 0 383 384 # Select filename, lineno, func_name to track frames with

TypeError: object of type 'NoneType' has no len()

ashawkey commented 1 year ago

It says argument --data_range: invalid int value: '{Pose_start}', what's the command line you are running?

pegahs1993 commented 1 year ago

%run test.py -O --torso \
    --pose data/pose.json \
    --data_range {Pose_start} {Pose_end} \
    --ckpt pretrained/model.pth \
    --aud data/speech.npy \
    --bg_img data/{BG} \
    --workspace trial
ashawkey commented 1 year ago

Oh, you should first define Pose_start and the other {} variables. I guess you got this snippet from Colab? You need to check those definitions, or use the full command from the README.
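The failure can be reproduced outside the notebook: if the placeholders are never substituted, argparse receives the literal text '{Pose_start}' and cannot convert it to an integer. This is a minimal sketch using only the standard library; the parser is an assumption that mirrors just the --data_range option named in the error log, `render_args` is a hypothetical helper, and the values 0 and 100 are examples only.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="test.py")
    # Assumption: --data_range takes integer values, as the error log implies.
    parser.add_argument("--data_range", type=int, nargs=2, default=[0, -1])
    return parser


def render_args(template, **values):
    """Fill {placeholders} in a command template and split it into argv tokens."""
    return template.format(**values).split()


TEMPLATE = "--data_range {Pose_start} {Pose_end}"

# Placeholders left as literal text: argparse reports an invalid int and exits.
try:
    build_parser().parse_args(TEMPLATE.split())
except SystemExit as exc:
    print("argparse exited with status", exc.code)  # status 2

# Placeholders filled in before parsing.
opt = build_parser().parse_args(render_args(TEMPLATE, Pose_start=0, Pose_end=100))
print(opt.data_range)  # [0, 100]
```

In a notebook, the equivalent fix is to run the cell that defines Pose_start, Pose_end and BG before the %run cell, so that the {} expansion has values to substitute.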

pegahs1993 commented 1 year ago

Hi @ashawkey, does model.pth relate to the deepspeech and wav2vec models generated during training? My assumption is that the file named ngp is for deepspeech. Another question: don't we need to change the default --asr_model (in main.py) to train each of them?

parser.add_argument('--asr_model', type=str, default='cpierse/wav2vec2-large-xlsr-53-esperanto')

ashawkey commented 1 year ago

No, they are downloaded automatically (from GitHub or Hugging Face). Usually the esperanto model works well for most languages.

pegahs1993 commented 1 year ago

Thanks a lot @ashawkey for your response!

The question is whether a trained model will be generated for both (wav2vec and deepspeech) during training, or whether it should be trained separately for each?

ashawkey commented 1 year ago

It should be trained separately, e.g., a model trained on wav2vec features can only use wav2vec features.
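Since a checkpoint is tied to the features it was trained on, a quick sanity check before testing is to read the expected feature dimension from the checkpoint's first audio convolution and compare it with the features about to be used. This is a sketch on plain shape tuples; the parameter name comes from the error earlier in this thread, while `expected_audio_dim` and `features_match_checkpoint` are hypothetical helpers, not RAD-NeRF functions.

```python
AUDIO_CONV_KEY = "audio_net.encoder_conv.0.weight"


def expected_audio_dim(ckpt_shapes, key=AUDIO_CONV_KEY):
    """Read in_channels from a Conv1d weight shape (out_channels, in_channels, kernel)."""
    return ckpt_shapes[key][1]


def features_match_checkpoint(feature_shape, ckpt_shapes):
    """Features are assumed to be shaped (frames, window, channels)."""
    return feature_shape[-1] == expected_audio_dim(ckpt_shapes)


# Checkpoint shape copied from the size-mismatch error above (44-dim features).
ckpt_shapes = {AUDIO_CONV_KEY: (32, 44, 3)}

print(expected_audio_dim(ckpt_shapes))                          # 44
print(features_match_checkpoint((100, 16, 44), ckpt_shapes))    # True
print(features_match_checkpoint((100, 16, 80), ckpt_shapes))    # False
```

A matching dimension is necessary but not sufficient: two extractors could share a dimension and still produce incompatible features, so the checkpoint should be paired with the same extractor used in training.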