ebadawy / voice_conversion

MIT License
129 stars 36 forks source link

Evaluate and Transfer Plotting #3

Closed RussellSB closed 3 years ago

RussellSB commented 3 years ago

Overview

Plotting Notes

Train Plot

image

Eval Plot

image

RussellSB commented 3 years ago

Train Discussion

Is it worth mirroring how evaluate.py saves a feats dict of converted output but also in train.py (on its last epoch)? I'm unsure of this. My reservations lie in that if the user wants to get the converted output of one epoch of training batches, they can use evaluate.py on the train set. Only reservation is out.pickle would be saved in out_eval not out_train.

RussellSB commented 3 years ago

Preprocessing Discussion

In evaluate.py for the feats dict, I named the keys with respect to their conversion A2B, B2A. Seems to be a convention in style transfer code to use 'A2B' notation. Should we consider maybe changing the convention of how spkr directories are named in the database? Instead, if we were to name them spkr_A and spkr_B it could be more consistent. And we can iterate over key strings, instead of key int values in preprocessing.py.

Not only would this make things more consistent with "A2B notation" but say if the user were to name directories "spkr_4" or "spkr_7" (corresponding to ids from flickr_audio), it would string sort and still take spkr_4 as A and spkr_7 as B. We can then have n_spkrs controlled by the number of speaker directories in the dataset. Though for now, we'll note that it only works with two for the time being.

So for preprocessing maybe:

  1. Read number spkr dirs in dataset
  2. assert num_dirs == 2, 'Currently only two speakers accepted'

Would also need to reflect such changes in data_proc.py

ebadawy commented 3 years ago

Preprocessing Discussion

In evaluate.py for the feats dict, I named the keys with respect to their conversion A2B, B2A. Seems to be a convention in style transfer code to use 'A2B' notation. Should we consider maybe changing the convention of how spkr directories are named in the database? Instead, if we were to name them spkr_A and spkr_B it could be more consistent. And we can iterate over key strings, instead of key int values in preprocessing.py.

Not only would this make things more consistent with "A2B notation" but say if the user were to name directories "spkr_4" or "spkr_7" (corresponding to ids from flickr_audio), it would string sort and still take spkr_4 as A and spkr_7 as B. We can then have n_spkrs controlled by the number of speaker directories in the dataset. Though for now, we'll note that it only works with two for the time being.

This seems to be a good idea; however, what do you think about having more than two speakers? we will need to use something else other than A B naming convention in that case. I will leave changing the naming to you if you want to do it or not.

So for preprocessing maybe:

  1. Read number spkr dirs in dataset
  2. assert num_dirs == 2, 'Currently only two speakers accepted'

I like this assertion, we need to have it

Would also need to reflect such changes in data_proc.py

ebadawy commented 3 years ago

Train Discussion

Is it worth mirroring how evaluate.py saves a feats dict of converted output but also in train.py (on its last epoch)? I'm unsure of this. My reservations lie in that if the user wants to get the converted output of one epoch of training batches, they can use evaluate.py on the train set. Only reservation is out.pickle would be saved in out_eval not out_train.

having a separated evaluate.py seems to be a bit redundant since it uses a lot of the training code. what it be better to include that in train.py and have additional argument to how frequent the plots are being generated?

RussellSB commented 3 years ago

Train Discussion

Is it worth mirroring how evaluate.py saves a feats dict of converted output but also in train.py (on its last epoch)? I'm unsure of this. My reservations lie in that if the user wants to get the converted output of one epoch of training batches, they can use evaluate.py on the train set. Only reservation is out.pickle would be saved in out_eval not out_train.

having a separated evaluate.py seems to be a bit redundant since it uses a lot of the training code. what it be better to include that in train.py and have additional argument to how frequent the plots are being generated?

I agree that evaluate.py does introduce code repetition but I still think it is useful for concise results from unseen sets of data. Inference.py on the other hand, would be concerned with the full functionality specific to a full audio file.

Evaluate serves as a lightweight version of train.py that isn't concerned with updating weights. And separating it from train.py, rather than integrating its logic with conditionals in there I think makes things more reader-friendly for newcomers to the codebase.

If we were to integrate this to train.py it isn't just plotting that will be reconsidered but we'd need to have an option that runs a pre-trained model on eval mode (so it doesn't update gradients and is more efficient), and also not save the models and limit the epochs. Which again, isn't too hard to do - but I think will mess up clarity in train.py, which other researchers would especially appreciate for making use of/extending from it.

Also I've incorporated a plotting functionality for train.py already. It will output plots to 'src/out_train' whereas evaluate.py outputs plots to 'src/out_eval'

RussellSB commented 3 years ago

Preprocessing Discussion

In evaluate.py for the feats dict, I named the keys with respect to their conversion A2B, B2A. Seems to be a convention in style transfer code to use 'A2B' notation. Should we consider maybe changing the convention of how spkr directories are named in the database? Instead, if we were to name them spkr_A and spkr_B it could be more consistent. And we can iterate over key strings, instead of key int values in preprocessing.py. Not only would this make things more consistent with "A2B notation" but say if the user were to name directories "spkr_4" or "spkr_7" (corresponding to ids from flickr_audio), it would string sort and still take spkr_4 as A and spkr_7 as B. We can then have n_spkrs controlled by the number of speaker directories in the dataset. Though for now, we'll note that it only works with two for the time being.

This seems to be a good idea; however, what do you think about having more than two speakers? we will need to use something else other than A B naming convention in that case. I will leave changing the naming to you if you want to do it or not.

So for preprocessing maybe:

  1. Read number spkr dirs in dataset
  2. assert num_dirs == 2, 'Currently only two speakers accepted'

I like this assertion, we need to have it

Would also need to reflect such changes in data_proc.py

Ok cool re naming conversion and naming them wrt wav dir names Ill keep this in mind, but Ill prioritise it for after test/eval/train split and inference.py. Also added the assert in preprocessing.py now