Rudrabha / Lip2Wav

This is the repository containing codes for our CVPR, 2020 paper titled "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis"
MIT License

Pre-processing and training not working on custom dataset? #25

Open graham-eisele opened 3 years ago

graham-eisele commented 3 years ago

Whenever I preprocess the custom dataset, this is the output:

`C:\Users\Graham\Desktop\Lip2Wav-master>python preprocess.py --speaker_root Dataset/larry --speaker larry
C:\Users\Graham\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\Users\Graham\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\Users\Graham\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\Users\Graham\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\Users\Graham\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\Users\Graham\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Started processing for Dataset/larrywith 1 GPUs
0it [00:00, ?it/s]

C:\Users\Graham\Desktop\Lip2Wav-master>`

but no new files are produced, and when I try to train, this is the output:

`C:\Users\Graham\Desktop\Lip2Wav-master>python train.py first_run --data_root Dataset/larry/ --preset synthesizer/presets/larry.json
Arguments:
    name: first_run
    data_root: Dataset/larry/
    preset: synthesizer/presets/larry.json
    models_dir: synthesizer/saved_models/
    mode: synthesis
    GTA: True
    restore: True
    summary_interval: 2500
    embedding_interval: 1000000000
    checkpoint_interval: 1000
    eval_interval: 1000
    tacotron_train_steps: 2000000
    tf_log_level: 1

Traceback (most recent call last):
  File "train.py", line 61, in <module>
    log_dir, hparams = prepare_run(args)
  File "train.py", line 21, in prepare_run
    hparams.add_hparam('all_images', all_images)
  File "C:\Users\Graham\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\contrib\training\python\training\hparam.py", line 485, in add_hparam
    'Multi-valued hyperparameters cannot be empty: %s' % name)
ValueError: Multi-valued hyperparameters cannot be empty: all_images

C:\Users\Graham\Desktop\Lip2Wav-master>`

How do you properly use a custom dataset with this project? Thank you.

prajwalkr commented 3 years ago

ValueError: Multi-valued hyperparameters cannot be empty: all_images

Ensure that this line is returning the list of all face images. If not, your path is incorrect.

graham-eisele commented 3 years ago

It is not returning the list of all face images. I traced it back to get_image_list in hparams.py, and that this part filelist.extend(list(glob(os.path.join(data_root, 'preprocessed', vid_id, '*/*.jpg')))) is causing the issue, because the preprocess.py script appears to not output anything.
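
One way to confirm this diagnosis is to count, per video ID, how many frames the glob pattern in get_image_list actually matches. The following is a minimal sketch, not code from the repository: frames_per_video and report_missing are hypothetical helper names, and the layout data_root/preprocessed/<vid_id>/<clip>/*.jpg is assumed from the pattern quoted above.

```python
import os
from glob import glob


def frames_per_video(data_root, split):
    """Return {vid_id: number of .jpg frames found} for one split file."""
    counts = {}
    with open(os.path.join(data_root, "{}.txt".format(split))) as vidlist:
        for line in vidlist:
            vid_id = line.strip()
            if not vid_id:
                continue  # ignore blank lines in the split file
            # Same shape as the pattern in get_image_list:
            # preprocessed/<vid_id>/<clip>/<frame>.jpg (assumed layout)
            pattern = os.path.join(data_root, "preprocessed", vid_id, "*", "*.jpg")
            counts[vid_id] = len(glob(pattern))
    return counts


def report_missing(data_root, split):
    """Print and return the video IDs for which no frames were found."""
    missing = [v for v, n in frames_per_video(data_root, split).items() if n == 0]
    for vid_id in missing:
        print("no frames found for", vid_id)
    return missing
```

If every ID in train.txt is reported as missing, the preprocessed folder is probably empty or in a different location than the pattern expects.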

graham-eisele commented 3 years ago

Even when using any of the datasets used for the paper, I still get this error after following all instructions in the README.

Halle-Astra commented 3 years ago

Even when using any of the datasets used for the paper, I still get this error after following all instructions in the README.

Yes, I got this error too, and I have just found a solution for it.

You need to revise the code in preprocess.py as follows. Before: fulldir = vfile.replace('/intervals/', '/preprocessed/') After: fulldir = vfile.replace('intervals', 'preprocessed')

You only need to revise lines 65 and 97 (two lines only).

The problem is caused by line 124. My environment is Windows 10, but the author's environment is presumably Linux. The path separator is different, so vfile.replace does not behave as expected.

You also don't need to run preprocess.py again to regenerate the images; I know it takes too long. Instead, you can run the code in my repo https://github.com/Halle-Astra/lip2wav_revised with python preprocess_win_mv.py <name>.

I hope this solution helps you solve it.
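
The separator issue described above can be illustrated with a short sketch. It is not the repository's code: swap_dir is a hypothetical helper, and the example paths are made up. It replaces the path component named intervals regardless of which separator the path uses, rather than matching the literal string '/intervals/'.

```python
from pathlib import PurePosixPath, PureWindowsPath


def swap_dir(path, old="intervals", new="preprocessed", flavour=PurePosixPath):
    """Replace the path component equal to `old` with `new`.

    `flavour` selects how the path is parsed; PureWindowsPath treats
    both '/' and '\\' as separators.
    """
    parts = [new if part == old else part for part in flavour(path).parts]
    return str(flavour(*parts))


# Made-up example path in Windows style.
win_path = r"Dataset\larry\intervals\abc\cut-0.mp4"

# The forward-slash pattern never matches a backslash path,
# so the string comes back unchanged.
print(win_path.replace("/intervals/", "/preprocessed/") == win_path)

# Component-wise replacement works for either separator style.
print(swap_dir(win_path, flavour=PureWindowsPath))
print(swap_dir("Dataset/larry/intervals/abc/cut-0.mp4"))
```

Compared with replace('intervals', 'preprocessed'), this version only touches a directory that is exactly named intervals, so a file or folder whose name merely contains that word is left alone. Whether that distinction matters depends on the dataset's naming.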

graham-eisele commented 3 years ago

That seems to work, but then I get the same error with train.py.

graham-eisele commented 3 years ago

Changing

`def get_image_list(split, data_root):
    filelist = []
    with open(os.path.join(data_root, '{}.txt'.format(split))) as vidlist:
        for vid_id in vidlist:
            vid_id = vid_id.strip()
            filelist.extend(list(glob(os.path.join(data_root, 'preprocessed', vid_id, '*/*.jpg'))))
    return filelist`

to

`def get_image_list(split, data_root):
    trainList = []
    valList = []

    if split == "train":
        with open(os.path.join(data_root, '{}.txt'.format(split))) as vidlist:
            for vid_id in vidlist:
                vid_id = vid_id.strip()
                trainList.extend(list(glob(os.path.join(data_root, 'preprocessed', vid_id, '*/*.jpg'))))

    if split == "val":
        with open(os.path.join(data_root, '{}.txt'.format(split))) as vidlist:
            for vid_id in vidlist:
                vid_id = vid_id.strip()
                valList.extend(list(glob(os.path.join(data_root, 'preprocessed', vid_id, '*/*.jpg'))))

    if split == "train":
        #print(trainList)
        return trainList

    if split == "val":
        #print(valList)
        return valList`

lets me train, but when I start training it reports only 0.29 hours of training data, even though there are over 11 hours. Also, when I run complete_test_generate.py I get 0.0 hours is available for testing 0it [00:00, ?it/s].

prajwalkr commented 3 years ago

The number of hours is calculated based on (total number of images / fps) / 3600. Ensure all the face images are detected by the script.
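
As a worked example of that formula, the sketch below converts between frame counts and hours. The function names are hypothetical, and the frame rate of 30 is an assumption for illustration only; the real value comes from the project's hparams.

```python
def hours_of_data(num_images, fps):
    """Hours of footage represented by num_images frames at fps frames per second."""
    seconds = num_images / fps
    return seconds / 3600


def images_for_hours(hours, fps):
    """Inverse of hours_of_data: roughly how many frames a duration should yield."""
    return round(hours * 3600 * fps)


FPS = 30  # assumed for illustration; read the real value from hparams

# Number of frames that a report of 0.29 hours corresponds to at the assumed rate.
print(images_for_hours(0.29, FPS))

# Number of frames that 11 hours of video should produce at the assumed rate.
print(images_for_hours(11, FPS))
```

Under that assumed frame rate, a report of 0.29 hours means about 31,320 images were found, whereas 11 hours of video should yield about 1,188,000. A gap of that size suggests most face images are either missing or not matched by the glob pattern.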

graham-eisele commented 3 years ago

I fixed it now, and it reports the correct number of hours and runs, but I don't see any .wav files being output.

graham-eisele commented 3 years ago

It says this:

np_resource = np.dtype([("resource", np.ubyte, 1)])
2.2698796296296297 hours is available for testing
0it [00:00, ?it/s]
| 0/1 [00:00<?, ?it/s]
100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 999.36it/s]

prajwalkr commented 3 years ago

Does your test set contain only a single video? Please check whether any exceptions are being silently ignored in the code.
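
A generic way to check for silently ignored exceptions is to print the full traceback wherever a loop catches and skips errors. The following is a minimal sketch of that pattern, not the repository's code: process_all and process_one are hypothetical names standing in for whatever loop and per-video function the script uses.

```python
import traceback


def process_all(items, process_one):
    """Call process_one on each item, reporting failures instead of hiding them."""
    failures = []
    for item in items:
        try:
            process_one(item)
        except Exception as exc:
            # A bare `except: continue` would hide this error completely;
            # print the message and the full traceback before moving on.
            print("failed on {!r}: {}".format(item, exc))
            traceback.print_exc()
            failures.append(item)
    return failures
```

The returned list shows at a glance which inputs were skipped, and the traceback shows why.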

graham-eisele commented 3 years ago

I tried with multiple videos and one video and it output the same thing.

graham-eisele commented 3 years ago

Would this be the correct path to checkpoint? C:\Users\Graham\Desktop\Lip2Wav-master\synthesizer\saved_models\logs-01\taco_pretrained\tacotron_model.ckpt-5000.data-00000-of-00001

graham-eisele commented 3 years ago

Is there a limit to the video length when using complete_test_generate.py?

prajwalkr commented 3 years ago

There is a minimum frame limit, hparams.T. Please add print statements in different places to see whether your videos are being skipped.

C:\Users\Graham\Desktop\Lip2Wav-master\synthesizer\saved_models\logs-01\taco_pretrained\tacotron_model.ckpt-5000.data-00000-of-00001

yes.
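
The minimum frame limit mentioned above can be checked with a short sketch like the one below. It is hypothetical, not taken from complete_test_generate.py: split_by_length is an invented helper, the comparison (fewer than the minimum means skipped) is an assumption, and the minimum should be read from hparams.T rather than hard-coded.

```python
def split_by_length(frame_counts, min_frames):
    """Partition {video: n_frames} into (usable, skipped) by a minimum frame count."""
    usable, skipped = [], []
    for video, n_frames in sorted(frame_counts.items()):
        if n_frames < min_frames:  # assumed rule: fewer than the minimum is skipped
            print("skipping {}: {} frames < minimum {}".format(video, n_frames, min_frames))
            skipped.append(video)
        else:
            usable.append(video)
    return usable, skipped
```

If every test video ends up in the skipped list, the script will finish without producing any audio.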

graham-eisele commented 3 years ago

I still haven't been able to solve or find that issue, but now when I try to resume training from a checkpoint, I get this error:

tensorflow.python.framework.errors_impl.NotFoundError: FindFirstFile failed for: synthesizer/saved_models/logs-final/taco_pretrained : The system cannot find the path specified. ; No such process

after running this command

python train.py 01 --data_root Dataset/chem/ --preset synthesizer/presets/chem.json
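
The directory in this error (logs-final) differs from the one in the checkpoint path quoted earlier (logs-01). One possible cause, not confirmed in this thread, is a stale path recorded in the `checkpoint` index file, the small text file that a TensorFlow 1.x Saver writes next to the .ckpt files. The sketch below is a hypothetical helper for inspecting such a file without importing TensorFlow. It assumes the usual `model_checkpoint_path: "..."` line format and treats relative paths as relative to base_dir.

```python
import os
import re

# Lines in a TF1 `checkpoint` index file are assumed to look like:
#   model_checkpoint_path: "some/dir/tacotron_model.ckpt-5000"
#   all_model_checkpoint_paths: "some/dir/tacotron_model.ckpt-5000"
_ENTRY = re.compile(r'^\s*(?:all_)?model_checkpoint_paths?:\s*"(.*)"\s*$')


def recorded_checkpoint_paths(index_text):
    """Return every checkpoint path recorded in the index file's text."""
    paths = []
    for line in index_text.splitlines():
        match = _ENTRY.match(line)
        if match:
            paths.append(match.group(1))
    return paths


def stale_paths(index_text, base_dir="."):
    """Return recorded paths whose parent directory does not exist."""
    stale = []
    for path in recorded_checkpoint_paths(index_text):
        full = path if os.path.isabs(path) else os.path.join(base_dir, path)
        if not os.path.isdir(os.path.dirname(full)):
            stale.append(path)
    return stale
```

To use it, read the text of the `checkpoint` file in the taco_pretrained folder and pass it to stale_paths. Any path it returns points at a directory that no longer exists, for example after a log folder was renamed.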

graham-eisele commented 3 years ago

Whenever I print vidpath in complete_test_generate.py, the output is just Dataset/. Should it not be Dataset/chem/preprocessed/0010298\0010298, or is that correct? Also, when I print videos, which is retrieved from get_testlist(args.data_root), the output is ['Dataset'].