DatGuy1 closed this issue 4 years ago.
Give me a line number or screenshot?
I linked the line in the commit. Screenshot
I don't get any line highlighted or scrolled to when I use that link.
I think it's because train.py is hidden by default because it was a large diff.
@CookiePPP another error. Not sure of the cause, but it happens on any validation pass (currently using LJSpeech):
```
Traceback (most recent call last):
  File "train.py", line 905, in <module>
    train(args, args.rank, args.group_name, hparams)
  File "train.py", line 773, in train
    val_att_loss, *_ = validate(hparams, args, file_losses, model, criterion, valset, best_val_loss_dict, iteration, collate_fn, logger, 0, 0.0, teacher_force=2)# infer
  File "train.py", line 432, in validate
    loss_dict_total = {k: v/(i+1) for k, v in loss_dict_total.items()}
AttributeError: 'NoneType' object has no attribute 'items'
```
@DatGuy1 Can you check your validation file(s)? This error would only occur if your validation set was smaller than your batch size (which is fucking unlikely under normal conditions).
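(For context: a minimal sketch of the failure mode described above, assuming `loss_dict_total` starts as `None` and is only assigned inside the batch loop; the names below are illustrative, not the repo's exact code.)

```python
# Illustrative reduction: if val_loader yields zero batches (e.g. the
# validation set is smaller than the batch size and drop_last=True),
# the loop body never runs, loss_dict_total stays None, and the final
# dict comprehension raises the AttributeError from the traceback.
def validate(val_loader):
    loss_dict_total = None
    i = 0
    for i, loss_dict in enumerate(val_loader):
        if loss_dict_total is None:
            loss_dict_total = {k: 0.0 for k in loss_dict}
        for k, v in loss_dict.items():
            loss_dict_total[k] += v
    return {k: v / (i + 1) for k, v in loss_dict_total.items()}

validate([{'mel_loss': 0.5}])  # fine: returns {'mel_loss': 0.5}
validate([])                   # AttributeError: 'NoneType' object has no attribute 'items'
```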
val_batch_size is at its default of 32. There are 610 validation files and they're all checked. Full output here
Alright, 2 things. https://github.com/CookiePPP/cookietts/blob/a48899296b840b5f053f52e7573a9664a880c993/CookieTTS/_2_ttm/tacotron2_tm/hparams.py#L72-L84
1. Which data_source 'mode' are you using?
2. Did you add speaker ids to any of the datasets you're testing? This repo isn't tested with single-speaker datasets, though I wouldn't have expected any failures around the validation area due to wonky/missing ids.
data_source is 0. I'm testing with a single speaker, LJSpeech, just to familiarize myself with the repo before moving on to anything serious. It must've been one of the recent commits, possibly overflow-related, since I could train it fine before.
From my testing: valset is fine with 610 files just before entering the for loop, and the length of val_loader is 19. The only modification I made that I can think of is disabling all instances of distributed training, i.e. num_workers = 0, distributed_run = False, etc.
https://github.com/CookiePPP/cookietts/blob/experimental/CookieTTS/_2_ttm/tacotron2_tm/train.py#L396
I added this line a little while ago. It changes the 2nd pass of validation to sample from each speaker equally, so the inference plots on TensorBoard don't massively overweight speakers with more data. I think that's failing when using single-speaker datasets, though I don't see the exact line inside the function that's messing up. I'll add a hparam you can flip in a sec.
https://github.com/CookiePPP/cookietts/commit/726249e212b530ca64b7c7b59cd6f0bf59f8a2d2
```python
inference_equally_sample_speakers=True,# Will change the 'inference' results to use the same number of files from each speaker.
                                       # This makes sense if the speakers you want to clone aren't the same as the speakers with the most audio data.
```
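(A rough sketch of what equal-per-speaker sampling can look like, assuming each filelist entry carries a speaker id; all names here are illustrative, not the repo's actual implementation. Note that with a single-speaker filelist this version would simply return the whole set, which is why the exact failure point in the repo's version isn't obvious from the outside.)

```python
import random
from collections import defaultdict

# Illustrative sketch, not the repo's code: draw the same number of
# files from every speaker so inference plots don't overweight
# speakers that have more data.
def equal_sample_speakers(filelist, seed=0):
    by_speaker = defaultdict(list)
    for path, text, speaker_id in filelist:
        by_speaker[speaker_id].append((path, text, speaker_id))
    # every speaker contributes as many files as the rarest speaker has
    n_per_speaker = min(len(files) for files in by_speaker.values())
    rng = random.Random(seed)
    sampled = []
    for files in by_speaker.values():
        sampled.extend(rng.sample(files, n_per_speaker))
    return sampled
```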
Yep, that's the one. Works now.
If the first training iteration has a gradient overflow (and is skipped), this change throws UnboundLocalError: local variable 'average_loss' referenced before assignment.
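(A minimal reduction of that pattern, assuming average_loss is only assigned on non-overflow iterations; the names are illustrative, not the repo's exact code.)

```python
# Illustrative reduction: with AMP-style overflow skipping,
# average_loss is only bound on non-overflow steps, so reading it
# after the first iteration overflows raises UnboundLocalError.
def train(num_iters=3, first_iter_overflows=True):
    # average_loss = None  # binding it before the loop is one possible fix
    for iteration in range(num_iters):
        is_overflow = first_iter_overflows and iteration == 0
        if not is_overflow:
            loss = 1.0                 # stand-in for the real loss value
            average_loss = loss
        # raises UnboundLocalError on iteration 0, since the assignment was skipped
        print(f"iter {iteration}: average_loss={average_loss}")

train()
```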