fishaudio / fish-diffusion

An easy to understand TTS / SVS / SVC framework
https://diff.fish.audio
MIT License

Tensor NotImplementedError #86

Closed. AWAS666 closed this issue 1 year ago

AWAS666 commented 1 year ago

I get this error as soon as I try to start training after a basic install and data preparation (~400 short WAV files). I'm using a Python venv instead of conda, but I installed everything with Poetry.

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name    | Type       | Params
0 | model   | DiffSinger | 55.1 M
1 | vocoder | NsfHifiGAN | 14.2 M

55.1 M    Trainable params
14.2 M    Non-trainable params
69.3 M    Total params
277.038   Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]
C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py:430: PossibleUserWarning: The dataloader, val_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\User\Documents\Testing\fishdiffusion\tools\diffusion\train.py", line 98, in <module>
    trainer.fit(model, train_loader, valid_loader, ckpt_path=args.resume)
  File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 520, in fit
    call._call_and_handle_interrupt(
  File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 559, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 935, in _run
    results = self._run_stage()
  File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 976, in _run_stage
    self._run_sanity_check()
  File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1005, in _run_sanity_check
    val_loop.run()
  File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\loops\utilities.py", line 177, in _decorator
    return loop_run(self, *args, **kwargs)
  File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\loops\evaluation_loop.py", line 115, in run
    self._evaluation_step(batch, batch_idx, dataloader_idx)
  File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\loops\evaluation_loop.py", line 375, in _evaluation_step
    output = call._call_strategy_hook(trainer, hook_name, *step_kwargs.values())
  File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\call.py", line 288, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\strategies\strategy.py", line 378, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\archs\diffsinger\diffsinger.py", line 276, in validation_step
    return self._step(batch, batch_idx, mode="valid")
  File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\archs\diffsinger\diffsinger.py", line 215, in _step
    image_mels, wav_reconstruction, wav_prediction = viz_synth_sample(
  File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\utils\viz.py", line 54, in viz_synth_sample
    wav_reconstruction = vocoder.spec2wav(mel_target, pitch)
  File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\modules\vocoders\nsf_hifigan\nsf_hifigan.py", line 81, in spec2wav
    y = self.model(c, f0).view(-1)
  File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\modules\vocoders\nsf_hifigan\models.py", line 408, in forward
    f0 = F.interpolate(
  File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\torch\nn\functional.py", line 3982, in interpolate
    raise NotImplementedError(
NotImplementedError: Input Error: Only 3D, 4D and 5D input Tensors supported (got 2D) for the modes: nearest | linear | bilinear | bicubic | trilinear | area | nearest-exact (got linear)
wandb: Waiting for W&B process to finish... (failed 1). Press Ctrl-C to abort syncing.

leng-yue commented 1 year ago

Thank you for your feedback. This bug is fixed in the latest commit: https://github.com/fishaudio/fish-diffusion/commit/a7aa5184034cc504c5f32d3c9c9edfd49b7bf6de
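
For anyone who hits the same traceback on an older checkout: the final frame shows `F.interpolate` in `linear` mode receiving a 2D tensor, while that mode expects a 3D `(batch, channels, length)` input. The snippet below is only a minimal sketch of that failure and one common workaround (adding a channel axis before interpolating). The helper name `upsample_f0`, the `factor` parameter, and the assumption that the pitch tensor arrives as `(batch, frames)` are illustrative and not taken from the repository; the linked commit may resolve the issue differently.

```python
import torch
import torch.nn.functional as F


def upsample_f0(f0: torch.Tensor, factor: int) -> torch.Tensor:
    """Linearly upsample a pitch curve along its last (time) axis.

    Hypothetical helper, not part of fish-diffusion. Accepts tensors shaped
    (frames,), (batch, frames) or (batch, channels, frames) and always
    returns a 3D tensor (batch, channels, frames * factor).
    """
    if f0.dim() == 1:
        f0 = f0[None, None, :]  # (frames,) -> (1, 1, frames)
    elif f0.dim() == 2:
        f0 = f0[:, None, :]  # (batch, frames) -> (batch, 1, frames)
    elif f0.dim() != 3:
        raise ValueError(f"expected a 1D, 2D or 3D tensor, got {f0.dim()}D")
    # mode="linear" only accepts 3D input, hence the reshaping above.
    return F.interpolate(f0, scale_factor=float(factor), mode="linear")


# Assumed shape for illustration: 2 utterances, 10 pitch frames each.
f0 = torch.rand(2, 10)

try:
    # Same kind of call as in the traceback: 2D input with linear mode.
    F.interpolate(f0, scale_factor=4.0, mode="linear")
except NotImplementedError as err:
    print("2D input rejected:", err)

print(upsample_f0(f0, 4).shape)
```

If the downstream code expects a 2D result again, the added channel axis can be dropped afterwards with `.squeeze(1)`.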