acids-ircam / RAVE

Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder

Error trying to launch train_rave.py #27

pgm-n117 closed this issue 2 years ago

pgm-n117 commented 2 years ago

Hi, I was trying to launch train_rave.py with a dataset for testing. I am using 310 .wav files prepared with cli_helper.py, and got an error like this:

```
$ python train_rave.py --name training1 --wav ./dataset/1 --preprocessed /tmp/rave/training1/rave ...
5_38_26.wav:  99%|█████████▊| 308/310 [00:46<00:00, 6.69it/s]
/home/pablo/.local/lib/python3.9/site-packages/librosa/core/audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
2_38_13.wav: 100%|█████████▉| 309/310 [00:46<00:00, 6.60it/s]
/home/pablo/.local/lib/python3.9/site-packages/librosa/core/audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
2_38_13.wav: 100%|██████████| 310/310 [00:46<00:00, 6.61it/s]
Traceback (most recent call last):
  File "/home/pablo/RAVE/train_rave.py", line 77, in <module>
    dataset = SimpleDataset(
  File "/home/pablo/.local/lib/python3.9/site-packages/udls/simple_dataset.py", line 83, in __init__
    raise Exception("No data found !")
Exception: No data found !
```

Have you seen this error before? I am trying to use it with a GPU, but launching with CPU only throws the same error. Maybe there is a problem with the dataset?

Thanks!

caillonantoine commented 2 years ago

Yup, this is a problem with your dataset, check that there is nothing wrong with it !
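
The librosa UserWarning in the log above ("PySoundFile failed. Trying audioread instead.") hints that some files are not being read cleanly. One illustrative way to find them, sketched here with soundfile rather than anything shipped with RAVE (the dataset path is the one from the command above):

```python
# Sketch: flag .wav files that PySoundFile cannot open.
# The path mirrors the --wav argument used above; adjust as needed.
from pathlib import Path
import soundfile as sf

for path in sorted(Path("./dataset/1").glob("**/*.wav")):
    try:
        info = sf.info(str(path))
        print(path, info.samplerate, "Hz,", info.channels, "ch")
    except RuntimeError as err:
        print("UNREADABLE:", path, err)
```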

eleGAN23 commented 2 years ago

Hi, I have the same error; the dataset was downloaded from here. Have you solved it? Could you please suggest how the dataset directory should be organized, or whether the code needs a manual list of the folders containing the .wav files?

Thanks in advance.

caillonantoine commented 2 years ago

This error means that the lmdb database is empty, i.e. no audio has been preprocessed and loaded into it ! You can try using the resample utility provided with RAVE (you might want to re-run pip install -r requirements.txt first, though).
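
If you want to normalize the sample rate by hand instead, a generic pass with librosa does the same job. This is only a sketch, not RAVE's resample utility; the path and the 44100 Hz target are assumptions:

```python
# Generic resampling sketch (not RAVE's resample utility): rewrite every
# .wav under the dataset folder at one target rate so preprocessing
# accepts them. Path and target rate are assumed placeholders.
from pathlib import Path
import librosa
import soundfile as sf

TARGET_SR = 44100

for path in sorted(Path("./dataset/1").glob("**/*.wav")):
    audio, _ = librosa.load(path, sr=TARGET_SR, mono=True)  # resamples on load
    sf.write(str(path), audio, TARGET_SR)
```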

pgm-n117 commented 2 years ago

I got past the original issue with the dataset, but now I'm stuck with this error (repo updated to the latest commit):

```
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name          | Type                | Params
------------------------------------------------------
0 | pqmf          | CachedPQMF          | 16.7 K
1 | loudness      | Loudness            | 0
2 | encoder       | Encoder             | 4.8 M
3 | decoder       | Generator           | 12.8 M
4 | discriminator | StackDiscriminators | 16.9 M
------------------------------------------------------
34.6 M    Trainable params
0         Non-trainable params
34.6 M    Total params
138.202   Total estimated model params size (MB)

Validation sanity check: 100%|██████████| 1/1 [00:00<00:00,  4.03it/s]
Traceback (most recent call last):
  File "/home/pablo/RAVE/train_rave.py", line 154, in <module>
    trainer.fit(model, train, val)
  File "/home/pablo/.local/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 552, in fit
    self._run(model)
  File "/home/pablo/.local/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 922, in _run
    self._dispatch()
  File "/home/pablo/.local/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 990, in _dispatch
    self.accelerator.start_training(self)
  File "/home/pablo/.local/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/pablo/.local/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
    self._results = trainer.run_stage()
  File "/home/pablo/.local/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1000, in run_stage
    return self._run_train()
  File "/home/pablo/.local/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1035, in _run_train
    self._run_sanity_check(self.lightning_module)
  File "/home/pablo/.local/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1122, in _run_sanity_check
    self._evaluation_loop.run()
  File "/home/pablo/.local/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 118, in run
    output = self.on_run_end()
  File "/home/pablo/.local/lib/python3.9/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 133, in on_run_end
    self.evaluation_epoch_end(outputs)
  File "/home/pablo/.local/lib/python3.9/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 243, in evaluation_epoch_end
    model.validation_epoch_end(outputs)
  File "/home/pablo/RAVE/rave/model.py", line 661, in validation_epoch_end
    pca = PCA(z.shape[-1]).fit(z.cpu().numpy())
  File "/home/pablo/.local/lib/python3.9/site-packages/sklearn/decomposition/_pca.py", line 382, in fit
    self._fit(X)
  File "/home/pablo/.local/lib/python3.9/site-packages/sklearn/decomposition/_pca.py", line 457, in _fit
    return self._fit_full(X, n_components)
  File "/home/pablo/.local/lib/python3.9/site-packages/sklearn/decomposition/_pca.py", line 475, in _fit_full
    raise ValueError(
ValueError: n_components=128 must be between 0 and min(n_samples, n_features)=32 with svd_solver='full'
```

Any idea on this?

caillonantoine commented 2 years ago

How large is your dataset ?

pgm-n117 commented 2 years ago

I used 28 wav files, 25 MB in total. It's not a large dataset, but I used it for testing.

caillonantoine commented 2 years ago

I'm not sure how you've solved your previous problem, but your dataset is so small that I think the validation set is still empty !
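
For what it's worth, the ValueError itself is just scikit-learn's standard PCA constraint: you cannot fit more components than you have samples. The shapes in the traceback imply the pooled validation latents z had only 32 rows against a 128-dimensional latent space. A minimal reproduction with those numbers:

```python
# Minimal reproduction of the ValueError above: scikit-learn's PCA
# requires n_components <= min(n_samples, n_features). Shapes mirror
# the traceback (32 validation latents, 128 latent dimensions).
import numpy as np
from sklearn.decomposition import PCA

z = np.random.randn(32, 128)
PCA(n_components=128).fit(z)
# ValueError: n_components=128 must be between 0 and
# min(n_samples, n_features)=32 with svd_solver='full'
```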

pgm-n117 commented 2 years ago

I'll try a larger one then. Is there a minimum size? I only have an RTX 2060 on hand for training.

caillonantoine commented 2 years ago

No matter what GPU you have, a larger dataset will always produce better results. It really isn't the number of epochs that counts in this case, but rather the number of training steps.
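
To make the steps-versus-epochs point concrete, here is a back-of-the-envelope sketch. The 2^16 sample length at 44.1 kHz is the default mentioned further down; the batch size of 8 is just an assumed placeholder:

```python
# Back-of-the-envelope: training steps per epoch for two dataset sizes.
# With a fixed batch size, steps per epoch scale with dataset size, so a
# tiny dataset needs many more epochs to reach the same step count.
SR = 44100
SAMPLE_LEN = 2**16   # ~1.49 s of audio per training example (RAVE default)
BATCH = 8            # assumed placeholder

def steps_per_epoch(dataset_seconds: float) -> int:
    chunks = int(dataset_seconds * SR / SAMPLE_LEN)
    return chunks // BATCH

print(steps_per_epoch(25 * 60))    # ~25 min of audio -> ~126 steps/epoch
print(steps_per_epoch(3 * 3600))   # 3 h of audio -> ~908 steps/epoch
```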

pgm-n117 commented 2 years ago

I know I'll get better results with a larger dataset, but right now I don't have access to better hardware and wanted to try the training on mine; of course, with a lower-end GPU and less memory it will take longer. Anyway, I'll keep trying with a larger dataset and hope it works. Thank you!

iamzoltan commented 2 years ago

@caillonantoine thanks for the help, and sorry about that lingering issue. I figured it out :). I wasn't using a big enough sample size for the model (I think something similar is happening here). Anyway, I am currently training two models with different capacities. The smaller one (the default) is at around 850K steps, and the larger one at about 250K steps. I've heard that stopping around one million is a good heuristic, although I'm tempted to let it run until the reconstructions have minimal distortion. Can you share any insights?

Thanks for your time.

caillonantoine commented 2 years ago

What sample size were you using ? The default is 2^16, which is approximately 1.5 s at 44.1 kHz ! You should look at the distance loss in tensorboard; when it starts to plateau, you should either halve the learning rate or switch to phase 2 ! :)

iamzoltan commented 2 years ago

I was using the default. I did not see a learning rate option in the train_rave.py script; where do I change this? Also, what is phase 2 exactly? Is that exporting RAVE and training the prior?

caillonantoine commented 2 years ago

You're gonna have to do it manually inside rave/model.py. For any question on the model itself, I suggest that you read the article: https://arxiv.org/abs/2111.05011
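
As a generic pattern for the "halve the learning rate when the distance loss plateaus" advice above, PyTorch's ReduceLROnPlateau scheduler does exactly this. RAVE does not ship this wiring, so treat the snippet as an illustration of the manual edit, with placeholder names throughout:

```python
# Generic PyTorch pattern for "halve the LR once a loss plateaus".
# Everything here (the stand-in model, lr, patience) is a placeholder,
# not code from rave/model.py.
import torch

model = torch.nn.Linear(128, 128)               # stand-in for the real model
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
    opt, mode="min", factor=0.5, patience=10)

for epoch in range(100):
    val_distance = 1.0 / (epoch + 1)            # placeholder validation metric
    # multiplies LR by 0.5 once the metric stops improving for `patience` epochs
    sched.step(val_distance)
```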

iamzoltan commented 2 years ago

Thanks for all the help! Will give that a try :). P.S. the paper is super helpful.