fgnt / tssep_data


Evaluation on pretrained model #1

Open nikifori opened 2 months ago

nikifori commented 2 months ago

Hi,

While trying to run the evaluation on the pretrained model: https://github.com/fgnt/tssep_data/blob/master/egs/libri_css/README.md#steps-to-evaluate-a-pretrained-model

I got this error when running the `make tssep_pretrained_eval` command:

```
FileNotFoundError: [Errno 2] No such file or directory: '~/tssep_data/egs/libri_css/data/ivector/simLibriCSS_oracle_ivectors.json'
```

I have not changed anything in the config files, nor in tssep_pretrained_77_62000.yaml, which is the base. However, I think simLibriCSS_oracle_ivectors.json is unnecessary for evaluation, since only the produced i-vectors (libriCSS_espnet_ivectors.json) are needed there, and for domain adaptation feature_statistics.pkl is downloaded successfully.
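When chasing a `FileNotFoundError` like this, it can help to first check whether the path resolves at all: the `~` in the message may just be a redacted home directory, but a literal, unexpanded tilde (e.g. a path taken verbatim from a config file and opened without expansion) would raise exactly this error. A small check:

```python
from pathlib import Path

# The path from the error message; whether it exists depends on your setup.
p = Path('~/tssep_data/egs/libri_css/data/ivector/simLibriCSS_oracle_ivectors.json')

print('literal path exists: ', p.exists())               # a literal '~' directory almost never exists
print('expanded path exists:', p.expanduser().exists())  # True only if the file is really on disk
```

If the expanded path is also missing, the file genuinely was not generated or downloaded.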

Update (4/9/2024)

Just a quick update.

I managed to overcome the previous error by commenting out the following lines in the config.yaml:

and by changing:

The problem now is that I get a CUDA out-of-memory error:

```
Run eval: ~/testing_evaluation_tssep_data/tssep_data/egs/libri_css/tssep_pretrained/eval/62000/1
device: 0
Load feature statistics from cache: ~/testing_evaluation_tssep_data/tssep_data/egs/libri_css/tssep_pretrained/eval/62000/1/cache/feature_statistics.pkl
Use prefetch with threads for dataloading
  0%|                                                                 | 0/60 [00:01<?, ?it/s]
ERROR - extract_eval - Failed after 0:00:04!
Traceback (most recent calls WITHOUT Sacred internals):
  File "~/testing_evaluation_tssep_data/tssep_data/tssep_data/eval/run.py", line 246, in main
    eeg.eval(eg=eg)
  File "~/testing_evaluation_tssep_data/tssep_data/tssep_data/eval/experiment.py", line 825, in eval
    self.work(
  File "~/testing_evaluation_tssep_data/tssep_data/tssep_data/eval/experiment.py", line 382, in work
    ex['Observation'] = self.wpe(ex['Observation'])
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/testing_evaluation_tssep_data/tssep/tssep/train/enhancer.py", line 336, in __call__
    nara_wpe.torch_wpe.wpe_v6(
  File "~/miniconda3/envs/ivec_train_check/lib/python3.11/site-packages/nara_wpe/torch_wpe.py", line 222, in wpe_v6
    Y_tilde_inverse_power = Y_tilde * inverse_power[..., None, :]
                            ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.60 GiB. GPU 0 has a total capacty of 22.02 GiB of which 14.05 GiB is free. Including non-PyTorch memory, this process has 7.97 GiB memory in use. Of the allocated memory 6.90 GiB is allocated by PyTorch, and 14.47 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

make: *** [Makefile:13: run] Error 1
```
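The error message itself suggests one cheap knob: the PyTorch caching allocator's `max_split_size_mb` option, which reduces fragmentation. It will not help if a single ~20 GiB tensor genuinely does not fit, but it costs nothing to try. A minimal sketch; the value 512 is an arbitrary starting point, not a recommendation from the tssep authors:

```shell
# Cap the size of allocator blocks to reduce fragmentation; 512 MiB is a guess.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
echo "$PYTORCH_CUDA_ALLOC_CONF"
```

Then re-run `make tssep_pretrained_eval` in the same shell.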

But there isn't any argument for an eval batch size, or anything similar to tweak. Do you have any thoughts on that?
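If no such argument exists, a more structural workaround is possible: WPE estimates its dereverberation filters independently per frequency bin, so the STFT tensor can be processed in frequency chunks to cap the peak allocation. A generic sketch in NumPy; the function name, axis layout, and chunk size are all hypothetical, not part of tssep or nara_wpe:

```python
import numpy as np

def apply_per_frequency_chunk(fn, Y, freq_axis, chunk_size=32):
    """Apply `fn` to slices of `Y` along the frequency axis and
    concatenate the results, so peak memory is bounded by one chunk."""
    n_freq = Y.shape[freq_axis]
    outputs = []
    for start in range(0, n_freq, chunk_size):
        index = [slice(None)] * Y.ndim
        index[freq_axis] = slice(start, min(start + chunk_size, n_freq))
        outputs.append(fn(Y[tuple(index)]))
    return np.concatenate(outputs, axis=freq_axis)
```

Applied here, this would mean wrapping the `nara_wpe.torch_wpe.wpe_v6` call in `enhancer.py` so that each invocation sees only `chunk_size` frequency bins; check tssep's actual shape convention before choosing `freq_axis`.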

Thanks

boeddeker commented 2 months ago

Thanks, Konstantinos, for reporting this, and sorry for the trouble. We talked via email, so here is a summary:

Update (Sep. 23):