asteroid-team / asteroid

The PyTorch-based audio source separation toolkit for researchers
https://asteroid-team.github.io/
MIT License

KeyError: 'source_2_path' when training ConvTasNet using enh_single mode #694

Open lingzhic opened 5 months ago

lingzhic commented 5 months ago

I am training ConvTasNet on the LibriMix train-100 dataset. It works fine when I train with the sep_noisy task, but I get the following error when I train with the enh_single task:

Results from the following experiment will be stored in exp/train_convtasnet_3rd_causal
Stage 2: Training
/O/asteroid/asteroid/models/conv_tasnet.py:89: UserWarning: In causal configuration cumulative layer normalization (cgLN)or channel-wise layer normalization (chanLN)  must be used. Changing cLN to cLN
  warnings.warn(
/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/lightning_fabric/plugins/environments/slurm.py:204: The `srun` command is available on your system but is not used. HINT: If your intention is to run Lightning on SLURM, prepend your python command with `srun` like so: srun python train.py --exp_dir exp/train_convtasnet_3rd_causal - ...
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
You are using a CUDA device ('NVIDIA GeForce RTX 4090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[W CUDAAllocatorConfig.h:30] Warning: expandable_segments not supported on this platform (function operator())
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name      | Type           | Params
---------------------------------------------
0 | model     | ConvTasNet     | 5.1 M 
1 | loss_func | PITLossWrapper | 0     
---------------------------------------------
5.1 M     Trainable params
0         Non-trainable params
5.1 M     Total params
20.202    Total estimated model params size (MB)
{'data': {'n_src': 2,
          'sample_rate': 8000,
          'segment': 3,
          'task': 'enh_single',
          'train_dir': 'data/wav8k/min/train-100',
          'valid_dir': 'data/wav8k/min/dev'},
 'filterbank': {'kernel_size': 16, 'n_filters': 512, 'stride': 8},
 'main_args': {'exp_dir': 'exp/train_convtasnet_3rd_causal', 'help': None},
 'masknet': {'bn_chan': 128,
             'hid_chan': 512,
             'mask_act': 'relu',
             'n_blocks': 8,
             'n_repeats': 3,
             'skip_chan': 128},
 'optim': {'lr': 0.001, 'optimizer': 'adam', 'weight_decay': 0.0},
 'positional arguments': {},
 'training': {'batch_size': 14,
              'early_stop': True,
              'epochs': 200,
              'half_lr': True,
              'num_workers': 4}}
Drop 0 utterances from 13900 (shorter than 3 seconds)
Drop 0 utterances from 13900 (shorter than 3 seconds)
Sanity Checking: |          | 0/? [00:00<?, ?it/s]
Traceback (most recent call last):
  File "O/asteroid/egs/librimix/ConvTasNet/train.py", line 146, in <module>
    main(arg_dic)
  File "O/asteroid/egs/librimix/ConvTasNet/train.py", line 112, in main
    trainer.fit(system)
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
    return function(*args, **kwargs)
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 987, in _run
    results = self._run_stage()
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1031, in _run_stage
    self._run_sanity_check()
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1060, in _run_sanity_check
    val_loop.run()
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 182, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 128, in run
    batch, batch_idx, dataloader_idx = next(data_fetcher)
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/loops/fetchers.py", line 133, in __next__
    batch = super().__next__()
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/loops/fetchers.py", line 60, in __next__
    batch = next(self.iterator)
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/utilities/combined_loader.py", line 341, in __next__
    out = next(self._iterator)
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/utilities/combined_loader.py", line 142, in __next__
    out = next(self.iterators[0])
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/torch/_utils.py", line 722, in reraise
    raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
    return self._engine.get_loc(casted_key)
  File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'source_2_path'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "O/asteroid/asteroid/data/librimix_dataset.py", line 106, in __getitem__
    source_path = row[f"source_{i + 1}_path"]
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pandas/core/series.py", line 1112, in __getitem__
    return self._get_value(key)
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pandas/core/series.py", line 1228, in _get_value
    loc = self.index.get_loc(label)
  File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3812, in get_loc
    raise KeyError(key) from err
KeyError: 'source_2_path'

And here is my run.sh file:

#!/bin/bash

# Exit on error
set -e
set -o pipefail

# If you haven't generated LibriMix start from stage 0
# Main storage directory. You'll need disk space to store LibriSpeech, WHAM noises
# and LibriMix. This is about 500 Gb
storage_dir=O/asteroid/datasets

# After running the recipe a first time, you can run it from stage 3 directly to train new models.

# Path to the python you'll use for the experiment. Defaults to the current python
# You can run ./utils/prepare_python_env.sh to create a suitable python environment, paste the output here.
python_path=python

# Example usage
# ./run.sh --stage 3 --tag my_tag --task sep_noisy --id 0,1

# General
stage=0  # Controls from which stage to start
tag=""  # Controls the directory name associated to the experiment
# You can ask for several GPUs using id (passed to CUDA_VISIBLE_DEVICES)
id=$CUDA_VISIBLE_DEVICES
out_dir=librimix # Controls the directory name associated to the evaluation results inside the experiment directory

# Network config
n_blocks=8      # Number of conv blocks in each repeat
n_repeats=3     # Number of repeats in the Conv-TasNet
mask_act=relu
# Training config
epochs=200
batch_size=14
num_workers=4
half_lr=yes
early_stop=yes
# Optim config
optimizer=adam
lr=0.001
weight_decay=0.
# Data config
sample_rate=8000
mode=min        # LibriMix mode: min (truncate to the shortest source) or max (pad to the longest)
n_src=2         # Number of speech sources in the mixture
segment=3
task=enh_single     # one of 'enh_single', 'enh_both', 'sep_clean', 'sep_noisy'

eval_use_gpu=1
# Pass --compute_wer 1 --eval_mode max to make it clear that all the metrics
# are computed in max mode.
compute_wer=0
eval_mode=

. utils/parse_options.sh

sr_string=$(($sample_rate/1000))
suffix=wav${sr_string}k/$mode

if [ -z "$eval_mode" ]; then
  eval_mode=$mode
fi

train_dir=data/$suffix/train-100
valid_dir=data/$suffix/dev
test_dir=data/wav${sr_string}k/$eval_mode/test

if [[ $stage -le  0 ]]; then
    echo "Stage 0: Generating Librimix dataset"
    if [ -z "$storage_dir" ]; then
        echo "Need to fill in the storage_dir variable in run.sh to run stage 0. Exiting"
        exit 1
    fi
  . local/generate_librimix.sh --storage_dir $storage_dir --n_src $n_src
fi

if [[ $stage -le  1 ]]; then
    echo "Stage 1: Generating csv files including wav path and duration"
  . local/prepare_data.sh --storage_dir $storage_dir --n_src $n_src
fi

# Generate a random ID for the run if no tag is specified
uuid=$($python_path -c 'import uuid, sys; print(str(uuid.uuid4())[:8])')
if [[ -z ${tag} ]]; then
    tag=${uuid}
fi

expdir=exp/train_convtasnet_${tag}
mkdir -p $expdir && echo $uuid >> $expdir/run_uuid.txt
echo "Results from the following experiment will be stored in $expdir"

if [[ $stage -le 2 ]]; then
  echo "Stage 2: Training"
  mkdir -p logs
  CUDA_VISIBLE_DEVICES=$id $python_path train.py --exp_dir $expdir \
        --n_blocks $n_blocks \
        --n_repeats $n_repeats \
        --mask_act $mask_act \
        --epochs $epochs \
        --batch_size $batch_size \
        --num_workers $num_workers \
        --half_lr $half_lr \
        --early_stop $early_stop \
        --optimizer $optimizer \
        --lr $lr \
        --weight_decay $weight_decay \
        --train_dir $train_dir \
        --valid_dir $valid_dir \
        --sample_rate $sample_rate \
        --n_src $n_src \
        --task $task \
        --segment $segment | tee logs/train_${tag}.log
    cp logs/train_${tag}.log $expdir/train.log

    # Get ready to publish
    mkdir -p $expdir/publish_dir
    echo "librimix/ConvTasNet" > $expdir/publish_dir/recipe_name.txt
fi

if [[ $stage -le 3 ]]; then
    echo "Stage 3 : Evaluation"

    if [[ $compute_wer -eq 1 ]]; then
      if [[ $eval_mode != "max" ]]; then
        echo "Cannot compute WER without max mode. Start again with --stage 2 --compute_wer 1 --eval_mode max"
        exit 1
      fi

    # Install espnet if not installed
    if ! python -c "import espnet" &> /dev/null; then
        echo 'This recipe requires espnet. Installing requirements.'
        $python_path -m pip install espnet_model_zoo
        $python_path -m pip install jiwer
        $python_path -m pip install tabulate
    fi
  fi

  $python_path eval.py \
    --exp_dir $expdir \
    --test_dir $test_dir \
    --out_dir $out_dir \
    --use_gpu $eval_use_gpu \
    --compute_wer $compute_wer \
    --task $task | tee logs/eval_${tag}.log

    cp logs/eval_${tag}.log $expdir/eval.log
fi

Could you please point out whether there is an issue with my run.sh configuration?

Thanks, Colin

MunbongChoi commented 5 months ago

I got the same error when I ran DPRNNTasNet on a LibriMix dataset. Did you ever solve it?

lingzhic commented 5 months ago

I got the same error when I ran DPRNNTasNet on a LibriMix dataset. Did you ever solve it?

It seems to be an issue with the dataset object. Check line 106 in asteroid/data/librimix_dataset.py and you will see why it happens; a rough sketch of that loop follows below.
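
For reference, here is a minimal sketch of what that loop does, reconstructed from the traceback above (only the row-lookup line comes from the traceback; the surrounding names are assumptions, not the actual asteroid source). With task=enh_single the metadata CSV only has a source_1_path column, so requesting n_src=2 sources makes the second lookup fail:

# Minimal sketch of the per-source loop around librimix_dataset.py line 106,
# reconstructed from the traceback; everything except the row lookup is assumed.
import pandas as pd

def load_source_paths(row: pd.Series, n_src: int) -> list:
    source_paths = []
    for i in range(n_src):
        # Line quoted in the traceback: an enh_single metadata row only carries
        # "source_1_path", so i == 1 raises KeyError: 'source_2_path'.
        source_path = row[f"source_{i + 1}_path"]
        source_paths.append(source_path)
    return source_paths

# Example with a row shaped like an enh_single metadata entry (hypothetical paths):
row = pd.Series({"mixture_path": "mix/1.wav", "source_1_path": "s1/1.wav", "length": 24000})
print(load_source_paths(row, n_src=1))   # works: ['s1/1.wav']
# load_source_paths(row, n_src=2)        # raises KeyError: 'source_2_path'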

MunbongChoi commented 5 months ago

I've looked into this issue: your n_src and task don't match. With task=enh_single the metadata CSV only provides a single source column, so n_src=2 makes the dataset ask for source_2_path, which doesn't exist. I'd recommend either changing the task or setting n_src=1. A quick way to check your CSV is sketched below.
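
As a quick sanity check (not part of the recipe; the CSV path below is only an assumption based on the usual LibriMix metadata naming and may differ on your machine), you can list which source columns the enh_single training CSV actually contains:

# Hypothetical check: print the source_*_path columns of the metadata CSV used
# for enh_single training. Adjust the path and filename to your own setup.
import pandas as pd

csv_path = "data/wav8k/min/train-100/mixture_train-100_mix_single.csv"  # assumed name
df = pd.read_csv(csv_path)

source_cols = [c for c in df.columns if c.startswith("source_") and c.endswith("_path")]
print(source_cols)  # expected: ['source_1_path'], so n_src must be 1 for enh_single

If it only lists source_1_path, rerun stage 2 with --n_src 1 (or switch back to one of the sep_* tasks).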