[Speaker Recognition and Verification] got runtime error on training

triumph9989 commented 1 year ago

I was following the tutorial on training a speaker embedding extractor from https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/starthere/tutorials.html (Domain: ASR, Title: Speaker Recognition and Verification). Everything ran without problems up to cell [20].

Cell [20]: trainer.fit(speaker_model)

Output:

INFO:pytorch_lightning.accelerators.gpu:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[NeMo I 2022-09-07 02:27:14 modelPT:587] Optimizer config = SGD (
    Parameter Group 0
        dampening: 0
        foreach: None
        lr: 0.006
        maximize: False
        momentum: 0
        nesterov: False
        weight_decay: 0.001
    )
[NeMo I 2022-09-07 02:27:14 lr_scheduler:914] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x7f682c63a690>" 
    will be used during training (effective maximum steps = 140) - 
    Parameters : 
    (warmup_ratio: 0.1
    min_lr: 0.0
    max_steps: 140
    )
INFO:pytorch_lightning.callbacks.model_summary:
  | Name              | Type                              | Params
------------------------------------------------------------------------
0 | preprocessor      | AudioToMelSpectrogramPreprocessor | 0     
1 | encoder           | ConvASREncoder                    | 19.4 M
2 | decoder           | SpeakerDecoder                    | 2.8 M 
3 | loss              | AngularSoftmaxLoss                | 0     
4 | _accuracy         | TopKClassificationAccuracy        | 0     
5 | spec_augmentation | SpectrogramAugmentation           | 0     
------------------------------------------------------------------------
22.1 M    Trainable params
0         Non-trainable params
22.1 M    Total params
88.546    Total estimated model params size (MB)
[NeMo I 2022-09-07 02:27:15 label_models:312] val_loss: 13.214
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-22-f8bb6804e159> in <module>
----> 1 trainer.fit(speaker_model)

14 frames
/usr/local/lib/python3.7/dist-packages/nemo/utils/timers.py in start(self, name)
     90 
     91         if "start" in timer_data:
---> 92             raise RuntimeError(f"Cannot start timer = '{name}' since it is already active")
     93 
     94         # synchronize pytorch cuda execution if supported

RuntimeError: Cannot start timer = 'train_step_timing' since it is already active
Epoch 0: 0% 0/15 [00:00<?, ?it/s]

Additional context: Before the RuntimeError above, I got TypeError: unhashable type: 'list' from /usr/local/lib/python3.7/dist-packages/numba/cuda/dispatcher.py, but I fixed that.
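
For readers unfamiliar with this error, the guard visible in the traceback (lines 91-92 of timers.py) can be reproduced in isolation. The class below is a hypothetical stand-in written for illustration, not NeMo's actual timer implementation; only the `"start" in timer_data` check and the error message are taken from the traceback. It shows that the error is raised whenever a named timer is started a second time without a matching stop in between. Whether that is what happened in the notebook is not established in this thread.

```python
import time


class NamedTimerSketch:
    """Hypothetical stand-in for a named timer; not NeMo's actual class."""

    def __init__(self):
        # Maps timer name -> dict holding an optional pending "start" timestamp.
        self._timers = {}

    def start(self, name):
        timer_data = self._timers.setdefault(name, {})
        # Same guard as in the traceback: a pending "start" means the timer is active.
        if "start" in timer_data:
            raise RuntimeError(f"Cannot start timer = '{name}' since it is already active")
        timer_data["start"] = time.perf_counter()

    def stop(self, name):
        timer_data = self._timers.get(name, {})
        if "start" not in timer_data:
            raise RuntimeError(f"Cannot stop timer = '{name}' since it is not active")
        # Removing "start" marks the timer as inactive again.
        return time.perf_counter() - timer_data.pop("start")


timer = NamedTimerSketch()
timer.start("train_step_timing")
try:
    # Second start without a stop in between, e.g. because an earlier
    # exception skipped the matching stop() call.
    timer.start("train_step_timing")
except RuntimeError as err:
    message = str(err)
    print(message)
```

In this sketch, calling `timer.stop("train_step_timing")` clears the pending start, after which the timer can be started again.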

SeanNaren commented 1 year ago

Thanks for the report! I was unable to reproduce this on my machine (I did run into the unhashable type error, but updating Numba fixed that).

cc @nithinraok who may have run into this before!
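
Since the unhashable type error was resolved by updating Numba, it can help to confirm which version the notebook runtime is actually using before and after the update. The helper below is a minimal sketch; `installed_version` is a name introduced here for illustration, and the thread does not state which Numba version resolved the error.

```python
import importlib


def installed_version(package_name):
    """Return the package's __version__ string, or None if unavailable."""
    try:
        module = importlib.import_module(package_name)
    except ImportError:
        # Package is not installed in this environment.
        return None
    return getattr(module, "__version__", None)


print("numba:", installed_version("numba"))
```

On Colab, `pip install --upgrade numba` followed by a runtime restart is the usual way to pick up a newer version.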

nithinraok commented 1 year ago

I haven't seen this before, and running the notebook on Colab now doesn't throw the error. @triumph9989, did you run this on Colab or locally?

triumph9989 commented 1 year ago

@nithinraok Thanks for looking into this issue. I was running this on Colab.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 7 days since being marked as stale.

NVIDIA / NeMo

[Speaker Recognition and Verification] got runtime error on training #4888