MarcisTU closed this issue 1 month ago.
@KunalDhawan, please take a look at this issue.
Hi @MarcisTU, Thank you for the detailed description! Could you please help me reproduce the issue?
I tried to replicate the issue on my end, but I was able to run the code snippet you shared above without any errors. Let me describe my replication setup in detail:
Environment: I built a fresh conda env using NeMo main
Reproducing the code:
>>> import copy
>>> import torch
>>> import librosa
>>> from omegaconf import OmegaConf, open_dict
>>> from nemo.collections.asr.models import EncDecHybridRNNTCTCBPEModel
>>> asr_model_path = "/models/stt_de_fastconformer_hybrid_large_pc.nemo"
>>> asr_model = EncDecHybridRNNTCTCBPEModel.restore_from(restore_path=asr_model_path)
.....
[NeMo I 2024-08-01 17:36:37 save_restore_connector:275] Model EncDecHybridRNNTCTCBPEModel was successfully restored from /models/stt_de_fastconformer_hybrid_large_pc.nemo.
>>> device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
>>> device
device(type='cuda')
>>> audio_path = "/data/ASR/en/librispeech/wav/test-clean/61-70968-0000.wav"
>>> input_wav, sr = librosa.load(audio_path, sr=16000)
>>> decoding_cfg = copy.deepcopy(asr_model.cfg.decoding)  # this definition is missing from the snippet as posted; assumed from the `import copy` above
>>> with open_dict(decoding_cfg):
...     decoding_cfg.preserve_alignments = True
...     decoding_cfg.compute_timestamps = True
...
>>> asr_model.change_decoding_strategy(decoding_cfg)
[NeMo I 2024-08-01 17:38:17 rnnt_models:224] Using RNNT Loss : warprnnt_numba
Loss warprnnt_numba_kwargs: {'fastemit_lambda': 0.0, 'clamp': -1.0}
[NeMo I 2024-08-01 17:38:17 hybrid_rnnt_ctc_bpe_models:457] Changed decoding strategy of the RNNT decoder to
model_type: rnnt
strategy: greedy_batch
compute_hypothesis_token_set: false
preserve_alignments: true
confidence_cfg:
  preserve_frame_confidence: false
  preserve_token_confidence: false
  preserve_word_confidence: false
  exclude_blank: true
  aggregation: min
  tdt_include_duration: false
  method_cfg:
    name: entropy
    entropy_type: tsallis
    alpha: 0.33
    entropy_norm: exp
    temperature: DEPRECATED
fused_batch_size: null
compute_timestamps: true
compute_langs: false
word_seperator: ' '
rnnt_timestamp_type: all
greedy:
  max_symbols_per_step: 10
  preserve_alignments: false
  preserve_frame_confidence: false
  tdt_include_duration_confidence: false
  confidence_method_cfg:
    name: entropy
    entropy_type: tsallis
    alpha: 0.33
    entropy_norm: exp
    temperature: DEPRECATED
  loop_labels: true
  use_cuda_graph_decoder: true
  max_symbols: 10
beam:
  beam_size: 2
  search_type: default
  score_norm: true
  return_best_hypothesis: false
  tsd_max_sym_exp_per_step: 50
  alsd_max_target_len: 2.0
  nsc_max_timesteps_expansion: 1
  nsc_prefix_alpha: 1
  maes_num_steps: 2
  maes_prefix_alpha: 1
  maes_expansion_gamma: 2.3
  maes_expansion_beta: 2
  language_model: null
  softmax_temperature: 1.0
  preserve_alignments: false
  ngram_lm_model: null
  ngram_lm_alpha: 0.0
  hat_subtract_ilm: false
  hat_ilm_weight: 0.0
tsd_max_sym_exp: 50
temperature: 1.0
durations: []
big_blank_durations: []
>>> hypotheses = asr_model.transcribe(input_wav, return_hypotheses=True)
Transcribing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.45s/it]
>>> hypotheses
([Hypothesis(score=-26.476362228393555, y_sequence=tensor([ 5, 1004, 354, 6, 5, 91, 255, 255, 38, 21, 166, 8,
5, 255, 22, 15, 90, 28, 621, 5, 91, 354, 38, 16,
5, 8, 121, 5, 153, 43, 43, 9, 8, 50, 22, 5,
482, 3, 8, 236, 199, 22, 114, 4, 551, 95, 8, 38,
10, 5, 8, 121, 5, 247, 291, 2], device='cuda:0'), text='Ygan accinfust complant against the wizzertwo identisch boye Curtin und the loft .', dec_out=None, dec_state=(tensor([[[ 1.1207e-03, 2.4281e-03, 3.4973e-08, -7.2756e-01, -3.1125e-05,
-3.0149e-04, 1.0904e-05, -9.7909e-06, -1.3529e-04, 5.1211e-05,
2.7526e-04, -7.5015e-01, -7.4380e-01, 4.6821e-02, 3.1424e-10,
-1.6009e-07, 7.6155e-01, -9.2109e-07, 6.8400e-05, -7.5959e-01,
7.5943e-01, -7.5767e-01, 2.4640e-06, 2.4134e-05, -2.8190e-03,
........
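With compute_timestamps=True set above, the returned hypothesis also carries word-level offsets. A minimal sketch of reading them, assuming the timestep dict layout NeMo's RNNT decoding produces (word entries with start_offset/end_offset in encoder frames) and the (best, all) tuple shown in the output above:

# Sketch: extract word timestamps from the best hypothesis (assumed layout).
hyp = hypotheses[0][0]  # best hypothesis for the first audio file
time_stride = 8 * asr_model.cfg.preprocessor.window_stride  # FastConformer subsamples by 8x
for stamp in hyp.timestep['word']:
    print(stamp['word'], stamp['start_offset'] * time_stride, stamp['end_offset'] * time_stride)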
I was able to transcribe with a FastConformer-Hybrid-Transducer-CTC-BPE model without any issues. Could you please share some more details to help me identify a possible mismatch between my replication and your setup?
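One quick way to spot such a mismatch is to print the relevant versions on both sides; a minimal sketch:

# Print library/CUDA versions to compare the two environments.
import torch
import nemo
print("torch:", torch.__version__, "cuda:", torch.version.cuda, "available:", torch.cuda.is_available())
print("nemo:", nemo.__version__)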
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.
Describe the bug
After following the installation instructions for Linux, inference fails with an error about a torch tensor not being on the CPU when it is converted to a NumPy array.
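For context, this is the generic PyTorch failure mode being described (a minimal standalone reproduction, not the exact NeMo call site):

import torch

t = torch.ones(3, device='cuda')
try:
    t.numpy()  # raises TypeError: can't convert cuda:0 device type tensor to numpy.
               # Use Tensor.cpu() to copy the tensor to host memory first.
except TypeError as e:
    print(e)
print(t.cpu().numpy())  # copying to host memory first works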
Steps/Code to reproduce bug
Code to reproduce:
Expected behavior
Inference code runs and it is possible to get the result.
Environment overview (please complete the following information)
Environment details
Additional context
Also tried installing under WSL2 on Windows, but hit the same bug.