NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

ASR_with_NeMo Pre-trained Model Inference | Pickling Error #3441

Closed: sorenjmadsen closed this issue 2 years ago

sorenjmadsen commented 2 years ago

Describe the bug

On attempting inference on a recording, I received the following error:

PicklingError: Can't pickle <class 'nemo.collections.common.parts.preprocessing.collections.AudioTextEntity'>: attribute lookup AudioTextEntity on nemo.collections.common.parts.preprocessing.collections failed

Steps/Code to reproduce bug

import nemo
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En", strict=False)

files = ['./soundsample.wav']
for fname, transcription in zip(files, asr_model.transcribe(paths2audio_files=files)):
  print(f"Audio in {fname} was recognized as: {transcription}")

titu1994 commented 2 years ago

NeMo models and datasets are not picklable; we recommend using DDP in all cases when training.

However, I don't know why it's occurring during inference. Are you trying to launch that code on multiple threads? That will not work. You need separate processes that are independent of each other.
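The error message hints at why the dataset cannot be pickled. The sketch below rests on an assumption that has not been checked against this NeMo version: the entity type is a collections.namedtuple created with the typename AudioTextEntity but stored only as a class attribute. In that case pickle cannot find it as a top-level name of its module. AudioTextLike and PicklableEntity are hypothetical stand-ins, not NeMo classes.

```python
import collections
import pickle


class AudioTextLike:
    # Hypothetical stand-in for a NeMo collection: the namedtuple type is only
    # reachable as AudioTextLike.OUTPUT_TYPE, not as a module-level name
    # "AudioTextEntity", so pickle's lookup by name fails.
    OUTPUT_TYPE = collections.namedtuple("AudioTextEntity", "audio_file duration text")


# Defined at module level under its own typename, so pickle can look it up.
PicklableEntity = collections.namedtuple("PicklableEntity", "audio_file duration text")


def pickle_error(obj):
    """Return None if obj pickles, otherwise the error message."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError) as exc:
        return str(exc)
    return None


broken = AudioTextLike.OUTPUT_TYPE("sample.wav", 1.0, "hello")
fixed = PicklableEntity("sample.wav", 1.0, "hello")

print(pickle_error(broken))  # prints a message naming AudioTextEntity
print(pickle_error(fixed))   # None
```

Pickling only happens when worker processes receive the dataset by serialization. Both the macOS and Windows tracebacks in this thread go through spawn (popen_spawn_posix, popen_spawn_win32). With fork-based workers, the default on Linux for the Python versions used here, the dataset is inherited rather than pickled. That would explain why the same call can work on one platform and fail on another.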

sorenjmadsen commented 2 years ago

The above snippet is all I am trying to run. The pickle error is thrown when I call transcribe().

sorenjmadsen commented 2 years ago

Stack trace:


---------------------------------------------------------------------------
PicklingError                             Traceback (most recent call last)
Input In [4], in <module>
      1 asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En", strict=False)
      3 files = ['./soundsample.wav']
----> 4 for fname, transcription in zip(files, asr_model.transcribe(paths2audio_files=files)):
      5   print(f"Audio in {fname} was recognized as: {transcription}")

File ~/opt/anaconda3/envs/nemo/lib/python3.9/site-packages/torch/autograd/grad_mode.py:28, in _DecoratorContextManager.__call__.<locals>.decorate_context(*args, **kwargs)
     25 @functools.wraps(func)
     26 def decorate_context(*args, **kwargs):
     27     with self.__class__():
---> 28         return func(*args, **kwargs)

File ~/Documents/MetaSense.ai/NeMo/nemo/collections/asr/models/ctc_models.py:268, in EncDecCTCModel.transcribe(self, paths2audio_files, batch_size, logprobs, return_hypotheses, num_workers)
    260 config = {
    261     'paths2audio_files': paths2audio_files,
    262     'batch_size': batch_size,
    263     'temp_dir': tmpdir,
    264     'num_workers': num_workers,
    265 }
    267 temporary_datalayer = self._setup_transcribe_dataloader(config)
--> 268 for test_batch in tqdm(temporary_datalayer, desc="Transcribing"):
    269     logits, logits_len, greedy_predictions = self.forward(
    270         input_signal=test_batch[0].to(device), input_signal_length=test_batch[1].to(device)
    271     )
    272     if logprobs:
    273         # dump log probs per file

File ~/opt/anaconda3/envs/nemo/lib/python3.9/site-packages/tqdm/notebook.py:231, in tqdm_notebook.__init__(self, *args, **kwargs)
    229 colour = kwargs.pop('colour', None)
    230 display_here = kwargs.pop('display', True)
--> 231 super(tqdm_notebook, self).__init__(*args, **kwargs)
    232 if self.disable or not kwargs['gui']:
    233     self.disp = lambda *_, **__: None

File ~/opt/anaconda3/envs/nemo/lib/python3.9/site-packages/tqdm/asyncio.py:33, in tqdm_asyncio.__init__(self, iterable, *args, **kwargs)
     31     self.iterable_next = iterable.__next__
     32 else:
---> 33     self.iterable_iterator = iter(iterable)
     34     self.iterable_next = self.iterable_iterator.__next__

File ~/opt/anaconda3/envs/nemo/lib/python3.9/site-packages/torch/utils/data/dataloader.py:359, in DataLoader.__iter__(self)
    357     return self._iterator
    358 else:
--> 359     return self._get_iterator()

File ~/opt/anaconda3/envs/nemo/lib/python3.9/site-packages/torch/utils/data/dataloader.py:305, in DataLoader._get_iterator(self)
    303 else:
    304     self.check_worker_number_rationality()
--> 305     return _MultiProcessingDataLoaderIter(self)

File ~/opt/anaconda3/envs/nemo/lib/python3.9/site-packages/torch/utils/data/dataloader.py:918, in _MultiProcessingDataLoaderIter.__init__(self, loader)
    911 w.daemon = True
    912 # NB: Process.start() actually take some time as it needs to
    913 #     start a process and pass the arguments over via a pipe.
    914 #     Therefore, we only add a worker to self._workers list after
    915 #     it started, so that we do not call .join() if program dies
    916 #     before it starts, and __del__ tries to join but will get:
    917 #     AssertionError: can only join a started process.
--> 918 w.start()
    919 self._index_queues.append(index_queue)
    920 self._workers.append(w)

File ~/opt/anaconda3/envs/nemo/lib/python3.9/multiprocessing/process.py:121, in BaseProcess.start(self)
    118 assert not _current_process._config.get('daemon'), \
    119        'daemonic processes are not allowed to have children'
    120 _cleanup()
--> 121 self._popen = self._Popen(self)
    122 self._sentinel = self._popen.sentinel
    123 # Avoid a refcycle if the target function holds an indirect
    124 # reference to the process object (see bpo-30775)

File ~/opt/anaconda3/envs/nemo/lib/python3.9/multiprocessing/context.py:224, in Process._Popen(process_obj)
    222 @staticmethod
    223 def _Popen(process_obj):
--> 224     return _default_context.get_context().Process._Popen(process_obj)

File ~/opt/anaconda3/envs/nemo/lib/python3.9/multiprocessing/context.py:284, in SpawnProcess._Popen(process_obj)
    281 @staticmethod
    282 def _Popen(process_obj):
    283     from .popen_spawn_posix import Popen
--> 284     return Popen(process_obj)

File ~/opt/anaconda3/envs/nemo/lib/python3.9/multiprocessing/popen_spawn_posix.py:32, in Popen.__init__(self, process_obj)
     30 def __init__(self, process_obj):
     31     self._fds = []
---> 32     super().__init__(process_obj)

File ~/opt/anaconda3/envs/nemo/lib/python3.9/multiprocessing/popen_fork.py:19, in Popen.__init__(self, process_obj)
     17 self.returncode = None
     18 self.finalizer = None
---> 19 self._launch(process_obj)

File ~/opt/anaconda3/envs/nemo/lib/python3.9/multiprocessing/popen_spawn_posix.py:47, in Popen._launch(self, process_obj)
     45 try:
     46     reduction.dump(prep_data, fp)
---> 47     reduction.dump(process_obj, fp)
     48 finally:
     49     set_spawning_popen(None)

File ~/opt/anaconda3/envs/nemo/lib/python3.9/multiprocessing/reduction.py:60, in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)

titu1994 commented 2 years ago

Huh. I think it was a mistake to make the data loader use more than one worker by default. Can you put your code under an if __name__ == "__main__" block and try again?

Otherwise, another option is to pass num_workers=0 to transcribe().
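Both suggestions matter mainly when DataLoader workers are started with a method other than fork. The tracebacks in this thread show spawn. Under spawn, each worker re-imports the main module, which is why the __main__ guard matters. The dataset is also pickled, which is where the error appears.

Below is a sketch with a hypothetical helper, choose_num_workers, that keeps worker processes only when the start method is fork. The transcribe() call in the comment uses the num_workers argument from the signature in the traceback above. It is not executed here, since it needs NeMo installed.

```python
import multiprocessing


def current_start_method() -> str:
    """Return the start method new processes will use, without fixing it."""
    # get_start_method(allow_none=True) returns None if nothing was set yet;
    # get_all_start_methods() lists the platform default first.
    return (multiprocessing.get_start_method(allow_none=True)
            or multiprocessing.get_all_start_methods()[0])


def choose_num_workers(requested: int) -> int:
    """Hypothetical helper: load in the main process unless workers are forked."""
    # Any method other than "fork" (e.g. "spawn", "forkserver") pickles the
    # worker arguments, which is where the entity types in this thread fail.
    if current_start_method() != "fork":
        return 0
    return requested


def main() -> None:
    workers = choose_num_workers(2)
    print(f"start method: {current_start_method()}, num_workers: {workers}")
    # With NeMo installed, the call from this thread would become, e.g.:
    # asr_model.transcribe(paths2audio_files=files, num_workers=workers)


if __name__ == "__main__":
    main()
```

Falling back to 0 workers trades loading speed for portability. It is the same trade-off as passing num_workers=0 directly, only decided per platform.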

sorenjmadsen commented 2 years ago

When I put it under the conditional, it didn't run at all. However, setting the num_workers parameter worked! Thank you!

JenyaPu commented 2 years ago

I had the same issue, and num_workers=0 worked for me, but when I switched to a NeMo VAD model instead of ASR, I got the following traceback:

Traceback (most recent call last):
  File "D:\Projects\Python\test\test\nemo_diarization.py", line 115, in <module>
    diarization("audio/1.wav")
  File "D:\Projects\Python\test\test\nemo_diarization.py", line 100, in diarization
    sd_model.diarize()
  File "C:\Program Files\Python310\lib\site-packages\nemo\collections\asr\models\clustering_diarizer.py", line 408, in diarize
    self._perform_speech_activity_detection()
  File "C:\Program Files\Python310\lib\site-packages\nemo\collections\asr\models\clustering_diarizer.py", line 301, in _perform_speech_activity_detection
    manifest_vad_input = prepare_manifest(config)
  File "C:\Program Files\Python310\lib\site-packages\nemo\collections\asr\parts\utils\vad_utils.py", line 74, in prepare_manifest
    p = multiprocessing.Pool(processes=config['num_workers'])
  File "C:\Program Files\Python310\lib\multiprocessing\context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "C:\Program Files\Python310\lib\multiprocessing\pool.py", line 205, in __init__
    raise ValueError("Number of processes must be at least 1")
ValueError: Number of processes must be at least 1

The error is raised on these lines:

sd_model = ClusteringDiarizer(cfg=config)
sd_model.diarize()
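The ValueError comes from multiprocessing.Pool itself, which rejects processes=0. A num_workers of 0 therefore cannot be passed straight through to it. Below is a minimal sketch of how such code could accept 0 by falling back to a plain loop. map_with_workers is a hypothetical helper, not the change NeMo actually made.

```python
import multiprocessing
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_with_workers(func: Callable[[T], R], items: Iterable[T], num_workers: int) -> List[R]:
    """Hypothetical helper: run in-process for num_workers < 1, else use a Pool."""
    if num_workers < 1:
        # multiprocessing.Pool(processes=0) would raise
        # "ValueError: Number of processes must be at least 1".
        return [func(item) for item in items]
    # func and items must be picklable when workers are spawned.
    with multiprocessing.Pool(processes=num_workers) as pool:
        return pool.map(func, items)


if __name__ == "__main__":
    print(map_with_workers(abs, [-1, 2, -3], num_workers=0))  # [1, 2, 3]
```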

When I switch to num_workers > 0, I receive the following error:

  0%|          | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "D:\Projects\Python\test\test\nemo_diarization.py", line 115, in <module>
    diarization("audio/1.wav")
  File "D:\Projects\Python\test\test\nemo_diarization.py", line 100, in diarization
    sd_model.diarize()
  File "C:\Program Files\Python310\lib\site-packages\nemo\collections\asr\models\clustering_diarizer.py", line 408, in diarize
    self._perform_speech_activity_detection()
  File "C:\Program Files\Python310\lib\site-packages\nemo\collections\asr\models\clustering_diarizer.py", line 308, in _perform_speech_activity_detection
    self._run_vad(manifest_vad_input)
  File "C:\Program Files\Python310\lib\site-packages\nemo\collections\asr\models\clustering_diarizer.py", line 213, in _run_vad
    for i, test_batch in enumerate(tqdm(self._vad_model.test_dataloader())):
  File "C:\Program Files\Python310\lib\site-packages\tqdm\std.py", line 1195, in __iter__
    for obj in iterable:
  File "C:\Program Files\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 438, in __iter__
    return self._get_iterator()
  File "C:\Program Files\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 384, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Program Files\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 1048, in __init__
    w.start()
  File "C:\Program Files\Python310\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Program Files\Python310\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Program Files\Python310\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Program Files\Python310\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Program Files\Python310\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'nemo.collections.common.parts.preprocessing.collections.SpeechLabelEntity'>: attribute lookup SpeechLabelEntity on nemo.collections.common.parts.preprocessing.collections failed

Process finished with exit code 1

The following is the diarizer config file:

name: &name "ClusterDiarizer"

num_workers: 0
sample_rate: 16000
batch_size: 64

diarizer:
  manifest_filepath: ???
  out_dir: ???
  oracle_vad: False # If True, uses RTTM files provided in manifest file to get speech activity (VAD) timestamps
  collar: 0.25 # Collar value for scoring
  ignore_overlap: True # Consider or ignore overlap segments while scoring

  vad:
    model_path: null # .nemo local model path or pretrained model name or none
    external_vad_manifest: null # This option is provided to use external vad and provide its speech activity labels for speaker embeddings extraction. Only one of model_path or external_vad_manifest should be set

    parameters: # Tuned parameters for CH109 (using the 11 multi-speaker sessions as dev set) 
      window_length_in_sec: 0.15  # Window length in sec for VAD context input 
      shift_length_in_sec: 0.01 # Shift length in sec for generate frame level VAD prediction
      smoothing: "median" # False or type of smoothing method (eg: median)
      overlap: 0.875 # Overlap ratio for overlapped mean/median smoothing filter
      onset: 0.4 # Onset threshold for detecting the beginning and end of a speech 
      offset: 0.7 # Offset threshold for detecting the end of a speech
      pad_onset: 0.05 # Adding durations before each speech segment 
      pad_offset: -0.1 # Adding durations after each speech segment 
      min_duration_on: 0.2 # Threshold for small non_speech deletion
      min_duration_off: 0.2 # Threshold for short speech segment deletion
      filter_speech_first: True 

  speaker_embeddings:
    model_path: ??? # .nemo local model path or pretrained model name (titanet_large, ecapa_tdnn or speakerverification_speakernet)
    parameters:
      window_length_in_sec: 1.5 # Window length(s) in sec (floating-point number). Either a number or a list. Ex) 1.5 or [1.5,1.0,0.5]
      shift_length_in_sec: 0.75 # Shift length(s) in sec (floating-point number). Either a number or a list. Ex) 0.75 or [0.75,0.5,0.25]
      multiscale_weights: null # Weight for each scale. should be null (for single scale) or a list matched with window/shift scale count. Ex) [0.33,0.33,0.33]
      save_embeddings: False # Save embeddings as pickle file for each audio input.

  clustering:
    parameters:
      oracle_num_speakers: False # If True, use num of speakers value provided in manifest file.
      max_num_speakers: 20 # Max number of speakers for each recording. If oracle num speakers is passed, this value is ignored.
      enhanced_count_thres: 80 # If the number of segments is lower than this number, enhanced speaker counting is activated.
      max_rp_threshold: 0.25 # Determines the range of p-value search: 0 < p <= max_rp_threshold. 
      sparse_search_volume: 30 # The higher the number, the more values will be examined with more time. 
      maj_vote_spk_count: False  # If True, take a majority vote on multiple p-values to estimate the number of speakers.

# json manifest line example
# {"audio_filepath": "/path/to/audio_file", "offset": 0, "duration": null, "label": "infer", "text": "-", "num_speakers": null, "rttm_filepath": "/path/to/rttm/file", "uem_filepath": "/path/to/uem/filepath"}
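The manifest_filepath in this config points to a JSON-lines file: one JSON object per line, with the fields listed in the example comment above. The sketch below writes such a file with the standard library. manifest_entry and write_manifest are hypothetical helpers. Using null for rttm_filepath and uem_filepath when no reference files exist is an assumption.

```python
import json
import os
import tempfile
from typing import Iterable, Optional


def manifest_entry(audio_filepath: str,
                   rttm_filepath: Optional[str] = None,
                   uem_filepath: Optional[str] = None) -> dict:
    """Build one entry using the field values from the example comment."""
    return {
        "audio_filepath": audio_filepath,
        "offset": 0,
        "duration": None,                # null in the example
        "label": "infer",
        "text": "-",
        "num_speakers": None,            # only read when oracle_num_speakers is True
        "rttm_filepath": rttm_filepath,  # assumption: null if no reference RTTM
        "uem_filepath": uem_filepath,    # assumption: null if no UEM file
    }


def write_manifest(path: str, audio_files: Iterable[str]) -> None:
    """Hypothetical helper: write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for audio in audio_files:
            f.write(json.dumps(manifest_entry(audio)) + "\n")


if __name__ == "__main__":
    out = os.path.join(tempfile.gettempdir(), "input_manifest.json")
    write_manifest(out, ["audio/1.wav"])
    with open(out, encoding="utf-8") as f:
        print(f.read().strip())
```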

nithinraok commented 2 years ago

Can you provide a notebook to reproduce the error?

JenyaPu commented 2 years ago

Hello! The issue was solved when I updated nemo-toolkit with the following command:

python -m pip install git+https://github.com/NVIDIA/NeMo.git@main#egg=nemo_toolkit[asr]

It allowed me to specify num_workers: 0 in the "ClusterDiarizer" config to avoid the following error, which occurs when the number of workers is not 0:

_pickle.PicklingError: Can't pickle <class 'nemo.collections.common.parts.preprocessing.collections.SpeechLabelEntity'>: attribute lookup SpeechLabelEntity on nemo.collections.common.parts.preprocessing.collections failed

Harshada-Mule21 commented 7 months ago

After updating the toolkit, I'm still getting:

PicklingError: Can't pickle <class 'nemo.collections.common.parts.preprocessing.collections.SpeechLabelEntity'>: attribute lookup SpeechLabelEntity on nemo.collections.common.parts.preprocessing.collections failed