NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0
12.3k stars · 2.55k forks

Speaker Diarization Inference error with pickle #3421

Closed vikneo2017 closed 3 weeks ago

vikneo2017 commented 2 years ago

Dear colleagues, after installing NeMo r1.5.0 I get an error from the method oracle_model.diarize() in [Speaker_Diarization_Inference](https://github.com/NVIDIA/NeMo/blob/main/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb):

from nemo.collections.asr.models import ClusteringDiarizer
oracle_model = ClusteringDiarizer(cfg=config)
oracle_model.diarize()

PicklingError: Can't pickle <class 'nemo.collections.common.parts.preprocessing.collections.SpeechLabelEntity'>: attribute lookup SpeechLabelEntity on nemo.collections.common.parts.preprocessing.collections failed
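The failure mode can be reproduced without NeMo at all. A minimal sketch (the class and field names mirror the NeMo entity, but this is not NeMo's actual code): a namedtuple created inside a class body is only bound as a class attribute, while its `__qualname__` is just the typename, so pickle's module-level attribute lookup fails with exactly the "attribute lookup ... failed" message above.

```python
import collections
import pickle

# Illustrative stand-in for NeMo's collection class, not the real code:
# the namedtuple type exists only as a class attribute, but pickle will
# try to find 'SpeechLabelEntity' at module level and fail.
class AudioLabelCollection:
    OUTPUT_TYPE = collections.namedtuple(
        typename='SpeechLabelEntity',
        field_names='audio_file duration label offset',
    )

entry = AudioLabelCollection.OUTPUT_TYPE('a.wav', 1.0, 'speech', 0.0)
try:
    pickle.dumps(entry)
except pickle.PicklingError as err:
    print(err)  # attribute lookup SpeechLabelEntity on ... failed
```

This is why the error only shows up with worker processes that need to serialize dataset entries (spawn/forkserver start methods, or num_workers > 0 on Windows).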

Environment:

nithinraok commented 2 years ago

Could you retry? https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb

It works fine for me.

erichtho commented 2 years ago

I get the same error from model.diarize() even after calling set_start_method('spawn').

sakurakatana commented 2 years ago

I also encountered the same problem. With either torch.multiprocessing.set_start_method('spawn') or torch.multiprocessing.set_start_method('forkserver') I get: _pickle.PicklingError: Can't pickle <class 'nemo.collections.common.parts.preprocessing.collections.SpeechLabelEntity'>: attribute lookup SpeechLabelEntity on nemo.collections.common.parts.preprocessing.collections failed

nithinraok commented 2 years ago

Could you share the notebook with which you are facing the issue to reproduce?

okuchaiev commented 2 years ago

only torch.multiprocessing.set_start_method('fork') is supported by CUDA.

JenyaPu commented 2 years ago

I had the same issue, and num_workers=0 worked for me, but when I switched to a NeMo VAD model instead of using ASR I got the following traceback:

Traceback (most recent call last):
  File "D:\Projects\Python\test\test\nemo_diarization.py", line 115, in <module>
    diarization("audio/1.wav")
  File "D:\Projects\Python\test\test\nemo_diarization.py", line 100, in diarization
    sd_model.diarize()
  File "C:\Program Files\Python310\lib\site-packages\nemo\collections\asr\models\clustering_diarizer.py", line 408, in diarize
    self._perform_speech_activity_detection()
  File "C:\Program Files\Python310\lib\site-packages\nemo\collections\asr\models\clustering_diarizer.py", line 301, in _perform_speech_activity_detection
    manifest_vad_input = prepare_manifest(config)
  File "C:\Program Files\Python310\lib\site-packages\nemo\collections\asr\parts\utils\vad_utils.py", line 74, in prepare_manifest
    p = multiprocessing.Pool(processes=config['num_workers'])
  File "C:\Program Files\Python310\lib\multiprocessing\context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "C:\Program Files\Python310\lib\multiprocessing\pool.py", line 205, in __init__
    raise ValueError("Number of processes must be at least 1")
ValueError: Number of processes must be at least 1

The error is raised from these lines:

sd_model = ClusteringDiarizer(cfg=config)
sd_model.diarize()

When I switch to num_workers > 0 I receive the following error:

  0%|          | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "D:\Projects\Python\test\test\nemo_diarization.py", line 115, in <module>
    diarization("audio/1.wav")
  File "D:\Projects\Python\test\test\nemo_diarization.py", line 100, in diarization
    sd_model.diarize()
  File "C:\Program Files\Python310\lib\site-packages\nemo\collections\asr\models\clustering_diarizer.py", line 408, in diarize
    self._perform_speech_activity_detection()
  File "C:\Program Files\Python310\lib\site-packages\nemo\collections\asr\models\clustering_diarizer.py", line 308, in _perform_speech_activity_detection
    self._run_vad(manifest_vad_input)
  File "C:\Program Files\Python310\lib\site-packages\nemo\collections\asr\models\clustering_diarizer.py", line 213, in _run_vad
    for i, test_batch in enumerate(tqdm(self._vad_model.test_dataloader())):
  File "C:\Program Files\Python310\lib\site-packages\tqdm\std.py", line 1195, in __iter__
    for obj in iterable:
  File "C:\Program Files\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 438, in __iter__
    return self._get_iterator()
  File "C:\Program Files\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 384, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Program Files\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 1048, in __init__
    w.start()
  File "C:\Program Files\Python310\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Program Files\Python310\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Program Files\Python310\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Program Files\Python310\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Program Files\Python310\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'nemo.collections.common.parts.preprocessing.collections.SpeechLabelEntity'>: attribute lookup SpeechLabelEntity on nemo.collections.common.parts.preprocessing.collections failed

Process finished with exit code 1

The following is the config file I use:

name: &name "ClusterDiarizer"

num_workers: 0
sample_rate: 16000
batch_size: 64

diarizer:
  manifest_filepath: ???
  out_dir: ???
  oracle_vad: False # If True, uses RTTM files provided in manifest file to get speech activity (VAD) timestamps
  collar: 0.25 # Collar value for scoring
  ignore_overlap: True # Consider or ignore overlap segments while scoring

  vad:
    model_path: null # .nemo local model path or pretrained model name or none
    external_vad_manifest: null # This option is provided to use external vad and provide its speech activity labels for speaker embeddings extraction. Only one of model_path or external_vad_manifest should be set

    parameters: # Tuned parameters for CH109 (using the 11 multi-speaker sessions as dev set) 
      window_length_in_sec: 0.15  # Window length in sec for VAD context input 
      shift_length_in_sec: 0.01 # Shift length in sec for generate frame level VAD prediction
      smoothing: "median" # False or type of smoothing method (eg: median)
      overlap: 0.875 # Overlap ratio for overlapped mean/median smoothing filter
      onset: 0.4 # Onset threshold for detecting the beginning and end of a speech 
      offset: 0.7 # Offset threshold for detecting the end of a speech
      pad_onset: 0.05 # Adding durations before each speech segment 
      pad_offset: -0.1 # Adding durations after each speech segment 
      min_duration_on: 0.2 # Threshold for small non_speech deletion
      min_duration_off: 0.2 # Threshold for short speech segment deletion
      filter_speech_first: True 

  speaker_embeddings:
    model_path: ??? # .nemo local model path or pretrained model name (titanet_large, ecapa_tdnn or speakerverification_speakernet)
    parameters:
      window_length_in_sec: 1.5 # Window length(s) in sec (floating-point number). Either a number or a list. Ex) 1.5 or [1.5,1.0,0.5]
      shift_length_in_sec: 0.75 # Shift length(s) in sec (floating-point number). Either a number or a list. Ex) 0.75 or [0.75,0.5,0.25]
      multiscale_weights: null # Weight for each scale. should be null (for single scale) or a list matched with window/shift scale count. Ex) [0.33,0.33,0.33]
      save_embeddings: False # Save embeddings as pickle file for each audio input.

  clustering:
    parameters:
      oracle_num_speakers: False # If True, use num of speakers value provided in manifest file.
      max_num_speakers: 20 # Max number of speakers for each recording. If oracle num speakers is passed, this value is ignored.
      enhanced_count_thres: 80 # If the number of segments is lower than this number, enhanced speaker counting is activated.
      max_rp_threshold: 0.25 # Determines the range of p-value search: 0 < p <= max_rp_threshold. 
      sparse_search_volume: 30 # The higher the number, the more values will be examined with more time. 
      maj_vote_spk_count: False  # If True, take a majority vote on multiple p-values to estimate the number of speakers.

# json manifest line example
# {"audio_filepath": "/path/to/audio_file", "offset": 0, "duration": null, "label": "infer", "text": "-", "num_speakers": null, "rttm_filepath": "/path/to/rttm/file", "uem_filepath": "/path/to/uem/filepath"}
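For completeness, a manifest line like the example comment above can be generated with a few lines of Python (the paths here are placeholders, not real files):

```python
import json
import os
import tempfile

# One entry per audio file, one JSON object per line; null fields are
# left for the diarizer to infer.
entry = {
    "audio_filepath": "audio/1.wav",
    "offset": 0,
    "duration": None,
    "label": "infer",
    "text": "-",
    "num_speakers": None,
    "rttm_filepath": None,
    "uem_filepath": None,
}
manifest_path = os.path.join(tempfile.gettempdir(), "input_manifest.json")
with open(manifest_path, "w") as f:
    f.write(json.dumps(entry) + "\n")
```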
erichtho commented 2 years ago

Hi @JenyaPu, I changed the code in nemo/collections/common/parts/preprocessing/collections.py, line 208, from:

    OUTPUT_TYPE = collections.namedtuple(typename='SpeechLabelEntity', field_names='audio_file duration label offset',)

to:

    class SpeechLabelEntity:
        def __init__(self, audio_file, duration, label, offset):
            self.audio_file = audio_file
            self.duration = duration
            self.label = label
            self.offset = offset

    OUTPUT_TYPE = SpeechLabelEntity

and it worked for me. Hope it's helpful.

gburlet commented 1 year ago

+1 to @erichtho, that fixed it for me too. I can now use multiprocessing (num_workers > 0).

Edit: on my machine it spawns all the new workers, but everything still runs painfully slowly in sequence rather than in parallel, so I think something is still not quite right with the multiprocessing implementation here. For CPU inference it takes 12 minutes to process a 25-minute audio file :/

WuSu4620 commented 1 year ago

> only torch.multiprocessing.set_start_method('fork') is supported by CUDA.

But the PyTorch docs say exactly the opposite: CUDA in subprocesses requires the spawn or forkserver start method, and fork is not supported.

https://pytorch.org/docs/stable/notes/multiprocessing.html
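One way to avoid fighting over the process-wide default at all (a general stdlib pattern, not something NeMo does) is to request a specific start method through a local context object, which leaves the global setting untouched:

```python
import multiprocessing as mp

# get_context returns a context bound to one start method; pools, queues
# and processes created from it use that method without mutating the
# default that other code in the process may rely on.
ctx = mp.get_context('spawn')
print(ctx.get_start_method())  # spawn
```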

jqueguiner commented 1 year ago

I suspect this is because you are running the code inside another thread / a larger main app.

On my side it works as a standalone script but fails inside a FastAPI process (under uvicorn), for instance, which makes sense given that it pickles objects for subprocesses.

@erichtho's fix works like a charm.

nithinraok commented 1 year ago

@titu1994 Can we change all the collections.namedtuple types to dataclasses?
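A sketch of what that suggestion would look like (this is not the actual NeMo patch, just an illustration): a module-level dataclass with the same fields pickles cleanly, because pickle can resolve the class name by attribute lookup on the module.

```python
from dataclasses import dataclass
import pickle

# Module-level replacement for the class-body namedtuple; field names
# follow the original OUTPUT_TYPE definition.
@dataclass
class SpeechLabelEntity:
    audio_file: str
    duration: float
    label: str
    offset: float

entry = SpeechLabelEntity('a.wav', 1.0, 'speech', 0.0)
assert pickle.loads(pickle.dumps(entry)) == entry
```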

vishnurk6247 commented 2 months ago

@gburlet I'm also facing the same issue. Did you find any fix for this?

nithinraok commented 2 months ago

@vishnurk6247 Can you try the solution from https://github.com/NVIDIA/NeMo/issues/3421#issuecomment-1200366702? If it fixes the issue, please let us know and I will send a PR to update the code. Sorry for the delayed response on this.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 3 weeks ago

This issue was closed because it has been inactive for 7 days since being marked as stale.