Closed: vikneo2017 closed this issue 3 weeks ago
Could you retry? https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb
It worked fine.
I get the same error from model.diarize() after set_start_method('spawn').
I also encountered the same problem. When setting torch.multiprocessing.set_start_method('spawn') or torch.multiprocessing.set_start_method('forkserver'), I got an error:
_pickle.PicklingError: Can't pickle <class 'nemo.collections.common.parts.preprocessing.collections.SpeechLabelEntity'>: attribute lookup SpeechLabelEntity on nemo.collections.common.parts.preprocessing.collections failed
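For context, a minimal standalone sketch of the mechanism behind that PicklingError (the function and variable names here are illustrative, not NeMo's actual code): a namedtuple class that is not reachable as a module-level attribute under its own name cannot be pickled, and spawn/forkserver workers pickle everything they receive.

import collections
import pickle

def make_entity_type():
    # The class is created at call time, so "SpeechLabelEntity" never becomes
    # a module-level attribute; pickle's attribute lookup will fail on it.
    return collections.namedtuple('SpeechLabelEntity',
                                  'audio_file duration label offset')

if __name__ == '__main__':
    Entity = make_entity_type()
    entity = Entity('a.wav', 1.0, 'infer', 0.0)
    try:
        pickle.dumps(entity)  # spawn/forkserver workers serialize exactly like this
    except pickle.PicklingError as err:
        print(err)  # "... attribute lookup SpeechLabelEntity ... failed"

In NeMo the namedtuple is bound as OUTPUT_TYPE rather than under its own name (see the fix below), so the same lookup failure occurs.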
Could you share the notebook with which you are facing the issue to reproduce?
only torch.multiprocessing.set_start_method('fork') is supported by CUDA.
I had the same issue and num_workers=0 worked for me, but when I switched to a NeMo VAD model instead of using ASR, I got the following traceback:
Traceback (most recent call last):
File "D:\Projects\Python\test\test\nemo_diarization.py", line 115, in <module>
diarization("audio/1.wav")
File "D:\Projects\Python\test\test\nemo_diarization.py", line 100, in diarization
sd_model.diarize()
File "C:\Program Files\Python310\lib\site-packages\nemo\collections\asr\models\clustering_diarizer.py", line 408, in diarize
self._perform_speech_activity_detection()
File "C:\Program Files\Python310\lib\site-packages\nemo\collections\asr\models\clustering_diarizer.py", line 301, in _perform_speech_activity_detection
manifest_vad_input = prepare_manifest(config)
File "C:\Program Files\Python310\lib\site-packages\nemo\collections\asr\parts\utils\vad_utils.py", line 74, in prepare_manifest
p = multiprocessing.Pool(processes=config['num_workers'])
File "C:\Program Files\Python310\lib\multiprocessing\context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
File "C:\Program Files\Python310\lib\multiprocessing\pool.py", line 205, in __init__
raise ValueError("Number of processes must be at least 1")
ValueError: Number of processes must be at least 1
The error occurs at:
sd_model = ClusteringDiarizer(cfg=config)
sd_model.diarize()
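A sketch of a possible guard for this ValueError (run_prepare and func are hypothetical names, not NeMo's API): multiprocessing.Pool raises when processes=0, so the pool creation in vad_utils.prepare_manifest would need a serial fallback for num_workers=0.

import multiprocessing

def run_prepare(entries, func, num_workers):
    # multiprocessing.Pool(processes=0) raises
    # ValueError("Number of processes must be at least 1").
    if num_workers and num_workers > 0:
        with multiprocessing.Pool(processes=num_workers) as pool:
            return pool.map(func, entries)
    # num_workers == 0: do the work serially in the current process
    return [func(entry) for entry in entries]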
When I switch to num_workers > 0 I receive the following error:
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "D:\Projects\Python\test\test\nemo_diarization.py", line 115, in <module>
diarization("audio/1.wav")
File "D:\Projects\Python\test\test\nemo_diarization.py", line 100, in diarization
sd_model.diarize()
File "C:\Program Files\Python310\lib\site-packages\nemo\collections\asr\models\clustering_diarizer.py", line 408, in diarize
self._perform_speech_activity_detection()
File "C:\Program Files\Python310\lib\site-packages\nemo\collections\asr\models\clustering_diarizer.py", line 308, in _perform_speech_activity_detection
self._run_vad(manifest_vad_input)
File "C:\Program Files\Python310\lib\site-packages\nemo\collections\asr\models\clustering_diarizer.py", line 213, in _run_vad
for i, test_batch in enumerate(tqdm(self._vad_model.test_dataloader())):
File "C:\Program Files\Python310\lib\site-packages\tqdm\std.py", line 1195, in __iter__
for obj in iterable:
File "C:\Program Files\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 438, in __iter__
return self._get_iterator()
File "C:\Program Files\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 384, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "C:\Program Files\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 1048, in __init__
w.start()
File "C:\Program Files\Python310\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Program Files\Python310\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Program Files\Python310\lib\multiprocessing\context.py", line 327, in _Popen
return Popen(process_obj)
File "C:\Program Files\Python310\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
reduction.dump(process_obj, to_child)
File "C:\Program Files\Python310\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'nemo.collections.common.parts.preprocessing.collections.SpeechLabelEntity'>: attribute lookup SpeechLabelEntity on nemo.collections.common.parts.preprocessing.collections failed
Process finished with exit code 1
The following is the config file I use:
name: &name "ClusterDiarizer"

num_workers: 0
sample_rate: 16000
batch_size: 64

diarizer:
  manifest_filepath: ???
  out_dir: ???
  oracle_vad: False # If True, uses RTTM files provided in the manifest file to get speech activity (VAD) timestamps
  collar: 0.25 # Collar value for scoring
  ignore_overlap: True # Consider or ignore overlap segments while scoring

  vad:
    model_path: null # .nemo local model path or pretrained model name or none
    external_vad_manifest: null # This option is provided to use an external VAD and its speech activity labels for speaker embedding extraction. Only one of model_path or external_vad_manifest should be set
    parameters: # Tuned parameters for CH109 (using the 11 multi-speaker sessions as dev set)
      window_length_in_sec: 0.15 # Window length in sec for VAD context input
      shift_length_in_sec: 0.01 # Shift length in sec to generate frame-level VAD predictions
      smoothing: "median" # False or type of smoothing method (eg: median)
      overlap: 0.875 # Overlap ratio for overlapped mean/median smoothing filter
      onset: 0.4 # Onset threshold for detecting the beginning of speech
      offset: 0.7 # Offset threshold for detecting the end of speech
      pad_onset: 0.05 # Duration added before each speech segment
      pad_offset: -0.1 # Duration added after each speech segment
      min_duration_on: 0.2 # Threshold for small non-speech deletion
      min_duration_off: 0.2 # Threshold for short speech segment deletion
      filter_speech_first: True

  speaker_embeddings:
    model_path: ??? # .nemo local model path or pretrained model name (titanet_large, ecapa_tdnn or speakerverification_speakernet)
    parameters:
      window_length_in_sec: 1.5 # Window length(s) in sec (floating-point number). Either a number or a list. Ex) 1.5 or [1.5,1.0,0.5]
      shift_length_in_sec: 0.75 # Shift length(s) in sec (floating-point number). Either a number or a list. Ex) 0.75 or [0.75,0.5,0.25]
      multiscale_weights: null # Weight for each scale. Should be null (for single scale) or a list matching the window/shift scale count. Ex) [0.33,0.33,0.33]
      save_embeddings: False # Save embeddings as a pickle file for each audio input.

  clustering:
    parameters:
      oracle_num_speakers: False # If True, use the number-of-speakers value provided in the manifest file.
      max_num_speakers: 20 # Max number of speakers for each recording. If oracle num speakers is passed, this value is ignored.
      enhanced_count_thres: 80 # If the number of segments is lower than this number, enhanced speaker counting is activated.
      max_rp_threshold: 0.25 # Determines the range of p-value search: 0 < p <= max_rp_threshold.
      sparse_search_volume: 30 # The higher the number, the more values are examined, taking more time.
      maj_vote_spk_count: False # If True, take a majority vote on multiple p-values to estimate the number of speakers.

# json manifest line example
# {"audio_filepath": "/path/to/audio_file", "offset": 0, "duration": null, "label": "infer", "text": "-", "num_speakers": null, "rttm_filepath": "/path/to/rttm/file", "uem_filepath": "/path/to/uem/filepath"}
Hi @JenyaPu, I changed the code in nemo/collections/common/parts/preprocessing/collections.py, line 208, from:

OUTPUT_TYPE = collections.namedtuple(typename='SpeechLabelEntity', field_names='audio_file duration label offset',)

to:

class SpeechLabelEntity:
    def __init__(self, audio_file, duration, label, offset):
        self.audio_file = audio_file
        self.duration = duration
        self.label = label
        self.offset = offset

OUTPUT_TYPE = SpeechLabelEntity

and it worked for me. Hope it's helpful.
+1 to @erichtho, that fixed it for me too. I can now use multiple workers (num_workers > 0).
Edit: but on my machine it looks like it's spawning all the new workers, yet everything still runs painfully slowly in sequence rather than in parallel. I think there might still be something not quite right with the multiprocessing implementation here. For CPU inference it's 12 min of processing for a 25 min audio file :/
> only torch.multiprocessing.set_start_method('fork') is supported by CUDA.

but PyTorch says exactly the opposite: per its multiprocessing docs, the CUDA runtime does not support the fork start method; spawn or forkserver is required to use CUDA in subprocesses.
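The usual pattern from the PyTorch multiprocessing docs, in sketch form (main() is a placeholder for the diarization code): the start method is set once, under the main-module guard, before any workers are created.

import torch.multiprocessing as mp

def main():
    # build the diarizer and call diarize() here
    ...

if __name__ == '__main__':
    # Per the PyTorch docs, 'fork' is the start method CUDA does NOT support;
    # use 'spawn' (or 'forkserver') when DataLoader workers touch CUDA.
    mp.set_start_method('spawn', force=True)
    main()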
I suspect this is due to you running this code inside another thread / the main app. On my side it works in a standalone script but fails within a FastAPI process, for instance (inside uvicorn), especially considering that it's using a pickler with subprocesses.
@erichtho's fix works like a charm.
@titu1994 Can we change all collections.namedtuple types to dataclasses?
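A sketch of what that could look like for this particular type (field types are guesses based on the manifest fields, not NeMo's actual annotations); a dataclass defined at module level pickles cleanly across spawned workers:

from dataclasses import dataclass
from typing import Optional

@dataclass
class SpeechLabelEntity:
    # Same fields as the namedtuple at collections.py line 208
    audio_file: str
    duration: Optional[float]
    label: str
    offset: Optional[float]

OUTPUT_TYPE = SpeechLabelEntity

Compared with the plain class in the fix above, the dataclass also restores equality and a readable repr; neither version keeps namedtuple's tuple behavior (indexing/unpacking), so call sites relying on that would need checking.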
@gburlet I'm also facing the same issue. Did you find any fix for this?
@vishnurk6247 Can you try the solution from https://github.com/NVIDIA/NeMo/issues/3421#issuecomment-1200366702 and let us know if it fixes the issue? I will send a PR to update the code. Sorry for the delayed response on this.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.
Dear colleagues, after installing NeMo r1.5.0 I get an error in the oracle_model.diarize() method in Speaker_Diarization_Inference (https://github.com/NVIDIA/NeMo/blob/main/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb).
Environment: