PicklingError: Can't pickle <class ''>: attribute lookup SpeechLabelEntity on failed #89

Closed mayur-neuralgarage closed 2 months ago

mayur-neuralgarage commented 10 months ago

Initialize NeMo MSDD diarization model

msdd_model = NeuralDiarizer(cfg=create_config(temp_path)).to("cuda")


```
PicklingError                             Traceback (most recent call last)
Cell In[37], line 3
      1 # Initialize NeMo MSDD diarization model
      2 msdd_model = NeuralDiarizer(cfg=create_config(temp_path)).to("cuda")
----> 3 msdd_model.diarize()
      5 del msdd_model
      6 torch.cuda.empty_cache()

PicklingError: Can't pickle <class ''>: attribute lookup SpeechLabelEntity on failed
```

DanielArmyrConversy commented 10 months ago

I just saw this one too. Unfortunately, my example is deep in a rather complex piece of code, so I cannot extract a minimal reproducible example at this time. We will restructure that code, though, and then I can get back to you.

  File "/Users/danielarmyr/Library/Caches/pypoetry/virtualenvs/vbi-nIDHWX03-py3.10/lib/python3.10/site-packages/nemo/collections/asr/models/", line 447, in diarize
    self._extract_embeddings(self.subsegments_manifest_path, scale_idx, len(scales))

  File "/Users/danielarmyr/Library/Caches/pypoetry/virtualenvs/vbi-nIDHWX03-py3.10/lib/python3.10/site-packages/nemo/collections/asr/models/", line 350, in _extract_embeddings
    for test_batch in tqdm(

  File "/Users/danielarmyr/Library/Caches/pypoetry/virtualenvs/vbi-nIDHWX03-py3.10/lib/python3.10/site-packages/tqdm/", line 1170, in __iter__
    for obj in iterable:

  File "/Users/danielarmyr/Library/Caches/pypoetry/virtualenvs/vbi-nIDHWX03-py3.10/lib/python3.10/site-packages/torch/utils/data/", line 435, in __iter__
    return self._get_iterator()

  File "/Users/danielarmyr/Library/Caches/pypoetry/virtualenvs/vbi-nIDHWX03-py3.10/lib/python3.10/site-packages/torch/utils/data/", line 381, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)

  File "/Users/danielarmyr/Library/Caches/pypoetry/virtualenvs/vbi-nIDHWX03-py3.10/lib/python3.10/site-packages/torch/utils/data/", line 1034, in __init__

  File "/Users/danielarmyr/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/", line 121, in start
    self._popen = self._Popen(self)

  File "/Users/danielarmyr/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)

  File "/Users/danielarmyr/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/", line 288, in _Popen
    return Popen(process_obj)

  File "/Users/danielarmyr/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/", line 32, in __init__

  File "/Users/danielarmyr/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/", line 19, in __init__

  File "/Users/danielarmyr/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/", line 47, in _launch
    reduction.dump(process_obj, fp)

  File "/Users/danielarmyr/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)

_pickle.PicklingError: Can't pickle <class ''>: attribute lookup SpeechLabelEntity on failed
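
The last frames above show `ForkingPickler` serializing the worker process object, which is what the spawn start method does on macOS and Windows. The snippet below is a minimal sketch of one common way to get an "attribute lookup ... failed" pickling error: a namedtuple whose type name is not bound at module level. `LabelCollection`, `LabelEntity`, and `try_pickle` are hypothetical names for illustration; whether NeMo defines `SpeechLabelEntity` this way is an assumption, not something this thread confirms.

```python
import collections
import pickle


class LabelCollection:
    # Hypothetical stand-in: the namedtuple type is *named* "LabelEntity",
    # but the only reference to it is the class attribute OUTPUT_TYPE.
    OUTPUT_TYPE = collections.namedtuple("LabelEntity", "audio_file duration label")


def try_pickle(obj):
    """Return None if obj pickles, otherwise the PicklingError message."""
    try:
        pickle.dumps(obj)
    except pickle.PicklingError as exc:
        return str(exc)
    return None


item = LabelCollection.OUTPUT_TYPE("a.wav", 1.5, "speaker_0")

# pickle stores classes by reference (module name + class name), so it looks
# for a module-level attribute called "LabelEntity" and does not find one.
error_message = try_pickle(item)
print(error_message)

# Binding the type to a module-level name that matches its type name is one
# way to make the lookup succeed when this file is run as a script.
LabelEntity = LabelCollection.OUTPUT_TYPE
print(try_pickle(item))
```

With the fork start method the child process inherits the parent's memory and nothing is pickled, which would be consistent with the error appearing on some machines and not on others.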
MahmoudAshraf97 commented 10 months ago

Hello, how can I reproduce this? Can you upload the audio file and list the versions of your requirements?

AlbinGyllander commented 10 months ago

I have the same issue. First I get the error as mentioned above: Can't pickle <class ''>: attribute lookup SpeechLabelEntity on failed

Then the program continues and raises this error:

        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

I am not sure whether this is because I have implemented it incorrectly or whether it is a bug, but I hope it helps with debugging.
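
For reference, the idiom that the second message asks for can be sketched as follows. This is a generic illustration using only the standard library, not code from whisper-diarization; `worker` and `main` are hypothetical names. With the spawn start method the child process re-imports the main module, so anything that starts processes has to sit behind the `__name__` check.

```python
import multiprocessing as mp


def worker(n):
    # Defined at module level so the child process can import it by name.
    print(f"worker received {n}")


def main():
    # Request the spawn start method explicitly; it is the default on
    # Windows and macOS, where this error is typically reported.
    ctx = mp.get_context("spawn")
    proc = ctx.Process(target=worker, args=(3,))
    proc.start()
    proc.join()
    return proc.exitcode


if __name__ == "__main__":
    # Without this guard, the re-import in the child would call main() again
    # and try to start a new process during bootstrapping.
    main()
```

In a script that calls `msdd_model.diarize()` at the top level, the equivalent change would be to move that call into a function invoked from under the guard.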

jiayihan2020 commented 10 months ago

I have the same issue as @AlbinGyllander. I am using Python 3.10.11, and the same issue also appeared with Python 3.8. The packages that I have installed are as follows:

As far as I can tell, none of my audio files work. I am not sure whether there is an incompatibility in my setup. Hopefully someone can help troubleshoot this.

asdfmonster261 commented 10 months ago

I am also getting this pickling error.

Environment
Windows 11
Miniconda, Python 3.9.18
CUDA 11.7 (also tested with 11.8)
Pip list (selected packages):
torch 2.0.1+cu117
nemo-toolkit 1.20.0
whisperx 1.0
faster-whisper 0.7.1
[full list available but truncated for brevity]
Steps to reproduce
```
git clone
cd .\whisper-diarization\
conda create -n whisper_diarization python=3.9 # Also tested with 3.10
conda activate whisper_diarization
pip install cython
pip install torch torchvision torchaudio --index-url
pip install -r requirements.txt
python .\ --whisper-model large-v2 --device cuda -a .\test\1315.mp3
```
The error
```
Traceback (most recent call last):
  File "C:\Users\username\Desktop\whisper-diarization\", line 112, in
    msdd_model.diarize()
  File "C:\Users\username\miniconda3\envs\whisper_diarization\lib\site-packages\torch\utils\", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\username\miniconda3\envs\whisper_diarization\lib\site-packages\nemo\collections\asr\models\", line 1180, in diarize
    self.clustering_embedding.prepare_cluster_embs_infer()
  File "C:\Users\username\miniconda3\envs\whisper_diarization\lib\site-packages\nemo\collections\asr\models\", line 699, in prepare_cluster_embs_infer
    self.emb_sess_test_dict, self.emb_seq_test, self.clus_test_label_dict, _ = self.run_clustering_diarizer(
  File "C:\Users\username\miniconda3\envs\whisper_diarization\lib\site-packages\nemo\collections\asr\models\", line 866, in run_clustering_diarizer
    scores = self.clus_diar_model.diarize(batch_size=self.cfg_diar_infer.batch_size)
  File "C:\Users\username\miniconda3\envs\whisper_diarization\lib\site-packages\nemo\collections\asr\models\", line 437, in diarize
    self._perform_speech_activity_detection()
  File "C:\Users\username\miniconda3\envs\whisper_diarization\lib\site-packages\nemo\collections\asr\models\", line 325, in _perform_speech_activity_detection
    self._run_vad(manifest_vad_input)
  File "C:\Users\username\miniconda3\envs\whisper_diarization\lib\site-packages\nemo\collections\asr\models\", line 218, in _run_vad
    for i, test_batch in enumerate(
  File "C:\Users\username\miniconda3\envs\whisper_diarization\lib\site-packages\tqdm\", line 1182, in __iter__
    for obj in iterable:
  File "C:\Users\username\miniconda3\envs\whisper_diarization\lib\site-packages\torch\utils\data\", line 441, in __iter__
    return self._get_iterator()
  File "C:\Users\username\miniconda3\envs\whisper_diarization\lib\site-packages\torch\utils\data\", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\username\miniconda3\envs\whisper_diarization\lib\site-packages\torch\utils\data\", line 1042, in __init__
    w.start()
  File "C:\Users\username\miniconda3\envs\whisper_diarization\lib\multiprocessing\", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\username\miniconda3\envs\whisper_diarization\lib\multiprocessing\", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\username\miniconda3\envs\whisper_diarization\lib\multiprocessing\", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\username\miniconda3\envs\whisper_diarization\lib\multiprocessing\", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\username\miniconda3\envs\whisper_diarization\lib\multiprocessing\", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle : attribute lookup SpeechLabelEntity on failed
```

MahmoudAshraf97 commented 9 months ago

Hi @asdfmonster261 , can you upload .\test\1315.mp3 for me to reproduce?

asdfmonster261 commented 9 months ago


mayur-neuralgarage commented 9 months ago

```
# Initialize NeMo MSDD diarization model
msdd_model = NeuralDiarizer(cfg=create_config(temp_path)).to("cuda")

del msdd_model
vad:   0%|          | 0/1 [00:00<?, ?it/s]
```

MahmoudAshraf97 commented 9 months ago


Unfortunately, I couldn't reproduce it.

asdfmonster261 commented 9 months ago

If it's working fine on your end, could you post the environment you are using?

That is, pip list, OS, Python version, and possibly a sample audio file to confirm with.

mayur-neuralgarage commented 9 months ago

@MahmoudAshraf97 I get that error when I run it on my local PC. On Colab it works fine.

Can you please share your local environment and all requirements, i.e. pip list, OS, and Python version?

tomonarifeehan commented 9 months ago

Hi @MahmoudAshraf97, any updates on this error? I'm having the same issue.

I'm running Python 3.10.12

Package Version (selected):
Python 3.10.12
torch 2.1.0
nemo-toolkit 1.20.0
whisperx 3.1.1
faster-whisper 0.9.0
[full list available but truncated for brevity]

manjunath7472 commented 9 months ago

I found a fix: set num_workers to 0 in "venv\Lib\site-packages\torch\utils\data\", i.e. self.num_workers = 0
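
Editing the installed torch file changes every DataLoader in that environment. A narrower option is to choose the worker count in your own code. The helper below is a hypothetical sketch, not part of torch, NeMo, or whisper-diarization: it keeps the requested worker count only when the start method is fork and otherwise falls back to 0, so everything is loaded in the main process and nothing has to be pickled.

```python
import multiprocessing as mp


def safe_num_workers(requested, start_method=None):
    """Hypothetical helper: return `requested` under fork, otherwise 0.

    With num_workers == 0 the data is loaded in the main process, so the
    dataset never has to be pickled for a worker process.
    """
    if start_method is None:
        # Note: querying the start method fixes the process-wide default
        # if none has been set yet.
        start_method = mp.get_start_method()
    return requested if start_method == "fork" else 0


print(safe_num_workers(4, "spawn"))
print(safe_num_workers(4, "fork"))
```

The result could then be passed wherever the worker count is configured, for example as the `num_workers` argument of `torch.utils.data.DataLoader`. Whether the NeMo configuration returned by `create_config` exposes a corresponding setting is not shown in this thread.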

MahmoudAshraf97 commented 9 months ago

I implemented @manjunath7472's solution in the latest commit: f740cd1acd80e8e4172348377a1f69903edb5f59