Xflick / EEND_PyTorch

A PyTorch implementation of End-to-End Neural Diarization
MIT License

When I run run.sh, I encountered a problem #3

Closed bbrookie closed 3 years ago

bbrookie commented 3 years ago

First of all, thank you for open-sourcing this code. There is no folder named exp_large, and some files in this directory, such as avg.th or transformer.th, cannot be found. Can you provide these files? Or, if retraining does not require them, how should I pass these files in? And can the adaptation stage be skipped?

Xflick commented 3 years ago

Hi, exp_large will be created automatically during the training phase. Model files such as avg.th and transformer*.th are stored during training and are not needed for retraining. If you just want to perform the inference or adaptation stage without retraining the whole network, I offer my pretrained models in this repo.

then how should I pass these files in?

I didn't quite get you here. Training the network does not require passing avg.th or transformer*.th. For adaptation, the model is passed through init_model. https://github.com/Xflick/EEND_PyTorch/blob/69c0d8de9b54f0b5ab0fa96fc841de652a8173ab/run.sh#L51 For testing, the model is passed through test_model. https://github.com/Xflick/EEND_PyTorch/blob/69c0d8de9b54f0b5ab0fa96fc841de652a8173ab/run.sh#L64

The adaptation stage can be skipped, but there is a huge performance degradation if you simply apply a model trained on simulated data to real scenarios (such as CALLHOME).
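
For context on avg.th and transformer*.th: in this recipe, the per-epoch transformer*.th checkpoints saved during training are averaged into a single checkpoint (presumably the avg.th mentioned above) by eend/bin/model_averaging.py, which loads each file with torch.load. Below is a minimal sketch of that averaging step, assuming the checkpoints are plain state dicts; the function name and the commented usage are illustrative, not the repo's exact code:

```python
import torch

def average_checkpoints(ifiles, ofile):
    """Average the parameters of several saved checkpoints into one file."""
    avg = None
    for ifile in ifiles:
        state = torch.load(ifile, map_location="cpu")
        if avg is None:
            # Copy the first checkpoint as the running sum.
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    # Divide the accumulated parameters by the number of checkpoints.
    for k in avg:
        avg[k] /= len(ifiles)
    torch.save(avg, ofile)

# Hypothetical usage: average the last ten per-epoch checkpoints into avg.th.
# average_checkpoints(
#     [f"exp_large/models/transformer{i}.th" for i in range(91, 101)],
#     "exp_large/models/avg.th",
# )
```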

bbrookie commented 3 years ago

Thank you for your quick reply. I have another question: does this part of the code support diarization of multiple speakers (an unknown number of speakers)? When I try to train with the AMI dataset, the following error is reported:

```
[ INFO : 2020-11-05 09:22:40,290 ] - namespace(batchsize=64, config=[<yamlargparse.Path object at 0x7f1d61e35590>], context_size=7, frame_shift=80, frame_size=200, gpu=1, gradclip=5, gradient_accumulation_steps=1, hidden_size=256, initmodel='', input_transform='logmel23_mn', label_delay=0, lr=1.0, max_epochs=100, model_save_dir='exp_large/models', model_type='Transformer', noam_warmup_steps=100000.0, num_frames=500, num_speakers=2, optimizer='noam', resume='', sampling_rate=16000, seed=777, subsampling=10, train_data_dir='/home/tp/projects/kaldi_style_data/train', transformer_encoder_dropout=0.1, transformer_encoder_n_heads=4, transformer_encoder_n_layers=4, valid_data_dir='/home/tp/projects/kaldi_style_data/dev')
10095 chunks
2086 chunks
Traceback (most recent call last):
  File "eend/bin/train.py", line 63, in <module>
    train(args)
  File "/home/tp/projects/EEND_PyTorch-master/eend/pytorch_backend/train.py", line 68, in train
    Y, T = next(iter(train_set))
StopIteration
Start model averaging
Namespace(ifiles=['exp_large/models/transformer91.th', 'exp_large/models/transformer92.th', 'exp_large/models/transformer93.th', 'exp_large/models/transformer94.th', 'exp_large/models/transformer95.th', 'exp_large/models/transformer96.th', 'exp_large/models/transformer97.th', 'exp_large/models/transformer98.th', 'exp_large/models/transformer99.th', 'exp_large/models/transformer100.th'], ofile='/home/tp/projects/EEND_PyTorch-master/pretrained_models/large/model_callhome.th')
Traceback (most recent call last):
  File "eend/bin/model_averaging.py", line 35, in <module>
    average_model(args.ifiles, args.ofile)
  File "eend/bin/model_averaging.py", line 18, in average_model
    tmpmodel = torch.load(ifile)
  File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/site-packages/torch/serialization.py", line 584, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/site-packages/torch/serialization.py", line 234, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/site-packages/torch/serialization.py", line 215, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'exp_large/models/transformer91.th'
Start adapting
Traceback (most recent call last):
  File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/argparse.py", line 1787, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/argparse.py", line 1993, in _parse_known_args
    start_index = consume_optional(start_index)
  File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/argparse.py", line 1923, in consume_optional
    arg_count = match_argument(action, selected_patterns)
  File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/argparse.py", line 2088, in _match_argument
    raise ArgumentError(action, msg)
argparse.ArgumentError: argument -c/--config: expected one argument

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "eend/bin/train.py", line 57, in <module>
    args = parser.parse_args()
  File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/site-packages/yamlargparse.py", line 158, in parse_args
    cfg = super().parse_args(args=args, namespace=namespace)
  File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/argparse.py", line 1755, in parse_args
    args, argv = self.parse_known_args(args, namespace)
  File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/argparse.py", line 1794, in parse_known_args
    self.error(str(err))
  File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/site-packages/yamlargparse.py", line 671, in error
    raise ParserError(message)
yamlargparse.ParserError: argument -c/--config: expected one argument
Start model averaging
Namespace(ifiles=['/transformer91.th', '/transformer92.th', '/transformer93.th', '/transformer94.th', '/transformer95.th', '/transformer96.th', '/transformer97.th', '/transformer98.th', '/transformer99.th', '/transformer100.th'], ofile='/home/tp/projects/EEND_PyTorch-master/pretrained_models/large/model_callhome.th')
Traceback (most recent call last):
  File "eend/bin/model_averaging.py", line 35, in <module>
    average_model(args.ifiles, args.ofile)
  File "eend/bin/model_averaging.py", line 18, in average_model
    tmpmodel = torch.load(ifile)
  File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/site-packages/torch/serialization.py", line 584, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/site-packages/torch/serialization.py", line 234, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/site-packages/torch/serialization.py", line 215, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/transformer91.th'
```

If I want to train the model on simulated data (such as mini_librispeech) and adapt it on the AMI dataset, how do I set the simu_opts_num_speaker parameter? And should num_speakers in adapt.yaml be set to the maximum number of speakers in AMI?

Looking forward to your reply, thanks

Xflick commented 3 years ago

Hi, traditional EEND only supports a fixed number of speakers. It is possible to set the training speaker number to match the maximum number of speakers in your test set, but training will become really slow (I personally found the training time unacceptable for more than 3 speakers).

From what you have described, I recommend reading their latest works, which support a variable number of speakers:

Neural Speaker Diarization with Speaker-Wise Chain Rule
End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors

bbrookie commented 3 years ago

Hi, traditional EEND only supports a fixed number of speakers. It is possible to set the training speaker number to match the maximum number of speakers in your test set, but training will become really slow (I personally found the training time unacceptable for more than 3 speakers).

From what you have described, I recommend reading their latest works, which support a variable number of speakers:

Neural Speaker Diarization with Speaker-Wise Chain Rule
End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors

Thank you very much for your guidance. I will close this issue.

maerduduqi commented 1 month ago

First of all, thank you for open-sourcing this code. There is no folder named exp_large, and some files in this directory, such as avg.th or transformer.th, cannot be found. Can you provide these files? Or, if retraining does not require them, how should I pass these files in? And can the adaptation stage be skipped?

Did you manage to solve this? I am running into the same problem.