During training and testing, the AMI audio is downsampled to 8 kHz, and num_speakers is changed from 2 to 5 (the number of speakers per recording in the AMI dataset is between 2 and 5).
The final training loss is 0.09410, and the best DER on the dev set is 28.64%, but the best test DER is 78.72%. It seems that the model cannot distinguish between different speakers.
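Assuming these scores come from the usual NIST md-eval scoring (the post does not name the scoring tool or say whether a collar was used), DER is the total error time relative to the reference speaker time:

$$
\mathrm{DER} = \frac{T_{\mathrm{miss}} + T_{\mathrm{FA}} + T_{\mathrm{conf}}}{T_{\mathrm{ref}}} \times 100\%
$$

Here $T_{\mathrm{miss}}$ and $T_{\mathrm{FA}}$ are missed and falsely detected speaker time, and $T_{\mathrm{conf}}$ is speaker time attributed to the wrong speaker under the optimal one-to-one mapping between hypothesis and reference labels. All three, like $T_{\mathrm{ref}}$, are summed over speakers, so overlapped speech counts once per active speaker. With a single output label, every second in which a second reference speaker is active can only add to missed or confused time.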
I used your pretrained model, fine-tuned it on the AMI training set, and then tested it on the AMI test set. The adaptation config is listed below:
Adaptation options:

```yaml
sampling_rate: 8000
frame_size: 200
frame_shift: 80
model_type: Transformer
max_epochs: 100
gradclip: 5
batchsize: 64
hidden_size: 256
num_frames: 500
num_speakers: 5
input_transform: logmel23_mn
optimizer: adam
lr: 1e-5
context_size: 7
subsampling: 10
gradient_accumulation_steps: 1
transformer_encoder_n_heads: 4
transformer_encoder_n_layers: 4
transformer_encoder_dropout: 0.1
noam_warmup_steps: 100000
seed: 777
gpu: 1
```
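As a sanity check on what this config implies, the sketch below derives the feature hop, window length, model output step, training-chunk length and input dimension. It is a hypothetical helper (`time_resolution`, `ADAPT_CONFIG` and `N_MELS` are illustrative names, not part of the toolkit), and it assumes that `num_frames` counts frames after subsampling and that `logmel23` means 23 log-mel bins spliced with ±`context_size` neighbouring frames.

```python
# Hypothetical sanity check (not part of the toolkit): derive the time scales
# implied by the adaptation config.
# Assumptions: num_frames counts frames *after* subsampling, and logmel23 means
# 23 log-mel bins spliced with +/- context_size neighbouring frames.
ADAPT_CONFIG = {
    "sampling_rate": 8000,  # Hz
    "frame_size": 200,      # samples per analysis window
    "frame_shift": 80,      # samples between consecutive feature frames
    "context_size": 7,      # frames spliced on each side
    "subsampling": 10,      # feature frames per model output frame
    "num_frames": 500,      # chunk length in subsampled frames (assumption)
}
N_MELS = 23  # inferred from input_transform: logmel23_mn


def time_resolution(cfg):
    """Return the time scales (seconds) and input size implied by cfg."""
    rate = cfg["sampling_rate"]
    return {
        # hop between consecutive feature frames
        "frame_shift_s": cfg["frame_shift"] / rate,
        # analysis window length
        "frame_size_s": cfg["frame_size"] / rate,
        # time covered by one model output frame after subsampling
        "output_step_s": cfg["frame_shift"] * cfg["subsampling"] / rate,
        # length of one training chunk
        "chunk_s": cfg["num_frames"] * cfg["frame_shift"] * cfg["subsampling"] / rate,
        # dimension of the spliced input feature vector
        "input_dim": N_MELS * (2 * cfg["context_size"] + 1),
    }


if __name__ == "__main__":
    for name, value in time_resolution(ADAPT_CONFIG).items():
        print(f"{name}: {value}")
```

Under those assumptions the config gives a 10 ms feature hop, 25 ms windows, a 100 ms output step, 50 s training chunks and a 345-dimensional input vector.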
Part of the resulting RTTM:
SPEAKER EN2002a 1 0.00 260.00 EN2002a_0
SPEAKER EN2002a 1 260.06 0.06 EN2002a_0
SPEAKER EN2002a 1 260.15 0.02 EN2002a_0
SPEAKER EN2002a 1 260.24 0.01 EN2002a_0
SPEAKER EN2002a 1 260.68 0.01 EN2002a_0
SPEAKER EN2002a 1 261.21 0.04 EN2002a_0
SPEAKER EN2002a 1 261.28 0.12 EN2002a_0
SPEAKER EN2002a 1 261.41 0.06 EN2002a_0
SPEAKER EN2002a 1 261.76 0.09 EN2002a_0
SPEAKER EN2002a 1 261.86 0.08 EN2002a_0
SPEAKER EN2002a 1 264.35 0.01 EN2002a_0
SPEAKER EN2002a 1 264.41 0.03 EN2002a_0
SPEAKER EN2002a 1 264.45 0.03 EN2002a_0
SPEAKER EN2002a 1 264.66 0.01 EN2002a_0
SPEAKER EN2002a 1 264.85 0.01 EN2002a_0
SPEAKER EN2002a 1 264.87 0.01 EN2002a_0
SPEAKER EN2002a 1 264.89 0.04 EN2002a_0
SPEAKER EN2002a 1 264.94 0.01 EN2002a_0
SPEAKER EN2002a 1 265.34 0.08 EN2002a_0
SPEAKER EN2002a 1 265.43 0.07 EN2002a_0
SPEAKER EN2002a 1 266.27 0.01 EN2002a_0
SPEAKER EN2002a 1 266.30 0.01 EN2002a_0
SPEAKER EN2002a 1 266.34 0.03 EN2002a_0
SPEAKER EN2002a 1 266.73 0.01 EN2002a_0
SPEAKER EN2002a 1 266.79 0.03 EN2002a_0
SPEAKER EN2002a 1 266.83 0.01 EN2002a_0
SPEAKER EN2002a 1 267.22 0.01 EN2002a_0
SPEAKER EN2002a 1 267.68 0.02 EN2002a_0
SPEAKER EN2002a 1 267.78 0.01 EN2002a_0
SPEAKER EN2002a 1 267.80 0.02 EN2002a_0
SPEAKER EN2002a 1 267.83 0.01 EN2002a_0
SPEAKER EN2002a 1 267.90 0.01 EN2002a_0
SPEAKER EN2002a 1 270.72 0.01 EN2002a_0
SPEAKER EN2002a 1 270.82 0.02 EN2002a_0
SPEAKER EN2002a 1 270.87 0.01 EN2002a_0
SPEAKER EN2002a 1 271.58 0.01 EN2002a_0
SPEAKER EN2002a 1 273.28 0.02 EN2002a_0
SPEAKER EN2002a 1 273.31 0.01 EN2002a_0
SPEAKER EN2002a 1 273.79 0.02 EN2002a_0
SPEAKER EN2002a 1 275.43 0.01 EN2002a_0
SPEAKER EN2002a 1 275.53 0.01 EN2002a_0
SPEAKER EN2002a 1 277.82 0.01 EN2002a_0
SPEAKER EN2002a 1 277.85 0.03 EN2002a_0
SPEAKER EN2002a 1 277.89 0.04 EN2002a_0
SPEAKER EN2002a 1 277.95 0.01 EN2002a_0
SPEAKER EN2002a 1 277.97 0.01 EN2002a_0
SPEAKER EN2002a 1 278.01 0.01 EN2002a_0
SPEAKER EN2002a 1 278.05 0.01 EN2002a_0
SPEAKER EN2002a 1 278.13 0.01 EN2002a_0
SPEAKER EN2002a 1 279.85 0.03 EN2002a_0
SPEAKER EN2002a 1 279.95 0.01 EN2002a_0
SPEAKER EN2002a 1 280.00 69.07 EN2002a_0
SPEAKER EN2002a 1 349.08 0.08 EN2002a_0
SPEAKER EN2002a 1 349.22 4.80 EN2002a_0
SPEAKER EN2002a 1 354.30 185.70 EN2002a_0
SPEAKER EN2002a 1 560.00 21.10 EN2002a_0
SPEAKER EN2002a 1 581.11 0.09 EN2002a_0
SPEAKER EN2002a 1 581.22 0.03 EN2002a_0
SPEAKER EN2002a 1 581.34 0.01 EN2002a_0
SPEAKER EN2002a 1 581.38 0.01 EN2002a_0
SPEAKER EN2002a 1 581.40 1.32 EN2002a_0
SPEAKER EN2002a 1 582.73 0.02 EN2002a_0
SPEAKER EN2002a 1 582.77 1.36 EN2002a_0
SPEAKER EN2002a 1 584.15 0.06 EN2002a_0
SPEAKER EN2002a 1 584.30 0.01 EN2002a_0
SPEAKER EN2002a 1 584.39 0.71 EN2002a_0
SPEAKER EN2002a 1 585.11 9.06 EN2002a_0
SPEAKER EN2002a 1 594.18 0.01 EN2002a_0
SPEAKER EN2002a 1 594.72 0.02 EN2002a_0
SPEAKER EN2002a 1 594.75 1.22 EN2002a_0
SPEAKER EN2002a 1 596.32 0.01 EN2002a_0
SPEAKER EN2002a 1 596.34 0.04 EN2002a_0
SPEAKER EN2002a 1 596.39 0.01 EN2002a_0
SPEAKER EN2002a 1 596.41 1.47 EN2002a_0
SPEAKER EN2002a 1 597.89 0.02 EN2002a_0
SPEAKER EN2002a 1 597.92 142.08 EN2002a_0
SPEAKER EN2002a 1 763.29 0.04 EN2002a_0
SPEAKER EN2002a 1 764.11 0.02 EN2002a_0
SPEAKER EN2002a 1 780.00 80.00 EN2002a_0
SPEAKER EN2002a 1 880.00 240.00 EN2002a_0
SPEAKER EN2002a 1 1140.00 300.00 EN2002a_0
SPEAKER EN2002a 1 1440.04 0.02 EN2002a_0
SPEAKER EN2002a 1 1440.15 0.01 EN2002a_0
SPEAKER EN2002a 1 1440.17 0.02 EN2002a_0
SPEAKER EN2002a 1 1440.22 0.02
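To check the single-speaker pattern across all recordings rather than by eye, a small parser like the sketch below tallies segments and total time per speaker label. It is hypothetical (`summarize_rttm` and `speakers_per_recording` are illustrative names) and assumes whitespace-separated lines. It accepts both the standard RTTM layout, where the speaker name is the 8th field, and the 6-field form shown above, and it skips truncated lines such as the final one in the excerpt.

```python
from collections import defaultdict


def summarize_rttm(lines):
    """Count segments and total duration per (recording, speaker) label."""
    stats = defaultdict(lambda: [0, 0.0])  # (file_id, speaker) -> [n_segments, seconds]
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue
        if len(fields) >= 8:
            speaker = fields[7]  # standard RTTM: speaker name is the 8th field
        elif len(fields) == 6:
            speaker = fields[5]  # compact 6-field form as pasted in this post
        else:
            continue  # truncated line: no usable speaker label
        file_id, duration = fields[1], float(fields[4])
        entry = stats[(file_id, speaker)]
        entry[0] += 1
        entry[1] += duration
    return dict(stats)


def speakers_per_recording(stats):
    """Number of distinct speaker labels found for each recording."""
    per_rec = defaultdict(set)
    for file_id, speaker in stats:
        per_rec[file_id].add(speaker)
    return {rec: len(spks) for rec, spks in per_rec.items()}
```

Running both functions on the hypothesis RTTM and on the reference RTTM for the same recordings and comparing the `speakers_per_recording` results shows at a glance whether the adapted model ever emits more than one label per meeting.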