BUTSpeechFIT / DiaPer

Why are my inference results different #6

Closed coutAfaiendl closed 1 month ago

coutAfaiendl commented 2 months ago

I used run_example.sh from your example for single-file inference, but when I ran it on IS1009a.wav, I obtained a significantly incorrect RTTM file. For multi-task inference, I ran python diaper/infer.py -c examples/infer.yaml with the provided model and got a DER of 90% on the VoxConverse test set. Could you please let me know whether I overlooked something during inference that prevents me from obtaining accurate results?

fnlandini commented 2 months ago

Hi,

If you ran run_example.sh without changing any of the parameters, you should have obtained the same output as this reference file: https://github.com/BUTSpeechFIT/DiaPer/blob/main/examples/IS1009a_infer_16k_10attractors.rttm

Does your output differ from the reference file?
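One way to answer that question is to diff the two RTTM files segment by segment. The snippet below is a minimal sketch, not part of DiaPer: `read_rttm` and `rttm_differences` are hypothetical helper names, and it assumes the standard RTTM layout in which `SPEAKER` records carry the file ID, onset, duration, and speaker name in fields 2, 4, 5, and 8.

```python
def read_rttm(text):
    """Parse RTTM text into a sorted list of (file_id, onset, duration, speaker)."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not a SPEAKER record.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        file_id, speaker = fields[1], fields[7]
        onset, duration = float(fields[3]), float(fields[4])
        # Round to milliseconds so formatting differences do not count as mismatches.
        segments.append((file_id, round(onset, 3), round(duration, 3), speaker))
    return sorted(segments)


def rttm_differences(hyp_text, ref_text):
    """Return (segments only in the hypothesis, segments only in the reference)."""
    hyp, ref = set(read_rttm(hyp_text)), set(read_rttm(ref_text))
    return sorted(hyp - ref), sorted(ref - hyp)
```

To use it, read both files with `open(path).read()` and pass the contents in; two empty lists mean the outputs match exactly. This is a strict comparison: it treats speaker labels as fixed, so it is only meaningful when the same model and configuration produced both files.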

The same model, without fine-tuning, should give a DER of 23.2% on the VoxConverse test set, so something must be off if you get 90%. If you could share the contents of the yaml file you are using, that would help to see whether something is not as it should be.

coutAfaiendl commented 2 months ago

Hi @fnlandini, when I ran run_example.sh, I did not modify any configuration except for the paths. However, the results have significant issues: the output is fragmented into very short segments every time. When testing on VoxConverse, I used infer.yaml, changed the paths, and modified the following parameters: sampling_rate from 8000 to 16000, frame_shift from 80 to 160, frame_size from 200 to 400, and feature_dim from 23 to 40. I did not make any other changes.
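As a quick sanity check on those numbers, the sketch below converts the frame parameters to milliseconds. It assumes that frame_size and frame_shift are given in samples, which the thread does not state explicitly; `frame_params_ms` is a hypothetical helper, not a DiaPer function.

```python
def frame_params_ms(sampling_rate, frame_size, frame_shift):
    """Convert a frame size and shift in samples (assumed unit) to milliseconds."""
    size_ms = 1000.0 * frame_size / sampling_rate
    shift_ms = 1000.0 * frame_shift / sampling_rate
    return size_ms, shift_ms


# Original 8 kHz values from infer.yaml and the modified 16 kHz values.
print(frame_params_ms(8000, 200, 80))     # (25.0, 10.0)
print(frame_params_ms(16000, 400, 160))   # (25.0, 10.0)
```

Under that assumption, both settings describe the same 25 ms window with a 10 ms shift, so the scaling itself is consistent. What this check cannot show is whether the checkpoint being loaded was trained on 16 kHz audio with 40-dimensional features; a model trained on 8 kHz, 23-dimensional features would not match the edited configuration.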

fnlandini commented 2 months ago

Hi @coutAfaiendl, sorry for the delay. If you look at the reference file https://github.com/BUTSpeechFIT/DiaPer/blob/main/examples/IS1009a_infer_16k_10attractors.rttm you will see that the segments are very short. Do you mean that the segments you obtain are even shorter? If you obtain the same output, then that is the expected behavior, even if the results are bad for that file.
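To put a number on "very short", segment durations in the two RTTM files can be summarized and compared. This is only an illustrative sketch: `segment_duration_stats` is a hypothetical helper, the 0.5 s threshold is an arbitrary choice, and the code assumes the standard RTTM layout with the duration in the fifth field of each `SPEAKER` record.

```python
import statistics


def segment_duration_stats(rttm_text, short_threshold=0.5):
    """Summarize segment durations (in seconds) found in RTTM text."""
    durations = []
    for line in rttm_text.splitlines():
        fields = line.split()
        # Only SPEAKER records carry a segment duration.
        if len(fields) >= 8 and fields[0] == "SPEAKER":
            durations.append(float(fields[4]))
    if not durations:
        return None
    n_short = sum(1 for d in durations if d < short_threshold)
    return {
        "count": len(durations),
        "median": statistics.median(durations),
        "short_fraction": n_short / len(durations),
    }
```

Running it on both the obtained output and the reference file gives a direct comparison: a clearly lower median or a higher short_fraction in the obtained output would indicate that the segments really are more fragmented than expected.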

I recommend you use https://github.com/BUTSpeechFIT/DiaPer/blob/main/examples/infer_16k_10attractors.yaml with VoxConverse. You should be able to obtain the results reported in https://github.com/BUTSpeechFIT/DiaPer/tree/main/results/DiaPer/10attractors/VoxConverse/withoutFT/test/rttms if you use that model. If you try other combinations, I am afraid I cannot help you debug because I have not tried them.

I hope this helps.

fnlandini commented 1 month ago

Closing due to inactivity. Feel free to reopen if you see fit.