Closed pengyizhou closed 3 years ago
Is it wrong to train an i-vector or x-vector model using the language id instead of the speaker id for LID? If we treat the language id as the speaker id in the baseline system, we get better results than when using the original speaker ids to train the model.
Hi, don't worry about this: the class ids depend only on the target classification task. So for an LID task, just use the language id as the label.
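As a minimal sketch of what "use the language id as the label" means in practice (hypothetical helper names, not part of the toolkit): read a Kaldi-style two-column mapping of utterance to language, then turn the language names into integer class ids for the classifier.

```python
# Hypothetical sketch: map utterances to integer language labels for LID
# training, in place of speaker labels. File/variable names are assumptions.

def read_two_col(path):
    """Read a Kaldi-style two-column file (e.g. utt2spk or utt2lang)
    into a dict of {first_column: second_column}."""
    mapping = {}
    with open(path) as f:
        for line in f:
            key, value = line.split(maxsplit=1)
            mapping[key] = value.strip()
    return mapping

def utt2label(utt2lang):
    """Turn language names into integer class ids for the classifier."""
    langs = sorted(set(utt2lang.values()))
    lang2id = {lang: i for i, lang in enumerate(langs)}
    return {utt: lang2id[lang] for utt, lang in utt2lang.items()}, lang2id
```

The same training code that consumed speaker ids can then consume these language ids unchanged; only the label mapping differs.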
Thanks a lot!
Hi @Snowdar,
Thanks for this killer toolkit. I have a follow-up question:
For the OLR2020 baseline LID task, the training dataset includes multiple speakers (each speaker has multiple utterances and one language label), and the test dataset includes speakers that were not seen in the training dataset.
From my understanding of your answer, the difference between an i-vector/x-vector SRE task and an i-vector/x-vector LID task is whether the speaker id or the language id is used as the label. For an LID task we should indeed use the language id. For example, if we run an LID task on a dataset with 6 different languages, where each language has 1000 speakers and each speaker has only one utterance, it is easy to argue that a successfully trained model learned language-related features rather than speaker-related ones.
But suppose we run an LID task on a dataset with 6 languages, 30 speakers per language, and 300 utterances per speaker, using the language id as the label, with 24 speakers per language for training and 6 for testing. How do we know whether the trained model predicts the language label correctly because it learned language-related features rather than speaker-related features?
I am confused about this part. Thanks in advance!
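One way to probe this concern empirically (a sketch with assumed helper names, not from the baseline recipe): hold out entire speakers per language, so that high LID accuracy on the held-out half cannot come from the model memorizing speaker identities.

```python
# Hypothetical sketch: a speaker-disjoint split. If the model still
# classifies languages well on utterances from unseen speakers, it is
# using language cues rather than speaker cues.
import random

def speaker_disjoint_split(utt2spk, utt2lang, n_test_spk_per_lang=6, seed=0):
    rng = random.Random(seed)
    # Each speaker belongs to exactly one language in this setting.
    spk2lang = {spk: utt2lang[utt] for utt, spk in utt2spk.items()}
    by_lang = {}
    for spk, lang in spk2lang.items():
        by_lang.setdefault(lang, []).append(spk)
    # Reserve n_test_spk_per_lang speakers of each language for testing.
    test_spks = set()
    for spks in by_lang.values():
        rng.shuffle(spks)
        test_spks.update(spks[:n_test_spk_per_lang])
    train = [u for u, s in utt2spk.items() if s not in test_spks]
    test = [u for u, s in utt2spk.items() if s in test_spks]
    return train, test
```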
Hi, if you use the language id as the label, the trained model will recognize different languages. Note that this is independent of speaker information, because the target of the language loss is to classify languages only.
Good luck!
Hi @Snowdar,
Thanks for your reply.
Regarding the training and validation losses during training: how should we split the training dataset into a training part and a validation part? Do we split by speaker, so that the speakers in the validation part do not appear in the training part? Or do we shuffle the whole training set first and then, e.g., use the first 90% as the training part and the rest as validation, in which case speakers in the validation part may also appear in the training part?
Thanks
Hi,
You could split the train and validation sets by language label. In general, make sure that both train and validation contain all class labels.
Good luck!
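The advice above (every language present in both train and validation) amounts to a stratified split. A minimal sketch, with hypothetical names:

```python
# Hypothetical sketch: stratified train/validation split by language label,
# so that every language appears in both parts.
import random

def stratified_split(utt2lang, valid_frac=0.1, seed=0):
    rng = random.Random(seed)
    by_lang = {}
    for utt, lang in utt2lang.items():
        by_lang.setdefault(lang, []).append(utt)
    train, valid = [], []
    for utts in by_lang.values():
        rng.shuffle(utts)
        # At least one validation utterance per language.
        n_valid = max(1, int(len(utts) * valid_frac))
        valid.extend(utts[:n_valid])
        train.extend(utts[n_valid:])
    return train, valid
```

Note this splits by utterance, so a speaker may appear in both parts; per the discussion above, that is acceptable for LID because the loss targets the language label, not the speaker.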
When training the baseline system, we are wondering what to use as the speaker information. When you trained the i-vector system, did you change the speaker id to the language id in the spk2utt and utt2spk files, or did you use the original speaker info to train the UBM and i-vector extractor? Either way, when we train the classifier I think we should use languages as the labels; is there any problem with using speaker info to train the i-vector extractor and then classifying the extracted vectors into languages?
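For the first option mentioned (treating the language id as the "speaker" in the data files), a small sketch of what the relabeling would look like, with hypothetical helper names:

```python
# Hypothetical sketch: treat each language as one "speaker", so utt2spk
# maps utterance -> language and spk2utt groups utterances per language.

def relabel_utt2spk(utt2lang):
    utt2spk = dict(utt2lang)  # each utterance's "speaker" is its language
    spk2utt = {}
    for utt, lang in utt2lang.items():
        spk2utt.setdefault(lang, []).append(utt)
    return utt2spk, spk2utt
```

With this relabeling the number of "speakers" collapses to the number of languages, which is why the question of speaker vs. language labels matters for the i-vector extractor.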