Closed: liyunlongaaa closed this issue 2 years ago.
You are indeed right! Thank you @liyunlongaaa !
In this line, the padding is defined with zeros:
`ts_padded.append(torch.cat((t, torch.zeros((t.shape[0], out_size - t.shape[1]))), dim=1))`
but it should pad with -1's instead:
`ts_padded.append(torch.cat((t, -1 * torch.ones((t.shape[0], out_size - t.shape[1]))), dim=1))`
I am busy at the moment but will try out the change when I find time. In the meantime, if you try it, please let me know if it works for you.
Thank you again, Federico
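The fix above can be sketched as follows; `t` and `out_size` here are illustrative stand-ins for the variables in the repository:

```python
import torch

# t is a (frames, speakers) label matrix; out_size is the target speaker count.
t = torch.tensor([[1., 0.], [0., 1.]])
out_size = 3

# Original: pads with zeros, indistinguishable from a genuine "speaker inactive" label.
zero_pad = torch.cat((t, torch.zeros((t.shape[0], out_size - t.shape[1]))), dim=1)

# Fixed: pads with -1's, which cannot be confused with real 0/1 activity labels.
ones_pad = torch.cat((t, -1 * torch.ones((t.shape[0], out_size - t.shape[1]))), dim=1)
```

With zero padding, a padded column looks exactly like an always-silent speaker, so the loss cannot ignore it; -1 is outside the label range and can be masked out.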
I think it might be OK. In this function, `labels_padded.append(torch.cat((labels[i], -torch.ones((extend, labels[i].shape[1]))), dim=0))` pads sequences of different lengths to the same length. In the loss function, the loss is normalized by the sequence length. I achieved results similar to the original paper using simulated data.
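A minimal sketch of the idea described above (the tensor values and shapes are illustrative, not taken from the repository): padded frames are marked with -1, excluded from the loss, and the loss is effectively normalized by the true sequence length.

```python
import torch
import torch.nn.functional as F

logits = torch.zeros((4, 2))           # (frames, speakers), dummy predictions
labels = torch.tensor([[1., 0.],
                       [0., 1.],
                       [-1., -1.],     # padded frame
                       [-1., -1.]])    # padded frame

valid = labels[:, 0] != -1             # frames carrying real labels
# Mean reduction over valid frames normalizes by the true sequence length.
loss = F.binary_cross_entropy_with_logits(logits[valid], labels[valid])
```

Because the padding value is -1, the mask is unambiguous; with zero padding, padded frames would be indistinguishable from frames where all speakers are silent.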
I wanted to update on this. Both @liyunlongaaa and @HaoaHaoaHaoaHaoaH are correct. I have updated the repository with the fix I mentioned in the second message. However, this does not affect the current version of the code where models are trained with only 2 speakers.
As @HaoaHaoaHaoaHaoaH pointed out, `pad_sequence` pads -1's to make all sequences the same length. However, there is another padding step, `pad_labels`, which was adding 0's so that all frames would have the same number of speakers. I have changed that padding value to -1's as well. Still, this does not affect the current recipe, which only handles 2 speakers.
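The two padding stages described above can be sketched as follows; the shapes and the target speaker count are illustrative:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Two label sequences of different lengths, each with 2 speakers.
a = torch.ones((3, 2))   # 3 frames
b = torch.ones((5, 2))   # 5 frames

# Stage 1: pad_sequence equalizes sequence lengths using -1's.
batch = pad_sequence([a, b], batch_first=True, padding_value=-1.0)

# Stage 2: a pad_labels-style step equalizes the speaker dimension,
# now also with -1's instead of the previous 0's.
n_spk = 3
batch = torch.cat(
    (batch, -torch.ones((batch.shape[0], batch.shape[1], n_spk - batch.shape[2]))),
    dim=2,
)
```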
An update with a newer version that can handle more speakers will be released soon.
Closing this issue now, feel free to reopen if you see fit.
Hi, thank you for the implementation. Sorry, I have a question about the following code: why is the check `ts_roll == -1` and not `ts_roll == 0`? It seems that the padding is zero. https://github.com/BUTSpeechFIT/EEND/blob/1263cab4aee287d56152c789d5dc0b6dbe59973f/eend/backend/losses.py#L44-L58
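As an illustrative sketch (the values below are made up, not taken from the linked code): once the padding value is -1 rather than 0, a comparison against -1 cleanly identifies padded entries, whereas a comparison against 0 would also match genuine "speaker silent" labels.

```python
import torch

# Last row is padding; the 0. in the first rows are real "silent" labels.
ts_roll = torch.tensor([[1., 0.],
                        [0., 0.],
                        [-1., -1.]])

pad_mask = (ts_roll == -1)   # True only where the entry is padding
```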