Closed: liyunlongaaa closed this issue 2 years ago.
You are indeed right! Thank you @liyunlongaaa !
In this line, the padding is defined with zeros:
`ts_padded.append(torch.cat((t, torch.zeros((t.shape[0], out_size - t.shape[1]))), dim=1))`
but it should pad with -1's instead:
`ts_padded.append(torch.cat((t, -1 * torch.ones((t.shape[0], out_size - t.shape[1]))), dim=1))`
I am busy at the moment but will try out the change when I find time. In the meantime, if you try it, please let me know if it works for you.
Thank you again, Federico
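The fix above can be sketched as follows; `t` and `out_size` here are illustrative stand-ins for the variables in the repository:

```python
import torch

# t is a (frames, speakers) label matrix; out_size is the target speaker count.
t = torch.tensor([[1., 0.], [0., 1.]])
out_size = 3

# Original: pads with zeros, indistinguishable from a genuine "speaker inactive" label.
zero_pad = torch.cat((t, torch.zeros((t.shape[0], out_size - t.shape[1]))), dim=1)

# Fixed: pads with -1's, which cannot be confused with real 0/1 activity labels.
ones_pad = torch.cat((t, -1 * torch.ones((t.shape[0], out_size - t.shape[1]))), dim=1)
```

With zero padding, a padded column looks exactly like an always-silent speaker, so the loss cannot ignore it; -1 is outside the label range and can be masked out.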
I think it might be OK. In this function, `labels_padded.append(torch.cat((labels[i], -torch.ones((extend, labels[i].shape[1]))), dim=0))` pads sequences of different lengths to the same length. In the loss function, the loss is normalized by the sequence length. I achieved results similar to the original paper using simulated data.
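A minimal sketch of the idea described above (the tensor values and shapes are illustrative, not taken from the repository): padded frames are marked with -1, excluded from the loss, and the loss is effectively normalized by the true sequence length.

```python
import torch
import torch.nn.functional as F

logits = torch.zeros((4, 2))           # (frames, speakers), dummy predictions
labels = torch.tensor([[1., 0.],
                       [0., 1.],
                       [-1., -1.],     # padded frame
                       [-1., -1.]])    # padded frame

valid = labels[:, 0] != -1             # frames carrying real labels
# Mean reduction over valid frames normalizes by the true sequence length.
loss = F.binary_cross_entropy_with_logits(logits[valid], labels[valid])
```

Because the padding value is -1, the mask is unambiguous; with zero padding, padded frames would be indistinguishable from frames where all speakers are silent.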
I wanted to update on this. Both @liyunlongaaa and @HaoaHaoaHaoaHaoaH are correct. I have updated the repository with the fix I mentioned in the second message. However, this does not affect the current version of the code where models are trained with only 2 speakers.
As @HaoaHaoaHaoaHaoaH pointed out, `pad_sequence` pads -1's to make all sequences the same length. However, there is another padding step, `pad_labels`, which was adding 0's so that all frames would have the same number of speakers. I have changed that padding value to -1's as well. Still, this does not affect the current recipe, which only handles 2 speakers.
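The two padding stages described above can be sketched as follows; the shapes and the target speaker count are illustrative:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Two label sequences of different lengths, each with 2 speakers.
a = torch.ones((3, 2))   # 3 frames
b = torch.ones((5, 2))   # 5 frames

# Stage 1: pad_sequence equalizes sequence lengths using -1's.
batch = pad_sequence([a, b], batch_first=True, padding_value=-1.0)

# Stage 2: a pad_labels-style step equalizes the speaker dimension,
# now also with -1's instead of the previous 0's.
n_spk = 3
batch = torch.cat(
    (batch, -torch.ones((batch.shape[0], batch.shape[1], n_spk - batch.shape[2]))),
    dim=2,
)
```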
An update with a newer version that can handle more speakers will be released soon.
Closing this issue now, feel free to reopen if you see fit.
Hi, thank you for the implementation. Sorry, I have a question about the following code: why is the check `ts_roll == -1` and not `ts_roll == 0`? It seems that the padding is zero. https://github.com/BUTSpeechFIT/EEND/blob/1263cab4aee287d56152c789d5dc0b6dbe59973f/eend/backend/losses.py#L44-L58
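As an illustrative sketch (the values below are made up, not taken from the linked code): once the padding value is -1 rather than 0, a comparison against -1 cleanly identifies padded entries, whereas a comparison against 0 would also match genuine "speaker silent" labels.

```python
import torch

# Last row is padding; the 0. in the first rows are real "silent" labels.
ts_roll = torch.tensor([[1., 0.],
                        [0., 0.],
                        [-1., -1.]])

pad_mask = (ts_roll == -1)   # True only where the entry is padding
```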