Sxjdwang / TalkLip

[BUG] A bug I found in the function audio_visual_pad #23

Closed JSHZT closed 1 year ago

JSHZT commented 1 year ago

Excellent work, but I found some exceptions when running the demo. The original code is as follows:

def audio_visual_pad(audio_feats, video_feats):
    diff = len(audio_feats) - len(video_feats)
    repeat = 1
    if diff > 0:
        repeat = math.ceil(len(audio_feats) / len(video_feats))
        video_feats = torch.repeat_interleave(video_feats, repeat, dim=0)
    diff = len(audio_feats) - len(video_feats)
    video_feats = video_feats[:diff]
    return video_feats, repeat, diff

As I understand it, this function makes the video features the same length as the audio features. It first computes diff = len(audio_feats) - len(video_feats). If diff is greater than 0, the audio features are longer than the video features, so each video feature is repeated ceil(len(audio_feats) / len(video_feats)) times. After the repetition the video features may be longer than the audio features, so the code recomputes diff and slices video_feats[:diff] to trim the excess.

The problem is that the final slice runs unconditionally. It is only correct when diff is negative, that is, when the video features are longer than the audio features. If the two lengths are exactly equal, either on input or after the repetition, diff is 0 and video_feats[:0] returns an empty tensor, so the function returns a meaningless result. The slice should be applied only when diff < 0. This is the result after my modification:

import math
import torch

def audio_visual_pad(audio_feats, video_feats):
    diff = len(audio_feats) - len(video_feats)
    repeat = 1
    if diff > 0:
        # Audio is longer: repeat each video feature enough times to cover it.
        repeat = math.ceil(len(audio_feats) / len(video_feats))
        video_feats = torch.repeat_interleave(video_feats, repeat, dim=0)
    diff = len(audio_feats) - len(video_feats)
    if diff < 0:
        # Trim only when video is longer; with diff == 0, [:0] would be empty.
        video_feats = video_feats[:diff]
    return video_feats, repeat, diff
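
The difference between the two versions can be reproduced without PyTorch. The sketch below is only an illustration: it assumes plain Python lists as stand-ins for the feature tensors, and repeat_interleave, pad_original and pad_fixed are hypothetical names introduced here, not functions from the repository. Lists slice the same way along the first dimension, so the lengths should match what the tensor version produces.

```python
import math

def repeat_interleave(items, repeat):
    # Hypothetical list stand-in for torch.repeat_interleave(..., dim=0):
    # [a, b] with repeat=2 becomes [a, a, b, b].
    return [x for x in items for _ in range(repeat)]

def pad_original(audio_feats, video_feats):
    diff = len(audio_feats) - len(video_feats)
    repeat = 1
    if diff > 0:
        repeat = math.ceil(len(audio_feats) / len(video_feats))
        video_feats = repeat_interleave(video_feats, repeat)
    diff = len(audio_feats) - len(video_feats)
    video_feats = video_feats[:diff]  # unconditional: diff == 0 means [:0]
    return video_feats, repeat, diff

def pad_fixed(audio_feats, video_feats):
    diff = len(audio_feats) - len(video_feats)
    repeat = 1
    if diff > 0:
        repeat = math.ceil(len(audio_feats) / len(video_feats))
        video_feats = repeat_interleave(video_feats, repeat)
    diff = len(audio_feats) - len(video_feats)
    if diff < 0:  # trim only when video is actually longer
        video_feats = video_feats[:diff]
    return video_feats, repeat, diff

audio = list(range(4))  # four audio feature frames
cases = {
    "equal lengths": ["v0", "v1", "v2", "v3"],
    "audio longer": ["v0", "v1", "v2"],
    "audio an exact multiple": ["v0", "v1"],
    "video longer": ["v0", "v1", "v2", "v3", "v4", "v5"],
}
for name, video in cases.items():
    n_original = len(pad_original(audio, video)[0])
    n_fixed = len(pad_fixed(audio, video)[0])
    print(f"{name}: original={n_original}, fixed={n_fixed}")
```

With these inputs the original version returns an empty result in two cases: when the lengths are equal on input, and when the audio length is an exact multiple of the video length, because the repetition then makes the lengths equal. The modified version returns four frames in every case.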

Looking forward to your reply!

Sxjdwang commented 1 year ago

Many thanks for pointing out this potential bug. I overlooked the situation where [:0] returns an empty list.
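
For reference, the [:0] pitfall comes from using a negative difference as the slice stop. One possible alternative, shown here only as a sketch and not as what the repository adopted, is to slice by the target length instead, which needs no special case for a zero difference. The helper name trim_to_length is hypothetical, and plain lists again stand in for tensors.

```python
def trim_to_length(video_feats, target_len):
    # Hypothetical helper: keep at most target_len leading frames.
    # A non-negative stop behaves the same whether or not trimming is needed.
    return video_feats[:target_len]

frames = ["v0", "v1", "v2", "v3"]

# The pitfall: a difference of 0 used as a slice stop selects nothing.
diff = 0
print(frames[:diff])                 # empty
# Slicing by target length keeps everything when the lengths already match.
print(trim_to_length(frames, 4))     # all four frames
# It still trims when the video is longer than the target.
print(trim_to_length(frames, 3))     # first three frames
```

The trade-off is that the function would then rely on the audio length rather than on diff for trimming, while diff would still have to be computed separately if callers use the returned value.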