facebookresearch / AVT

Code release for ICCV 2021 paper "Anticipative Video Transformer"
Apache License 2.0

Unable to reproduce '01_ek100_avt.txt' val result #23

Closed: zhoumumu closed this issue 2 years ago

zhoumumu commented 2 years ago

Hi @rohitgirdhar, I'm trying to reproduce the experiment '01_ek100_avt.txt'. After training, I read my evaluation results using the example code in README.md, which comes from 'notebooks.utils', and got these outputs:

[('expts/01_ek100_avt.txt', 0)] Accuracies verb/noun/action: 32.3 77.3 22.3 51.8 13.6 32.8
[('expts/01_ek100_avt.txt', 0)] Mean class top-1 accuracies verb/noun/action: 6.0 7.9 1.5
[('expts/01_ek100_avt.txt', 0)] Recall@5 verb/noun/action: 22.3 28.7 12.0
[('expts/01_ek100_avt.txt', 0)] Recall@5 many shot verb/noun/action: nan nan 12.0
[('expts/01_ek100_avt.txt', 0)] Recall@5 tail verb/noun/action: 13.9 19.3 8.9
[('expts/01_ek100_avt.txt', 0)] Recall@5 unseen verb/noun/action: 27.5 26.0 12.6

My reproduced result is 12.0, lower than your reported 14.9. Am I following the right evaluation process?
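For context, the 12.0 vs. 14.9 comparison is on the action Recall@5 line, which is a different quantity from the plain top-k accuracies on the first line. Assuming Recall@5 here means class-mean top-5 recall as used for EPIC-Kitchens-100 anticipation (an assumption; the repo's own evaluation code is authoritative), the difference between the two metrics can be sketched as follows. The function names and toy data are hypothetical.

```python
from collections import defaultdict


def topk_classes(row, k):
    """Indices of the k highest-scoring classes for one sample."""
    return sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]


def topk_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the top-k predictions."""
    hits = sum(label in topk_classes(row, k) for row, label in zip(scores, labels))
    return hits / len(labels)


def class_mean_topk_recall(scores, labels, k=5):
    """Top-k recall computed per class, then averaged over the classes present."""
    hits, counts = defaultdict(int), defaultdict(int)
    for row, label in zip(scores, labels):
        counts[label] += 1
        hits[label] += label in topk_classes(row, k)
    return sum(hits[c] / counts[c] for c in counts) / len(counts)


# Toy data: class 0 is always ranked first; three samples are class 0, one is class 1.
scores = [[0.9, 0.1, 0.0]] * 4
labels = [0, 0, 0, 1]
acc = topk_accuracy(scores, labels, k=1)              # 3 of 4 samples correct
recall = class_mean_topk_recall(scores, labels, k=1)  # mean of per-class recalls 1.0 and 0.0
```

On this toy data the accuracy is 0.75 while the class-mean recall is 0.5, because a rare class counts as much as a frequent one in the latter.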

The only thing I changed in training is the data-reading interface. Could you please help me check whether that could be why the reproduction fails? The change is shown below.

I wrote a PictureReader that reads frames one by one, in place of the DefaultReader:

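import torch
import torchvision

class PictureReader(Reader):  # Reader is the repo's base reader class
    def forward(self, video_path, start, end, fps, df_row, **kwargs):
        del df_row  # not needed here
        # Convert clip boundaries from seconds to frame indices, clamping the first index to 1
        start = int(max(start * fps, 1))
        end = int(end * fps)
        if end <= start:
            return torch.Tensor()

        # Drop the last 4 characters (presumably the file extension) to get the frame directory
        video_path = video_path[:-4]
        video = []
        for i in range(start, end):  # note: frame `end` itself is not read
            picture = torchvision.io.read_image(video_path + '/frame_{:010d}.jpg'.format(i))
            video.append(picture.permute(1, 2, 0))  # (C, H, W) -> (H, W, C)
        return torch.stack(video), {}, {}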
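One way to compare this reader with the video-based one is to look at exactly which frame files it opens for a given clip. The sketch below copies the index arithmetic from PictureReader into a pure function. The helper names frame_indices and frame_name are hypothetical, and 60 fps is an assumed value used only for illustration. Whether these indices match what the video decoder returns for the same time window is not established in this thread.

```python
def frame_indices(start_sec, end_sec, fps):
    """Frame numbers loaded for a clip, using the same arithmetic as PictureReader."""
    start = int(max(start_sec * fps, 1))  # clamped to 1, so frame 0 is never requested
    end = int(end_sec * fps)              # int() truncates toward zero
    return list(range(start, end))        # half-open range: frame `end` is not read


def frame_name(i):
    """File name pattern used by PictureReader (10-digit, zero-padded)."""
    return 'frame_{:010d}.jpg'.format(i)


# Assumed 60 fps, chosen so the products are exact in floating point.
mid_clip = frame_indices(2.0, 2.5, 60)    # frames 120..149
first_clip = frame_indices(0.0, 0.5, 60)  # frames 1..29, one fewer than a 0.5 s clip elsewhere
empty_clip = frame_indices(1.0, 1.0, 60)  # end <= start, so nothing is read
```

Printing these lists for a few annotated segments and comparing their lengths with the number of frames the DefaultReader returns for the same segments would be a cheap first check for off-by-one differences.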

As for get_frame_rate(), I take the fps from the annotation df, so I added a line to __init__ of class EPICKitchens: df['fps'] = df['start_frame'] / df['start']
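The per-row ratio df['start_frame'] / df['start'] is undefined for a segment that starts at 0 seconds, and it can drift slightly from the true rate when the timestamp has been rounded. A minimal sketch of a guarded estimate follows, in plain Python with made-up rows. The names per_row_fps and video_fps are hypothetical, and taking the median over a video's segments is just one possible choice, not what the repo does.

```python
from statistics import median


def per_row_fps(start_frame, start_sec):
    """Same ratio as df['start_frame'] / df['start'], or None where it is undefined."""
    if start_sec <= 0:
        return None
    return start_frame / start_sec


def video_fps(rows):
    """Median of the valid per-row estimates for one video, or None if there are none."""
    valid = [e for e in (per_row_fps(f, s) for f, s in rows) if e is not None]
    return median(valid) if valid else None


# Made-up (start_frame, start_sec) pairs for a single video.
rows = [(1, 0.0), (600, 10.0), (3011, 50.19)]
estimates = [per_row_fps(f, s) for f, s in rows]
fps = video_fps(rows)
```

The first row yields no estimate at all, and the other two rows disagree slightly, which is the kind of per-segment variation that would then feed into the start * fps and end * fps conversions in the reader.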

rohitgirdhar commented 2 years ago

Hi @zhoumumu. Yes, the evaluation seems right. You can also just look at TensorBoard, where all the numbers are logged.

To debug this, can you try testing the provided model with your frame reader? That might help isolate where the problem might be.

zhoumumu commented 2 years ago

Thanks for your advice. I tested 2 of the provided models. The model from 'expts/03_ek100_avt_tsn_obj.txt' gave me exactly the same rec@5 as yours, 8.7. But the model from 'expts/01_ek100_avt.txt' gave me 14.6 (acc1: 13.56, acc5: 32.76), which is slightly different from your 14.9. I also tested their fusion, which gave me 15.4.

So my PictureReader does seem to read slightly different data from the default one, yet there seems to be no big problem with it, since I get acceptable results with the provided models. But why??? I think the two readers are functionally equivalent. What do you think?

It seems I can't verify the experiment with frames alone. The videos are just too slow to download; the download speed here is about 0.3M/s Orz.

rohitgirdhar commented 2 years ago

Hi, apologies for the delay in responding. I'm guessing the data preprocessing or the loader is causing the issue in that case. The test performance should ideally be close to 14.9. As this issue also observed, any mistakes in preprocessing can make a big difference. Ideally if you can get the videos, that might be the easiest to debug this.

zhoumumu commented 2 years ago

OK, once I make any progress, I'll reopen this issue.

zerodecoder1 commented 2 years ago

Hi @zhoumumu,

I was having some trouble with this as well. Were you able to fix the issue?

zhoumumu commented 2 years ago

No, the video set is just too big to download and requires too much computing resources, so I gave up.


zerodecoder1 commented 2 years ago

Same, I have to train without end-to-end due to compute constraints. Were you trying this by training just the head model and not the base?