atelili / 2BiVQA

2BiVQA is a no-reference deep learning based video quality assessment metric.

What values of nb_frames and overlap gave the best score? #14

Open circlebig opened 2 months ago

circlebig commented 2 months ago

Hi, could you tell me the values you used for overlap and nb_frames?

Ahmed-Telili commented 1 month ago

Hello, the number of frames (nb_frames) was set to 30 and the overlap to 0.2.
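In case it helps, here is a minimal sketch of how overlapping patches could be cropped from a frame with an overlap ratio of 0.2. The patch size of 224 and this grid-cropping scheme are assumptions for illustration, not necessarily the exact scheme used in 2BiVQA.

```python
import numpy as np

def crop_patches(frame, patch_size=224, overlap=0.2):
    """Crop overlapping square patches from a frame of shape (H, W, C).

    Sketch only: patch_size=224 and this grid layout are assumptions,
    not necessarily the repo's exact cropping scheme.
    """
    stride = int(patch_size * (1 - overlap))  # 20% overlap between neighbours
    h, w = frame.shape[:2]
    patches = []
    for y in range(0, max(h - patch_size, 0) + 1, stride):
        for x in range(0, max(w - patch_size, 0) + 1, stride):
            patches.append(frame[y:y + patch_size, x:x + patch_size])
    return np.stack(patches)  # (n_patches, patch_size, patch_size, C)

# Example: a 1080x1920 frame yields a grid of overlapping 224x224 crops.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
print(crop_patches(frame).shape)
```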

circlebig commented 1 month ago

Thank you very much. May I ask some questions?

First, did you extract features for all of KoNViD, train on them, and then test on the test-split patch features?

Second, if I set nb_frames to 30 and overlap to 0.2, the shape of the extracted features is (2880, 2560) (backbone: MnasNet). The 2880 might be the video length (240) times the number of patches. Did your machine handle training on features of that size? My specs are an i9-12900K, an RTX 3090, and 78 GiB of RAM. It seems 2880 is too big to train on.

Ahmed-Telili commented 1 month ago

Hi @circlebig, we used a pretrained backbone to extract features from the dataset, with 25 patches per frame. So the shape of the features per video will be (30, 25, 2560). To avoid resource limitations, you can process the dataset image by image during feature extraction.
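A minimal PyTorch sketch of this per-frame processing, assuming torchvision's mnasnet1_0 as the backbone. MnasNet's feature maps have 1280 channels, so the 2560-dim figure quoted in this thread is matched here by concatenating average- and max-pooled descriptors; that concatenation is an assumption for illustration, not necessarily what the repo's extraction code does.

```python
import torch
import torchvision

# Pretrained MnasNet feature extractor (assumption: torchvision's mnasnet1_0;
# the repo's own extraction code may differ).
backbone = torchvision.models.mnasnet1_0(weights="DEFAULT").layers.eval()

@torch.no_grad()
def video_features(frames):
    """frames: tensor of shape (nb_frames, n_patches, 3, 224, 224).
    Returns a tensor of shape (nb_frames, n_patches, 2560)."""
    per_frame = []
    for patches in frames:                    # one frame at a time, low memory
        fmap = backbone(patches)              # (n_patches, 1280, h, w)
        avg = fmap.mean(dim=(2, 3))           # global average pool -> 1280
        mx = fmap.amax(dim=(2, 3))            # global max pool -> 1280
        per_frame.append(torch.cat([avg, mx], dim=1))  # 2560-dim (assumed)
    return torch.stack(per_frame)             # (nb_frames, n_patches, 2560)

feats = video_features(torch.rand(30, 25, 3, 224, 224))
print(feats.shape)  # torch.Size([30, 25, 2560])
```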

circlebig commented 1 month ago

Thank you. Was your number of feature tensors the same as the number of videos?

Ahmed-Telili commented 1 month ago

You're welcome ;) The shape of the features depends solely on the backbone: with MnasNet, each video yields a feature tensor of shape (30, 25, 2560).

circlebig commented 1 month ago

Oh, I mean the number of feature tensors. For example, there are 1200 videos in the KoNViD dataset, so my number of feature tensors is 1200. Is it the same as yours, 1200?

Ahmed-Telili commented 1 month ago

It does not depend on the number of videos. For each video, you will get a tensor of shape (30, 25, 2560) representing its features. If you compute the features of the KoNViD dataset (1200 videos), you will get 1200 such tensors.
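In other words, the extraction loop writes one feature file per video, so 1200 videos produce 1200 files. A short sketch of that loop; the folder names and the extract_features helper here are placeholders, not the repo's actual script.

```python
import glob
import os
import numpy as np

def extract_features(video_path):
    """Placeholder for your extraction pipeline (e.g. the sketch above);
    here it just returns a dummy tensor of the expected shape."""
    return np.zeros((30, 25, 2560), dtype=np.float32)

# One feature tensor is saved per video, so KoNViD's 1200 videos
# produce 1200 .npy files.
os.makedirs("features", exist_ok=True)
for path in glob.glob("KoNViD_1k/*.mp4"):       # hypothetical dataset folder
    feats = extract_features(path)              # shape (30, 25, 2560)
    name = os.path.splitext(os.path.basename(path))[0]
    np.save(os.path.join("features", name + ".npy"), feats)
```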

circlebig commented 1 month ago

1200 tensors, right. Thank you.