Closed — kennyvoo closed this issue 9 months ago
I used a subset of the train and test videos to verify my hypothesis (the images are too large, so too much information is lost when resized). For another run, I cropped the main region (h=600, w=800) for training and testing. The accuracy did increase to 83+. Besides this method, are there other ways to improve the accuracy?
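For reference, the crop-then-resize preprocessing I used is roughly the sketch below; the crop offsets here are placeholders that I tuned per scene, not the exact values from my run:

```python
import cv2

# Hypothetical top-left corner of the crop; chosen so the crop covers the main activity region.
Y0, X0 = 100, 200
CROP_H, CROP_W = 600, 800  # region size used for this run

def preprocess(frame, out_size=256):
    """Crop the main region first, then resize, to reduce information loss
    compared to resizing the full frame directly to 256x256."""
    crop = frame[Y0:Y0 + CROP_H, X0:X0 + CROP_W]
    return cv2.resize(crop, (out_size, out_size))
```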
Thank you for your prompt response. I have adjusted the code for the ShanghaiTech dataset, where each video has its own background. However, the performance is not as good as the version adapted from the Avenue dataset.
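In case it helps to see it concretely, my ShanghaiTech change is roughly the following sketch: one learnable background per scene, kept in an `nn.ParameterDict`. The scene IDs and file paths here are only illustrative:

```python
import cv2
import numpy as np
import torch
import torch.nn as nn

def load_bkg(path):
    # Grayscale background -> 3-channel CHW tensor scaled to [-1, 1]
    img = cv2.imread(path, 0)
    img = np.repeat(img[:, :, None], 3, axis=-1).transpose((2, 0, 1))
    return torch.from_numpy(img.astype(np.float32) / 127.5 - 1)

# One background per ShanghaiTech scene; paths are illustrative.
scene_bkgs = {'01': './bkg_01.jpg', '02': './bkg_02.jpg'}
# Registered as self.bkgs inside the model's __init__.
bkgs = nn.ParameterDict({sid: nn.Parameter(load_bkg(p)) for sid, p in scene_bkgs.items()})
```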
Regarding your second piece of advice, I am seeking guidance on choosing appropriate values, specifically the ideal receptive field size and offset. If a human occupies approximately 1/4 of a (256, 256) image, what receptive field size and offset would you recommend? Should the receptive field cover the entire human?
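For context on how I am thinking about this: a person covering about 1/4 of a 256x256 frame is roughly 64 px tall, and I estimate the receptive field and offset with a small helper like the one below. The layer configuration is a placeholder, not the actual backbone:

```python
def receptive_field(layers):
    """layers: list of (kernel, stride) pairs, applied in order.
    Returns the receptive field size and the effective stride (offset between
    neighbouring output positions), both in input pixels."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf, jump

# Placeholder config: four 3x3 convs with stride 2.
print(receptive_field([(3, 2)] * 4))  # -> (31, 16): 31 px field, 16 px offset
```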
I attempted to train a model on the IITB-Corridor dataset following an approach similar to the one used for the Avenue dataset. However, the training results are suboptimal, reaching only around 61-62% accuracy. Could you give me some advice on how to improve it? Are there any parameters I should pay attention to? Could the cause be the larger image size (distortion from resizing directly to 256)? I'm currently running another experiment (cropping one region of the image) to verify this.
# Learnable background for IITB: grayscale image -> 3-channel CHW tensor scaled to [-1, 1]
self.bkg = nn.Parameter(torch.from_numpy(np.repeat(cv2.imread('./bkg_iitb.jpg', 0).copy()[:, :, None], 3, axis=-1).transpose((2, 0, 1)).astype(np.float32) / 127.5 - 1))
For evaluation, I've removed
and added the following lines, following ped2.