CaptainEven / MCMOT

Real time one-stage multi-class & multi-object tracking based on anchor-free detection and ReID
MIT License

2 out of 8 backbones throw size errors #47

Closed · austinmw closed this issue 3 years ago

austinmw commented 3 years ago

Hi, apologies for creating multiple issues, but really liking this model!

At the default resolution of 1280x736 I'm able to train the other six backbones without error. However, resdcn_50 and hrnet_32 both throw sizing errors:

resdcn_50:

RuntimeError: The size of tensor a (216) must match the size of tensor b (215) at non-singleton dimension 3

hrnet_32:

RuntimeError: Given transposed=1, weight of size 36 36 2 2, expected input[4, 64, 68, 120] to have 36 channels, but got 64 channels instead

Do you happen to recognize what the problem might be for either of these? I'm not sure whether it's simply a disallowed input size or whether network modifications are required.


Also, I'm curious whether you have a backbone recommendation for detecting very small objects (areas of ~50 pixels) that appear in high counts and are tightly clustered together (tinyperson crowd counting)? And is there value in upsampling my 1280x720 videos to something like 1920x1088 to help small-object detection?

Really appreciate any advice you have time to offer!

CaptainEven commented 3 years ago

@austinmw I think a higher resolution will help detect small objects, but make sure the resolution is integer-divisible by 4, which is the preset down-sampling factor of the feature map used for ReID.
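
For reference, here is a minimal sketch of how one might round a target resolution to satisfy such a divisibility constraint. This is not from the MCMOT codebase: round_to_stride is a hypothetical helper, and the stride=32 default is an assumption based on the typical total down-sampling of ResNet/HRNet-style backbones (any size divisible by 32 is also divisible by the ReID stride of 4 mentioned above), not a value taken from this repo.

```python
# Hypothetical helper (not part of MCMOT): round a requested training
# resolution to the nearest size divisible by the network stride.
def round_to_stride(width: int, height: int, stride: int = 32) -> tuple[int, int]:
    """Round (width, height) to the nearest multiples of `stride`.

    The author's comment states the ReID feature map uses a down-sampling
    factor of 4; stride=32 is an assumed, more conservative choice that
    also covers the deeper down-sampling stages of typical backbones.
    """
    w = max(stride, round(width / stride) * stride)
    h = max(stride, round(height / stride) * stride)
    return w, h

if __name__ == "__main__":
    # 1280x736 already passes: 1280/32 = 40 and 736/32 = 23.
    print(round_to_stride(1280, 736))   # -> (1280, 736)
    # Naively upsampling 720p to 1920x1080 would break divisibility
    # (1080/32 = 33.75); rounding suggests 1920x1088 instead.
    print(round_to_stride(1920, 1080))  # -> (1920, 1088)
```

Under that assumption, the 1920x1088 target mentioned above is exactly the divisible rounding of 1080p, which may be why it reads as a safer choice than 1920x1080.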