JialianW / TraDeS

Track to Detect and Segment: An Online Multi-Object Tracker (CVPR 2021)
MIT License

Post processing removes all detections #26

Open rvrsprdx opened 3 years ago

rvrsprdx commented 3 years ago

Hi,

thanks for your work. I was wondering why I didn't get any detections on a video when running demo.py. I then realized that the bboxes are empty ([]) after post_process() is invoked; in fact, result is empty as well (result = self.post_process(dets, meta, scale, output), line 141 of detector.py). Before that line, however, the detections dets contain reasonable bboxes for the frame. Can you tell me why that is and maybe how to solve it? I'd like to save these detections to a text file. I can access dets, but I'm not sure how to access the corresponding tracking IDs.
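For reference, here is a minimal sketch of how the post-processed results could be dumped to a text file in MOTChallenge format. It assumes each entry in the results list carries 'bbox' (x1, y1, x2, y2), 'score', and 'tracking_id' keys, as in CenterTrack-style demo output; the function name is illustrative, not from the repo:

```python
def write_mot_line(f, frame_id: int, item: dict) -> None:
    # MOTChallenge text format: frame, id, x, y, w, h, score, -1, -1, -1
    # Assumes 'bbox' is (x1, y1, x2, y2) and 'tracking_id' is set after tracking.
    x1, y1, x2, y2 = item['bbox']
    f.write(f"{frame_id},{item['tracking_id']},{x1:.1f},{y1:.1f},"
            f"{x2 - x1:.1f},{y2 - y1:.1f},{item['score']:.2f},-1,-1,-1\n")
```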

Thanks.

JialianW commented 3 years ago

Can you check where it becomes empty in this file? https://github.com/JialianW/TraDeS/blob/3eafd249ca0f18af8000d5798d4c552a0bd627ec/src/lib/utils/post_process.py

rvrsprdx commented 3 years ago

Thanks for your quick reply.

While trying to track down the issue, I found that dets["bboxes"] before post_process() changes between runs with identical demo configurations. Sometimes I get negative numbers as bbox coordinates; other times I get very realistic bboxes for all frames; and sometimes I get a mix of realistic and unrealistic bboxes.

```
CUDA_VISIBLE_DEVICES=0 python demo.py tracking --load_model ../checkpoints/trades_epoch400_std_params.pth \
    --demo ../videos/myvideo.avi \
    --pre_hm --ltrb_amodal --pre_thresh 0.5 --track_thresh 0.1 --inference \
    --clip_len 14 \
    --trades \
    --save_video \
    --resize_video \
    --input_h 544 --input_w 960
```

Again, I've run this exact command 10 times and get different values for dets["bboxes"] each time. To come back to my initial problem: even when I get realistic bboxes, the bboxes are still empty after post_process().
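One possibility (an assumption, not something confirmed in this thread) is that the run-to-run variance comes from nondeterministic GPU kernels. A quick way to test that is to force deterministic execution before running the model, at some speed cost:

```python
import random

import numpy as np
import torch

# Seed every RNG and disable the nondeterministic cuDNN autotuner.
random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```

If the outputs still differ between runs after this, the variance is coming from somewhere else (e.g. an uninitialized buffer).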

Another question regarding video resolution: input_h and input_w are set to the network size I used for training. The video, however, has a resolution of w=724, h=708. I cannot set input_h and input_w to these values; I get the following error: RuntimeError: The size of tensor a (91) must match the size of tensor b (90) at non-singleton dimension 3

JialianW commented 3 years ago

Have you tried the provided demo with the provided trained model? Does that demo have any problem? I haven't encountered this issue before.

For your second question, the resolution values need to be evenly divisible by 32.
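For example, a tiny helper (illustrative only) that snaps a desired resolution to the nearest multiple of 32:

```python
# Snap a desired test resolution to the nearest multiple of 32,
# the network's total downsampling stride.
def snap_to_stride(x: int, stride: int = 32) -> int:
    return max(stride, round(x / stride) * stride)

print(snap_to_stride(724), snap_to_stride(708))  # 736 704
```

So for a 724x708 video, --input_w 736 --input_h 704 would be the closest valid sizes.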

rvrsprdx commented 3 years ago

Yes, the demo works without any problems on the provided mot model and video.

I trained on a custom (vehicle) dataset without using a pretrained model (in particular, not the crowdhuman one). That's not an issue, is it? I get many missing-weights warnings when the model is loaded.

I should add that when negative bbox values appear, they are very small (close to zero). Also, I have trained CenterTrack successfully on the same dataset. I think the problem lies with the trained model... do you have any idea about that?

Thanks.

JialianW commented 3 years ago

Not using a pretrained model is not a problem, and neither are the warnings.

If you get reasonable results before post-processing, it does not look like a problem with the trained model. Did you check the usage of 'ltrb_amodal' and keep it consistent between training and testing? It is related to boxes that extend outside the image.
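Roughly, in CenterTrack-style code (which TraDeS builds on), ltrb_amodal regresses the full (amodal) box extent even when it crosses the image border, while the plain ltrb target is clipped to the frame. A toy illustration of the difference:

```python
# An object partially outside a 960x544 frame.
W, H = 960, 544
amodal_box = [-40.0, 100.0, 120.0, 300.0]   # x1, y1, x2, y2; may extend past borders

# The clipped ("ltrb") variant keeps only the in-frame part.
clipped_box = [max(amodal_box[0], 0.0), max(amodal_box[1], 0.0),
               min(amodal_box[2], W - 1.0), min(amodal_box[3], H - 1.0)]
print(clipped_box)  # [0.0, 100.0, 120.0, 300.0]
```

If training regresses one kind of target and testing decodes the other, the resulting box coordinates become inconsistent.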

rvrsprdx commented 3 years ago

Thanks for your reply.

I didn't change the ltrb_amodal value between training and testing, so it should be consistent.

Can you explain what ltrb_amodal and ltrb mean?

Thanks.

rvrsprdx commented 3 years ago

I kinda solved this issue now.

I still don't get any bboxes drawn on the video, but results now contains everything, including the bboxes.

This makes sense: reasonable detections are being made, but the tracking itself seems to fail completely. I trained on static custom images, so that is expected. Running inference several times with the same configuration on a video, I get a MOTP ranging between 60 and 62. Is this kind of variance normal? Also, the number of detections seems to be capped at 100 per video (which might explain the variance above). I don't think that's intended. Where can I change this?
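If the cap is the standard CenterNet-style top-K peak selection that TraDeS inherits from CenterTrack, it is a per-frame (not per-video) limit, with K defaulting to 100 and usually exposed as the --K option. A minimal sketch of the mechanism, under that assumption:

```python
import torch

def topk_peaks(heatmap: torch.Tensor, K: int = 100):
    """Keep only the K highest-scoring heatmap peaks: at most K detections per image."""
    b, c, h, w = heatmap.shape
    scores, inds = torch.topk(heatmap.view(b, -1), K)
    classes = torch.div(inds, h * w, rounding_mode='floor')
    pixels = inds % (h * w)
    ys = torch.div(pixels, w, rounding_mode='floor')
    xs = pixels % w
    return scores, classes, ys, xs
```

Raising K (e.g. --K 200) would lift the cap if this is indeed the bottleneck.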

HELOBILLY commented 1 year ago

Hi @rvrsprdx, I saw your discussion under this issue, and my task is very similar: training with static images and testing on image sequences. There are two categories to track, so my training script is:

```
python main.py tracking --exp_id my-exp --load_model ../models/crowdhuman_pretrained.pth \
    --dataset custom --custom_dataset_ann_path my_train.json --custom_dataset_img_path my_images \
    --input_h 1024 --input_w 1024 --num_classes 2 --pre_hm --ltrb_amodal \
    --shift 0.05 --scale 0.05 --same_aug --hm_disturb 0.05 --lost_disturb 0.4 --fp_disturb 0.1 \
    --num_epochs 30 --lr_step 15,25 --save_point 20,25 --gpus 0 --batch_size 4 --num_workers 8
```

I use the provided crowdhuman_pretrained.pth as the pretrained weights. After training, I use the following script for prediction:

```
python demo.py tracking --load_model ../exp/tracking/my-exp/model_last.pth --demo ../data/test/video-1/img1 \
    --input_h 1600 --input_w 1920 --save_results --num_class 2 --pre_hm --ltrb_amodal \
    --pre_thresh 0.5 --track_thresh 0 --inference --clip_len 2 --trades
```

I commented out lines 177/178 in tracker.py:

```python
# for r in ret:
#     del r['embedding']
```

and set --track_thresh 0 when running demo.py.

However, the results still look bad, and the scores are quite low. Could you offer some suggestions? I'd appreciate it!