Fix Wildtrack evaluation

tteepe commented 1 year ago

Hi guys,

I like your work on multi-view tracking, but I was confused by your results on Wildtrack. Your code evaluates tracking in each camera view, but Wildtrack is evaluated on the ground plane (the projected bird's-eye view).

I added ground-truth creation and ground-plane evaluation to your repo. You also don't include the first frame of the test set in your evaluation: currently the codebase produces no results for the first frame at inference.
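
As a rough illustration of what ground-plane evaluation involves, the sketch below matches ground-truth and predicted positions within a single frame by Euclidean distance on the ground plane. The function name, the dict-based inputs, the greedy matching, and the 1 m threshold are all assumptions made for illustration; they are not taken from the ReST code or from the evaluation added in this PR.

```python
import math


def match_ground_plane(gt, pred, max_dist=1.0):
    """Greedily match GT and predicted ground-plane points in one frame.

    gt, pred: dicts mapping an ID to an (x, y) position in metres.
    max_dist: assumed gating distance; pairs farther apart never match.
    Returns (matches, false_negatives, false_positives).
    """
    # All candidate pairs, closest first.
    pairs = sorted(
        (math.dist(g, p), gid, pid)
        for gid, g in gt.items()
        for pid, p in pred.items()
    )
    matches, used_gt, used_pred = {}, set(), set()
    for dist, gid, pid in pairs:
        if dist > max_dist:
            break  # every remaining pair is even farther apart
        if gid in used_gt or pid in used_pred:
            continue
        matches[gid] = pid
        used_gt.add(gid)
        used_pred.add(pid)
    false_negatives = [gid for gid in gt if gid not in used_gt]
    false_positives = [pid for pid in pred if pid not in used_pred]
    return matches, false_negatives, false_positives


gt = {1: (0.0, 0.0), 2: (5.0, 5.0)}
pred = {10: (0.3, 0.4), 11: (9.0, 9.0)}
print(match_ground_plane(gt, pred))  # ({1: 10}, [2], [11])
```

A full tracking evaluation additionally keeps matches consistent across frames and counts identity switches, which this per-frame sketch does not attempt.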

I re-ran the evaluation (with ground-truth image detections) and got the following results:

Method	IDF1	IDP	IDR	MT	ML	FP	FN	IDs	FM	MOTA	MOTP
T-Glimpse	77.8	79.3	76.4	61	4.9	91	126	42	43	72.8	79.1
T-Glimpse Stack	81.9	81.6	82.2	65.9	4.9	114	107	21	34	74.6	78.9
ReST (\w GT BBoxes)	76.6	74.9	78.4	35	0	97	56	40	19	77.5	99.9
ReST (\w MVDeTr dets)

Unfortunately, you didn't provide './datasets/Wildtrack/sequence1/output/MVDeTr_test.json'. Could you upload it so I can re-run the results for a fair comparison?

Best, Torben

chengche6230 commented 1 year ago

Hi Torben,

Thanks for your kind reply. I just re-ran your code, which evaluates in the bird's-eye view, and used the 'gp.txt' file generated by your code. The results are in the following table:

Method	IDF1	IDP	IDR	MT	ML	FP	FN	IDs	FM	MOTA	MOTP
T-Glimpse	77.8	79.3	76.4	61	4.9	91	126	42	43	72.8	79.1
T-Glimpse Stack	81.9	81.6	82.2	65.9	4.9	114	107	21	34	74.6	78.9
ReST (\w GT BBoxes)	91.3	91.2	91.4	41(100%)	0(0%)	4	2	29	2	96.3	93.3
ReST (\w MVDeTr dets)	86.7	85.0	88.4	36(87.8%)	2(4.9%)	75	37	29	10	84.9	84.1

For your second concern: yes, we didn't use the first frame for evaluation because of the design of our model. In the first frame we only perform Spatial Association, so no IDs are assigned there. For convenience, we dropped the first frame from the evaluation. It would be fine to include one earlier frame from the training set, i.e. frame 1795 in Wildtrack, so that the result also covers the first frame of the test set, i.e. frame 1800. The result shouldn't change much.

We'll upload the MVDeTr detections soon.

Best regards, Cheng-Che

tteepe commented 1 year ago

Hi Cheng-Che,

Okay, I saw that I hadn't loaded the reID weights, although I followed the other steps in your setup guide. Now I get the following results:

Method	IDF1	IDP	IDR	MT	ML	FP	FN	IDs	FM	MOTA	MOTP
T-Glimpse	77.8	79.3	76.4	61	4.9	91	126	42	43	72.8	79.1
T-Glimpse Stack	81.9	81.6	82.2	65.9	4.9	114	107	21	34	74.6	78.9
ReST (\w GT BBoxes)	85.1	81.5	89.0	39	0	96	17	29	2	83.4	92.7
ReST (\w MVDeTr dets)	80.3	75.5	85.7	34	2	167	52	29	10	71.1	83.5

How did you arrive at your results? I ran main.py --config_file configs/Wildtrack.yml with the checkpoints you provided and got the results above. This is the output that I'm getting:

2023-09-23 11:10:30.067 | INFO     | src.utils.tools:evaluate:31 - 
         IDF1   IDP   IDR  Rcll   Prcn  GT  MT PT ML FP FN IDs  FM  MOTA  MOTP IDt IDa IDm
0       89.3% 90.3% 88.5% 98.2% 100.0%  38  38  0  0  0 16  29   0 95.0% 0.000  19  16   6
1       90.9% 92.0% 90.1% 98.2% 100.0%  35  35  0  0  0 15  25   0 95.2% 0.001  16  14   6
2       90.4% 91.3% 89.6% 98.3% 100.0%  40  40  0  0  0 16  28   0 95.2% 0.000  18  16   6
3       87.0% 87.7% 86.4% 98.5% 100.0%  20  20  0  0  0  4  11   0 94.5% 0.003   6   5   0
4       94.9% 96.0% 94.0% 98.0% 100.0%  36  36  0  0  0 14  14   0 96.1% 0.000  11   7   5
5       88.3% 89.2% 87.5% 98.3% 100.0%  39  39  0  0  0 16  34   0 94.7% 0.001  22  17   6
6       93.4% 94.6% 92.4% 97.9% 100.0%  25  25  0  0  0 13  12   0 95.9% 0.003   7   6   2
OVERALL 90.7% 91.7% 89.9% 98.2% 100.0% 233 233  0  0  0 94 153   0 95.2% 0.001  99  81  31
=========== Wildtrack GROUND PLANE evaluation ===========
2023-09-23 11:10:30.162 | INFO     | src.utils.tools:evaluate:57 - 
   IDF1   IDP   IDR  Rcll  Prcn GT MT PT ML FP FN IDs  FM  MOTA  MOTP IDt IDa IDm
0 85.1% 81.5% 89.0% 98.0% 89.8% 39 39  0  0 96 17  29   2 83.4% 0.073  18  17   6
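
For reference when reading the FP, FN, IDs, and MOTA columns in these logs, the standard CLEAR MOT definition is MOTA = 1 - (FP + FN + IDs) / (number of ground-truth objects summed over all frames). The helper below is a minimal sketch of that formula. The function name is hypothetical, and the ground-truth count of 1000 in the usage lines is a made-up placeholder, so the printed values are not meant to reproduce any percentage reported in this thread.

```python
def mota(fp, fn, id_switches, num_gt_objects):
    """CLEAR MOT accuracy: 1 - (FP + FN + ID switches) / total GT objects."""
    if num_gt_objects <= 0:
        raise ValueError("num_gt_objects must be positive")
    return 1.0 - (fp + fn + id_switches) / num_gt_objects


# Hypothetical GT count of 1000 objects; only the error counts come from
# the ground-plane results quoted in this thread.
print(round(mota(fp=96, fn=17, id_switches=29, num_gt_objects=1000), 3))
print(round(mota(fp=4, fn=2, id_switches=29, num_gt_objects=1000), 3))
```

With the same number of ID switches, the gap between two runs comes entirely from the FP and FN terms.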

Neither leaving out frame 360 nor including frame 359 is an option for testing: train and test should stay separated. Maybe you could propagate the ID assignment backwards from frame 361 to frame 360? Since this test split is only 40 frames, it will make a difference. The same applies to the other datasets.
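
One possible reading of the backward-assignment idea is sketched below: once the second test frame has track IDs, each first-frame ground-plane position inherits the ID of its nearest second-frame track. Everything here is an assumption for illustration (the function name, the point and dict inputs, the nearest-neighbour rule, and the 1 m gate); ReST associates detections with learned graph models, which this sketch does not use.

```python
import math


def backfill_first_frame_ids(first_pts, second_tracks, max_dist=1.0):
    """Give first-frame points the IDs of nearby second-frame tracks.

    first_pts: list of (x, y) ground-plane positions without IDs.
    second_tracks: dict mapping an integer track ID to its (x, y) position.
    Returns one ID per first-frame point, in input order.
    """
    candidates = sorted(
        (math.dist(p, q), i, tid)
        for i, p in enumerate(first_pts)
        for tid, q in second_tracks.items()
    )
    ids, taken = {}, set()
    for dist, i, tid in candidates:
        if dist > max_dist:
            break  # remaining pairs are too far apart to share an ID
        if i in ids or tid in taken:
            continue
        ids[i] = tid
        taken.add(tid)
    # Points with no close track (e.g. people who left) get fresh IDs.
    next_id = max(second_tracks, default=-1) + 1
    for i in range(len(first_pts)):
        if i not in ids:
            ids[i] = next_id
            next_id += 1
    return [ids[i] for i in range(len(first_pts))]


first = [(0.0, 0.0), (10.0, 10.0), (3.0, 3.0)]
second = {7: (0.2, 0.1), 8: (3.1, 2.9)}
print(backfill_first_frame_ids(first, second))  # [7, 9, 8]
```

This keeps the test split self-contained, since no training frame is needed to obtain IDs for the first test frame.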

Best, Torben

chengche6230 commented 1 year ago

Hi Torben,

Here are my logs with both gt & MVDeTr detection:

groundtruth:

2023-09-23 17:48:00.875 | INFO     | src.tracker:__init__:49 - Detection: gt
2023-09-23 17:48:00.880 | INFO     | src.tracker:load_param:73 - Load Spatial Graph param from ./logs/ckpts/Wildtrack_sequence1_SG_epoch92_train90.pth
2023-09-23 17:48:00.886 | INFO     | src.tracker:load_param:84 - Load Temporal Graph param from ./logs/ckpts/Wildtrack_sequence1_TG_epoch72_train90.pth

...

2023-09-23 17:49:11.614 | INFO     | src.tracker:test:138 - Evaluation Result:
2023-09-23 17:49:12.406 | INFO     | src.utils.tools:evaluate:31 -
         IDF1   IDP   IDR   Rcll   Prcn  GT  MT PT ML FP FN IDs  FM  MOTA  MOTP IDt IDa IDm
0       90.1% 90.3% 90.1% 100.0% 100.0%  38  38  0  0  0  0  29   0 96.7% 0.000  19  16   6
1       91.8% 92.0% 91.8% 100.0% 100.0%  35  35  0  0  0  0  25   0 96.9% 0.001  16  14   6
2       91.2% 91.3% 91.2% 100.0% 100.0%  40  40  0  0  0  0  28   0 96.9% 0.000  18  16   6
3       87.7% 87.7% 87.7% 100.0% 100.0%  20  20  0  0  0  0  11   0 95.9% 0.003   6   5   0
4       95.9% 96.0% 95.9% 100.0% 100.0%  36  36  0  0  0  0  14   0 98.0% 0.000  11   7   5
5       89.0% 89.2% 89.0% 100.0% 100.0%  39  39  0  0  0  0  34   0 96.3% 0.001  22  17   6
6       94.4% 94.6% 94.4% 100.0% 100.0%  25  25  0  0  0  0  12   0 98.0% 0.003   7   6   2
OVERALL 91.6% 91.7% 91.6% 100.0% 100.0% 233 233  0  0  0  0 153   0 97.0% 0.001  99  81  31
2023-09-23 17:49:12.520 | INFO     | src.utils.tools:evaluate:57 -
   IDF1   IDP   IDR  Rcll  Prcn GT MT PT ML FP FN IDs  FM  MOTA  MOTP IDt IDa IDm
0 91.3% 91.2% 91.4% 99.8% 99.6% 41 41  0  0  4  2  29   2 96.3% 0.067  18  17   6

MVDeTr:

2023-09-23 17:49:52.774 | INFO     | src.tracker:__init__:49 - Detection: MVDeTr
2023-09-23 17:49:52.780 | INFO     | src.tracker:load_param:73 - Load Spatial Graph param from ./logs/ckpts/Wildtrack_sequence1_SG_epoch92_train90.pth
2023-09-23 17:49:52.786 | INFO     | src.tracker:load_param:84 - Load Temporal Graph param from ./logs/ckpts/Wildtrack_sequence1_TG_epoch72_train90.pth

...

2023-09-23 17:51:03.266 | INFO     | src.tracker:test:138 - Evaluation Result:
2023-09-23 17:51:04.034 | INFO     | src.utils.tools:evaluate:31 -
         IDF1   IDP   IDR  Rcll  Prcn  GT  MT PT ML  FP  FN IDs   FM  MOTA  MOTP IDt IDa IDm
0       88.0% 88.4% 88.2% 95.7% 95.2%  38  34  4  0  42  38  30   17 87.5% 0.174  13  21   3
1       87.2% 88.0% 86.7% 93.7% 94.8%  35  32  3  0  42  51  26   27 85.4% 0.187  13  16   4
2       87.6% 88.8% 86.8% 93.6% 95.4%  40  32  4  4  41  58  28   15 85.9% 0.178  15  19   6
3       53.8% 55.0% 53.4% 61.2% 62.1%  20  11  3  6 100 104  11    3 19.8% 0.176   3   8   0
4       89.2% 89.7% 89.0% 92.7% 93.0%  36  24 11  1  49  51  15   19 83.6% 0.179  14   7   6
5       84.3% 84.5% 84.4% 93.4% 93.2%  39  32  7  0  63  61  39   30 82.3% 0.198  21  23   5
6       89.9% 89.3% 90.5% 94.3% 93.0%  25  20  5  0  42  34  10   16 85.5% 0.173   4   8   3
OVERALL 85.7% 86.2% 85.6% 92.2% 92.5% 233 185 37 11 379 397 159  127 81.6% 0.182  83 102  27
2023-09-23 17:51:04.144 | INFO     | src.utils.tools:evaluate:57 -
   IDF1   IDP   IDR  Rcll  Prcn GT MT PT ML FP FN IDs  FM  MOTA  MOTP IDt IDa IDm
0 86.7% 85.0% 88.4% 96.0% 92.3% 41 36  3  2 75 37  29  10 84.9% 0.159   8  24   3

I re-ran my code with the provided ReID and model weights and got the same results as before. The major difference between our runs is in FP and FN. With ground-truth detections, these two metrics should be quite low (or even 0). The gap might be caused by an inconsistency between the gt_file and the ts_file used in the motmetrics evaluation. In my case, both the gt_file and the ts_file start at frame 1805. In your version, the gt_file might start at frame 1800 while the ts_file starts at frame 1805. I'm not sure whether that causes the issue. I'll keep checking.
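
A quick way to test this hypothesis is to compare the frame numbers present in the two files. The sketch below assumes MOTChallenge-style comma-separated rows whose first column is the frame number; that layout, like the function names, is an assumption and may not match the files the repo actually writes.

```python
def count_rows_per_frame(lines):
    """Count rows per frame; the first comma-separated field is the frame."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        frame = int(float(line.split(",")[0]))
        counts[frame] = counts.get(frame, 0) + 1
    return counts


def compare_frame_ranges(gt_lines, ts_lines):
    """Report frames that appear in only one of the two files."""
    gt = count_rows_per_frame(gt_lines)
    ts = count_rows_per_frame(ts_lines)
    gt_only = sorted(set(gt) - set(ts))
    ts_only = sorted(set(ts) - set(gt))
    return {
        "gt_only_frames": gt_only,
        "ts_only_frames": ts_only,
        # GT rows in frames the tracker never reports cannot be matched.
        "unmatched_gt_rows": sum(gt[f] for f in gt_only),
        "unmatched_ts_rows": sum(ts[f] for f in ts_only),
    }


gt_rows = ["1800,1,0.5,1.0", "1800,2,3.0,4.0", "1805,1,0.6,1.1"]
ts_rows = ["1805,1,0.6,1.0"]
print(compare_frame_ranges(gt_rows, ts_rows))
```

Ground-truth rows in frames that are missing from the tracker output would be counted as misses, so a non-empty `gt_only_frames` list would be consistent with the extra FN described above.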

Sincerely, Cheng-Che

chengche6230 commented 1 year ago

Hi Torben,

I just saw your latest edited comment. I appreciate the suggestion. We'll consider it and revise our code in the near future. In our original model design, ID assignment happens only in the Temporal Association stage, which is why the first frame is skipped.

Best, Cheng-Che

chengche6230 / ReST

Fix Wildtrack evaluation #3