chensnathan / YOLOF

You Only Look One-level Feature (YOLOF), CVPR2021, Detectron2
MIT License
270 stars 28 forks

Hi, could you provide a detailed log of the multi-scale training(R50_C5)? #27

Open x-x110 opened 3 years ago

x-x110 commented 3 years ago

Hi, could you provide a detailed log of the multi-scale training(R50_C5)?

chensnathan commented 3 years ago

Hi, sorry for the late reply.

I don't have log files for multi-scale training at the moment. You can train it yourself; by following the settings in the paper, you should achieve results comparable to those reported in the paper.
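Roughly speaking, multi-scale training in Detectron2 is controlled by the INPUT scale settings. A minimal sketch is below, assuming `cfg` is the YOLOF training config already built by the repo's setup code; the scale list is only illustrative, not necessarily the paper's exact setting.

```python
# Minimal sketch, assuming `cfg` is the YOLOF training config built by the repo's
# usual setup code. The scale list is illustrative, not the paper's exact one.
cfg.INPUT.MIN_SIZE_TRAIN = (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)
cfg.INPUT.MIN_SIZE_TRAIN_SAMPLING = "choice"  # sample one short-edge size per image
cfg.INPUT.MAX_SIZE_TRAIN = 1333               # cap on the long edge
```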

x-x110 commented 3 years ago

Hi, I want to run a demo using YOLOF, and I wrote some files for it. However, no objects are recognized in the image. Could you give me some advice?


chensnathan commented 3 years ago

I don't understand what files you wrote. Could you provide more details about how you run a demo with YOLOF?

x-x110 commented 3 years ago

Hi, the logic of these files is as follows. First, following https://github.com/facebookresearch/detectron2/blob/master/GETTING_STARTED.md, we download demo.py and predictor.py. Then, we set the default params (e.g. --config-file and --input). Finally, we write the YOLOF predictor. Compared with the default predictor, we modify the checkpointer and the image pre-processing (YOLOFCheckpointer, T.AugmentationList(build_augmentation(cfg, False))). A minimal sketch of the predictor is shown below.
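The sketch assumes YOLOFCheckpointer and build_augmentation are the repo helpers mentioned above; their exact import paths are assumptions.

```python
import torch
import detectron2.data.transforms as T
from detectron2.modeling import build_model
# YOLOFCheckpointer and build_augmentation are the repo helpers mentioned above;
# their import paths inside this repo are assumed.

class YOLOFPredictor:
    def __init__(self, cfg):
        self.cfg = cfg.clone()
        self.model = build_model(self.cfg)
        self.model.eval()
        YOLOFCheckpointer(self.model).load(cfg.MODEL.WEIGHTS)          # load trained weights
        self.aug = T.AugmentationList(build_augmentation(cfg, False))  # test-time resize
        self.input_format = cfg.INPUT.FORMAT

    def __call__(self, original_image):
        # original_image: HWC uint8 array in BGR order (OpenCV convention)
        with torch.no_grad():
            if self.input_format == "RGB":
                original_image = original_image[:, :, ::-1]
            height, width = original_image.shape[:2]
            aug_input = T.AugInput(original_image)
            self.aug(aug_input)  # resizes aug_input.image in place
            image = torch.as_tensor(
                aug_input.image.astype("float32").transpose(2, 0, 1)
            )
            return self.model([{"image": image, "height": height, "width": width}])[0]
```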


chensnathan commented 3 years ago

For the demo, you can use the "DefaultPredictor" directly. Could you debug the output of the predictor? I will help with it when I get time this week.
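A minimal sketch of a DefaultPredictor-based demo; the config and weight paths are placeholders, and any repo-specific config setup would need to run before merging the YOLOF config.

```python
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
# If this repo defines extra config keys, add them here first
# (e.g. add_yolof_config(cfg); helper name assumed).
cfg.merge_from_file("configs/yolof_R_50_C5_1x.yaml")  # placeholder path
cfg.MODEL.WEIGHTS = "weights/YOLOF_R50_C5_1x.pth"     # placeholder path

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("input.jpg"))          # BGR image, as cv2 reads it
instances = outputs["instances"].to("cpu")
print(instances.pred_boxes, instances.scores, instances.pred_classes)
```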

x-x110 commented 3 years ago

Thanks!!!


x-x110 commented 3 years ago

Hi, we used the "DefaultPredictor" and got results. However, I get different performance. When I set "SCORE_THRESH_TEST" to 0.3, the visualization looks right, but the mAP drops to 35.49 (baseline 37.5). Could you give me some direction?

0.3 (mAP 35.49)

0.05 (mAP 37.5)
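For reference, the only change is the test-time score threshold; the exact config node for YOLOF is a guess here, since I only refer to the key name "SCORE_THRESH_TEST".

```python
# Minimal sketch; the full config path is an assumption and may differ in this repo.
cfg.MODEL.YOLOF.SCORE_THRESH_TEST = 0.3  # default is 0.05
```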


chensnathan commented 3 years ago

Hi, sorry for the late reply.

It's normal that the performance drops when you set a higher threshold (e.g., 0.3). A higher threshold removes valid predictions that the original setting (threshold=0.05) would keep. In YOLOF, we set the threshold to 0.05 by default.

x-x110 commented 3 years ago

Hi, maybe the pictures are not shown on GitHub; please see the email. The problem is that I get many low-score boxes when setting the threshold to 0.05, but a clean result when setting it to 0.3. However, with the threshold at 0.3 we only get 35.49 mAP, while 0.05 gives 37.5. I have put the pictures in the attachment.


chensnathan commented 3 years ago

There exist many TPs (true positives) with scores between 0.05 and 0.3. Thus the mAP is lower than the original one when you set the threshold to 0.3. A detailed analysis of TPs and FPs may help you understand why the performance drops.
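A minimal sketch of such a check with pycocotools: evaluate the same COCO-format detection file at both cutoffs and compare the summaries. File paths are placeholders.

```python
import json
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_val2017.json")   # placeholder path
with open("detections.json") as f:                     # standard COCO results file
    dets = json.load(f)

for thresh in (0.05, 0.3):
    kept = [d for d in dets if d["score"] >= thresh]   # drop low-score boxes
    coco_dt = coco_gt.loadRes(kept)
    coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
    coco_eval.evaluate()
    coco_eval.accumulate()
    print(f"--- score threshold {thresh} ---")
    coco_eval.summarize()                              # AP should be lower at 0.3
```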

x-x110 commented 3 years ago

Hello, when setting the threshold to 0.3 we get the picture named 0.3.jpg (35.5 mAP, correct-looking result), and at 0.05 we get the picture named 0.05.png (37.5 mAP, messy result). The mAP is higher but the visualization looks wrong. Why?


chensnathan commented 3 years ago

You should check the whole validation set instead of one single image.

x-x110 commented 3 years ago

Hi, we use the COCO 2017 val dataset. In the attachment we submit three JSON files (our improved result, your original result, and the official result) and a simple test script. The script shows that we can get a good mAP, but the detection image shows messy boxes. Why?


Oversized attachment sent from QQ Mail:

test.zip (35.69 MB, no expiration). Download page: http://mail.qq.com/cgi-bin/ftnExs_download?k=0b6537634c35a1ca598f3dc81033561853515453075052551b545452541e505403001a5a5751561a525153515057510254500654363b64435316434d4c5a14370b&t=exs_ftn_download&code=6e7c63d7

chensnathan commented 3 years ago

I checked several visualizations. There are indeed low-score bounding boxes in some images, which may be wrong predictions. But for the performance calculation, you need to count TPs, FPs, and FNs; that makes it more intuitive why the mAP is higher with a threshold of 0.05.

BTW, you can visualize other detectors' results with different thresholds as well; you will get similar visualizations.
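A minimal sketch of such a side-by-side visualization with Detectron2's Visualizer, assuming `outputs` comes from the predictor above; the image path and dataset name are placeholders.

```python
import cv2
from detectron2.data import MetadataCatalog
from detectron2.utils.visualizer import Visualizer

img = cv2.imread("input.jpg")                       # placeholder path, BGR
instances = outputs["instances"].to("cpu")          # predictor output from above
metadata = MetadataCatalog.get("coco_2017_val")     # registered COCO val metadata

for thresh in (0.05, 0.3):
    keep = instances[instances.scores > thresh]     # Instances supports boolean indexing
    vis = Visualizer(img[:, :, ::-1], metadata)     # Visualizer expects RGB
    out = vis.draw_instance_predictions(keep)
    cv2.imwrite(f"vis_{thresh}.jpg", out.get_image()[:, :, ::-1])
```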

x-x110 commented 3 years ago

Hi, I want to use your YOLOF (R50_C5) numbers in our paper. The GFLOPs are 86 in your paper, but I got 85 when evaluating the GFLOPs in Detectron2. Could you give me some advice?

[08/23 20:06:35 d2.data.datasets.coco]: Loaded 5000 images in COCO format from /mnt/disk2/dataset/coco/annotations/instances_val2017.json
[08/23 20:06:35 d2.data.build]: Distribution of instances among all 80 categories:
    category    #instances       category    #instances       category    #instances   
    person      10777           bicycle    314                car      1918         
  motorcycle    367             airplane    143                bus      283         
     train      190              truck      414               boat      424         
traffic light 634          fire hydrant 101             stop sign    75           
parking meter 60                bench      411               bird      427         
      cat      202               dog      218               horse      272         
     sheep      354               cow      372             elephant    252         
     bear      71                zebra      266              giraffe    232         
   backpack    371             umbrella    407              handbag    540         
      tie      252             suitcase    299              frisbee    115         
     skis      241            snowboard    69              sports ball  260         
     kite      327          baseball bat 145          baseball gl.. 148         
  skateboard    179            surfboard    267          tennis racket 225         
    bottle      1013            wine glass  341                cup      895         
     fork      215              knife      325               spoon      253         
     bowl      623              banana    370               apple      236         
   sandwich    177              orange    285             broccoli    312         
    carrot      365             hot dog    125               pizza      284         
     donut      328               cake      310               chair      1771         
     couch      261          potted plant 342                bed      163         
dining table  695              toilet    179                tv        288         
    laptop      231              mouse      106              remote      283         
   keyboard    153            cell phone  262             microwave    55           
     oven      143             toaster    9                 sink      225         
refrigerator  126               book      1129               clock      267         
     vase      274             scissors    36              teddy bear    190         
  hair drier    11              toothbrush  57                                         
     total      36335                                                                  
[08/23 20:06:35 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[08/23 20:06:35 d2.data.common]: Serializing 5000 elements to byte tensors and concatenating them all ...
[08/23 20:06:35 d2.data.common]: Serialized dataset takes 19.15 MiB
[08/23 20:06:39 fvcore.common.checkpoint]: Loading checkpoint from ../weights/YOLOF_R50_C5_1x.pth
[08/23 20:06:39 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model: anchor_generator.cell_anchors.0
WARNING [08/23 20:06:39 fvcore.nn.jit_analysis]: Unsupported operator aten::log encountered 1 time(s)
WARNING [08/23 20:06:39 fvcore.nn.jit_analysis]: The following submodules of the model were never called during the trace of the graph. They may be unused, or they were accessed by direct calls to .forward() or via other python methods. In the latter case they will have zeros for statistics, though their statistics will still contribute to their parent calling module. model.anchor_matcher
[08/23 20:07:07 detectron2]: Flops table computed from only one input sample:
module                              #parameters or shape    #flops     
model                                44.113M                84.517G   
  backbone                              23.455M                  66.945G   
   backbone.stem.conv1                   9.408K                   2.078G   
    backbone.stem.conv1.weight            (64, 3, 7, 7)                   
    backbone.stem.conv1.norm                                      68.352M
   backbone.res2                         0.213M                   11.75G   
    backbone.res2.0                      73.728K                  4.108G 
    backbone.res2.1                      69.632K                  3.821G 
    backbone.res2.2                      69.632K                  3.821G 
   backbone.res3                         1.212M                   16.487G 
    backbone.res3.0                      0.377M                  5.135G 
    backbone.res3.1                      0.279M                  3.784G 
    backbone.res3.2                      0.279M                  3.784G 
    backbone.res3.3                      0.279M                  3.784G 
   backbone.res4                         7.078M                   23.882G 
    backbone.res4.0                      1.507M                  5.092G 
    backbone.res4.1                      1.114M                  3.758G 
    backbone.res4.2                      1.114M                  3.758G 
    backbone.res4.3                      1.114M                  3.758G 
    backbone.res4.4                      1.114M                  3.758G 
    backbone.res4.5                      1.114M                  3.758G 
   backbone.res5                         14.942M                 12.749G 
    backbone.res5.0                      6.029M                  5.147G 
    backbone.res5.1                      4.456M                  3.801G 
    backbone.res5.2                      4.456M                  3.801G 
  encoder                              4.534M                  3.861G   
   encoder.lateral_conv                 1.049M                   0.891G   
    encoder.lateral_conv.weight          (512, 2048, 1, 1)               
    encoder.lateral_conv.bias            (512,)                         
   encoder.lateral_norm                 1.024K                   2.176M   
    encoder.lateral_norm.weight          (512,)                         
    encoder.lateral_norm.bias            (512,)                         
   encoder.fpn_conv                     2.36M                   2.005G   
    encoder.fpn_conv.weight              (512, 512, 3, 3)               
    encoder.fpn_conv.bias                (512,)                         
   encoder.fpn_norm                     1.024K                   2.176M   
    encoder.fpn_norm.weight              (512,)                         
    encoder.fpn_norm.bias                (512,)                         
   encoder.dilated_encoder_blocks       1.123M                   0.96G   
    encoder.dilated_encoder_blocks.0     0.281M                  0.24G   
    encoder.dilated_encoder_blocks.1     0.281M                  0.24G   
    encoder.dilated_encoder_blocks.2     0.281M                  0.24G   
    encoder.dilated_encoder_blocks.3     0.281M                  0.24G   
  decoder                              16.124M                  13.71G   
   decoder.cls_subnet                   4.722M                   4.015G   
    decoder.cls_subnet.0                  2.36M                    2.005G 
    decoder.cls_subnet.1                  1.024K                  2.176M 
    decoder.cls_subnet.3                  2.36M                    2.005G 
    decoder.cls_subnet.4                  1.024K                  2.176M 
   decoder.bbox_subnet                   9.443M                   8.03G   
    decoder.bbox_subnet.0                2.36M                    2.005G 
    decoder.bbox_subnet.1                1.024K                  2.176M 
    decoder.bbox_subnet.3                2.36M                    2.005G 
    decoder.bbox_subnet.4                1.024K                  2.176M 
    decoder.bbox_subnet.6                2.36M                    2.005G 
    decoder.bbox_subnet.7                1.024K                  2.176M 
    decoder.bbox_subnet.9                2.36M                    2.005G 
    decoder.bbox_subnet.10                1.024K                  2.176M 
   decoder.cls_score                     1.844M                   1.567G   
    decoder.cls_score.weight              (400, 512, 3, 3)               
    decoder.cls_score.bias                (400,)                         
   decoder.bbox_pred                     92.18K                   78.336M 
    decoder.bbox_pred.weight              (20, 512, 3, 3)                 
    decoder.bbox_pred.bias                (20,)                           
   decoder.object_pred                   23.045K                 19.584M 
    decoder.object_pred.weight            (5, 512, 3, 3)                 
    decoder.object_pred.bias              (5,)                           

[08/23 20:07:07 detectron2]: Average GFlops for each type of operators: [('conv', 86.96688461248), ('batch_norm', 0.9727329696)]
[08/23 20:07:07 detectron2]: Total GFlops: 87.9±9.7


chensnathan commented 3 years ago

For the FLOPs calculation, we follow the steps of DETR. You can check here.
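A minimal sketch of the counting itself, using the fvcore/Detectron2 analysis utilities that produced the log above; `model` and `data` are assumed to be a built YOLOF model in eval mode and one batch of COCO val inputs. DETR-style counting then averages this over many validation images, which is one likely reason single-sample numbers (such as 85 vs 86 GFLOPs) differ slightly.

```python
import torch
from detectron2.utils.analysis import FlopCountAnalysis  # detectron2 wrapper around fvcore
from fvcore.nn import flop_count_table

model.eval()
with torch.no_grad():
    flops = FlopCountAnalysis(model, data)    # `data`: list of Detectron2 input dicts
    print(flop_count_table(flops, max_depth=2))
    print("GFLOPs:", flops.total() / 1e9)
```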

x-x110 commented 1 year ago

Sorry to bother you. The purpose of this letter is to ask about the configuration or weights for multi-scale training (R50 or R101). Over the past two years, we have been working on solving YOLOF's NMS problem. Recently we successfully implemented an NMS-free version of YOLOF without any additional parameters (37.1 mAP vs 37.7 mAP). But because there are no multi-scale training weights available, we cannot carry out the multi-scale experiments; after following the settings in the paper, we can only get ~40 mAP. We hope that you can provide us with multi-scale training weights so we can complete our final experiment.
