❓ How to do something using DETR
Dear Prof. Nicolas Carion,
Hi, I am Dong Chen, a PhD student in China. DETR is a well-known and influential paper that plays a foundational role in the field. However, one point about it has caused a major disagreement within my team and during the review of my own paper.
In my paper, I explore why using transformer blocks hurts small-object detection performance compared with CNN layers. To study this, I ran multiple experiments that increase or decrease the number of CNN layers or the number of transformer blocks in DETR. My aim is to explain that CNNs and Transformers have different feature extraction mechanisms for object detection.
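To make the experiment design concrete, here is a minimal sketch of the kind of ablation grid I mean. It only enumerates settings; it does not build or train a model. The names `AblationConfig`, `cnn_layers`, `transformer_blocks`, and `build_grid` are hypothetical, and the numbers are illustrative rather than the settings used in my paper.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class AblationConfig:
    """One experimental setting (field names are hypothetical)."""
    cnn_layers: int          # depth of the CNN part of the model
    transformer_blocks: int  # number of transformer blocks


def build_grid(cnn_options, transformer_options):
    """Enumerate every combination of CNN depth and transformer depth."""
    return [
        AblationConfig(cnn, blocks)
        for cnn, blocks in product(cnn_options, transformer_options)
    ]


# Illustrative values only; these are assumptions, not my real settings.
grid = build_grid(cnn_options=[3, 4], transformer_options=[3, 6])
for cfg in grid:
    print(cfg)
```

Each entry in `grid` corresponds to one training run, so the effect of the CNN depth and the transformer depth can be compared while the other is held fixed.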
During the review, however, a reviewer raised a new idea (see the text highlighted in yellow). The reviewer seems to be telling me that DETR uses different processing paths for objects (or feature maps) of different sizes, like this:
Large or medium objects (features) -> CNN layers -> detector head
Small objects (features) -> CNN layers -> transformer blocks -> detector head
Perhaps the reviewer thinks that CNNs and Transformers have different feature extraction mechanisms, and that multi-scale feature map interaction in CNN-based or Transformer-based models can improve small-object detection results. But I only want to know how using more or fewer CNN layers and transformer blocks affects feature extraction. (I do not even mention multi-scale feature maps.)
We need your help to clarify this issue. I look forward to your answer.
Best wishes,
Chen Dong, Shanghai, China
Alan Chen
alan_chen@tongji.edu.cn
Alan D. Chen, Ph.D. student