encounter1997 / FP-DETR

Official Implementation of "FP-DETR: Detection Transformer Advanced by Fully Pre-training"
Apache License 2.0

About code for pretraining #4

Open volgachen opened 2 years ago

volgachen commented 2 years ago

Excuse me, do you have any plans to release the code or instructions for pre-training?

encounter1997 commented 2 years ago

Sorry, we do not plan to release the code for pre-training, but it can be easily implemented by replacing the model construction function in the DeiT code with our model construction function.

Hope this helps, and feel free to ask if you have any difficulties implementing the pre-training code.
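
For reference, here is a rough sketch of that swap, assuming the DeiT training script from facebookresearch/deit (which builds its model through timm's create_model). The names build_fp_detr_backbone and fp_detr_base_in1k_pretrain are placeholders, not the actual FP-DETR API:

```python
# Hypothetical sketch only: register an FP-DETR encoder as a timm model so
# that DeiT's main.py (which calls timm.create_model(args.model, ...)) can
# build it without further changes.
from timm.models.registry import register_model

from models import build_fp_detr_backbone  # placeholder for FP-DETR's builder


@register_model
def fp_detr_base_in1k_pretrain(pretrained=False, num_classes=1000, **kwargs):
    # Build the FP-DETR encoder with a classification head; the exact
    # arguments depend on FP-DETR's own construction function.
    return build_fp_detr_backbone(num_classes=num_classes, **kwargs)
```

DeiT's training script could then be pointed at this model via --model fp_detr_base_in1k_pretrain, with its usual ImageNet arguments.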

volgachen commented 2 years ago

Thank you for your response. I guess I should change query_shape to (1, 1). Are there any other configs I should pay attention to?

encounter1997 commented 2 years ago

Taking fp-detr-base-in1k.py as an example, several parts of the config should be modified, as follows (a rough sketch combining these changes is given after the list):

  1. Only the model definition is needed.
  2. The self-attn and the corresponding norm in encoder2 should be removed, and the operation order should be updated.
  3. return_intermediate should be updated, since deep supervision is not used during pre-training. The code in the DeiT project may also need to be changed slightly to obtain the class token from the output sequence for loss computation.
  4. num_classes should be 1000 for ImageNet classification.
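
To make these points concrete, here is a rough sketch of how the pre-training variant of fp-detr-base-in1k.py might look. The field names (type, encoder2, operation_order, ...) are assumptions in the style of Deformable-DETR-like configs, not copied from the actual file:

```python
# Hypothetical pre-training config sketch; names are illustrative only.
model = dict(
    type='FPDETRClassifier',      # point 1: only the model definition is kept
    query_shape=(1, 1),           # a single query token acts as the class token
    encoder2=dict(
        # point 2: prompt_self_attn and its norm removed, and the operation
        # order shortened accordingly, e.g.
        # operation_order=('cross_attn', 'norm', 'ffn', 'norm'),
    ),
    return_intermediate=False,    # point 3: no deep supervision in pre-training
    num_classes=1000,             # point 4: ImageNet-1k classification
)
```

Correspondingly, in the DeiT training loop the classification logits would be read off that single query token, along these lines (shapes are assumptions):

```python
outputs = model(images)     # assumed shape: (batch, num_queries, num_classes), num_queries == 1
logits = outputs[:, 0]      # the single query token gives the class prediction
loss = criterion(logits, targets)
```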

volgachen commented 2 years ago

> 2. The self-attn and the corresponding norm in encoder2 should be removed, and the operation order should be updated.

I suppose the self-attn you mentioned in point 2 is actually prompt_self_attn?

encounter1997 commented 2 years ago

Yes, that's right.

volgachen commented 2 years ago

Thank you! It seems to be correct now.

volgachen commented 2 years ago

I find that there is a learning rate decay for sampling_offsets in the training configuration for detection. How do you handle sampling_offsets in the pre-training process?

encounter1997 commented 2 years ago

We did not carefully tune the learning rate for sampling_offsets and reference_points during pre-training, and simply set their learning rate to be the same as the other parameters in the transformer encoder. Tuning the learning rate may lead to better pre-training results, but we did not try it.
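
For completeness, a minimal sketch of the difference in plain PyTorch (the 2e-4 base rate, the 0.1 multiplier, and the weight decay below are the usual Deformable-DETR-style values, used only for illustration):

```python
import torch


def build_optimizer(model, base_lr=2e-4, offset_lr_mult=1.0):
    """Group parameters so that sampling_offsets / reference_points can use a
    scaled learning rate: offset_lr_mult < 1 for detection fine-tuning, 1.0 for
    pre-training, where they share the same rate as the rest of the encoder."""
    special = ('sampling_offsets', 'reference_points')
    normal, scaled = [], []
    for name, param in model.named_parameters():
        (scaled if any(key in name for key in special) else normal).append(param)
    param_groups = [
        {'params': normal, 'lr': base_lr},
        {'params': scaled, 'lr': base_lr * offset_lr_mult},
    ]
    return torch.optim.AdamW(param_groups, lr=base_lr, weight_decay=1e-4)


# Pre-training (as described above): one learning rate for everything.
# optimizer = build_optimizer(model, offset_lr_mult=1.0)
# Detection fine-tuning: decayed rate for sampling_offsets / reference_points.
# optimizer = build_optimizer(model, offset_lr_mult=0.1)
```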