Question about the model's complexity

As I understand it from the paper, Deformable DETR does not suffer from quadratic complexity the way DETR does. The complexity of the deformable attention module is O(2N_qC^2 + min(HWC^2, N_qKC^2)), so as long as N_qK < HW, shouldn't the model's complexity stay the same even if we increase the feature map size H*W to a higher resolution?
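To make the formula concrete, here is a small sketch that just evaluates the quoted cost term (constants and the O() dropped). The defaults C = 256, K = 4 and 300 decoder object queries are my assumptions about the repo's defaults, and the two map sizes are hypothetical stride-8 and stride-4 maps for an 800x1333 input.

```python
# Evaluates the single-scale deformable attention cost term quoted above.
# Assumed defaults (not taken from the question): C = 256, K = 4, 300 queries.


def deform_attn_cost(n_q: int, hw: int, c: int = 256, k: int = 4) -> int:
    """2*Nq*C^2 + min(HW*C^2, Nq*K*C^2), with the O() and constants dropped."""
    return 2 * n_q * c**2 + min(hw * c**2, n_q * k * c**2)


def encoder_cost(h: int, w: int) -> int:
    # Encoder self-attention: every feature-map pixel is a query, so Nq = H*W.
    return deform_attn_cost(n_q=h * w, hw=h * w)


def decoder_cost(h: int, w: int, n_queries: int = 300) -> int:
    # Decoder cross-attention: Nq is the fixed number of object queries.
    return deform_attn_cost(n_q=n_queries, hw=h * w)


# Hypothetical stride-8 and stride-4 feature maps of an 800x1333 image.
for h, w in [(100, 167), (200, 334)]:
    print(f"{h}x{w}: encoder {encoder_cost(h, w):.2e}, decoder {decoder_cost(h, w):.2e}")
```

In this sketch the condition N_qK < HW only makes the decoder term independent of resolution. In the encoder N_q = HW, so the term becomes 3HWC^2 and grows linearly with the number of pixels: doubling both sides of the map multiplies it by 4.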

I checked the logs and models provided for Deformable DETR (single scale) and Deformable DETR (multi-scale): the n_parameters values and model sizes are pretty close, 33844193 (398 MB) and 39847265 (468 MB) respectively. I'm assuming the difference comes from the extra conv layers used for the multi-scale feature maps.
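One way to check that assumption is to count only the parameters whose shapes depend on the number of feature levels. The sketch below assumes the repo defaults as I read them (hidden size 256, 8 heads, 4 sampling points, 6 encoder and 6 decoder layers each with one deformable attention, GroupNorm after every input projection, and an extra 3x3 stride-2 conv on layer4 for the fourth level); none of these are stated in the question, so treat the breakdown as a hypothesis.

```python
# Hypothetical accounting of the multi-scale vs single-scale parameter gap.
# All sizes below are assumed repo defaults, not values from the question.
HIDDEN = 256     # transformer width C
HEADS = 8        # attention heads M
POINTS = 4       # sampling points K per head and level
LAYERS = 6 + 6   # encoder + decoder layers, one deformable attention each
GN = 2 * HIDDEN  # GroupNorm weight + bias after every input projection


def conv_params(c_in: int, c_out: int, k: int) -> int:
    return c_in * c_out * k * k + c_out  # kernel weights + bias


def input_proj_params(multi_scale: bool) -> int:
    if not multi_scale:
        return conv_params(2048, HIDDEN, 1) + GN  # 1x1 conv on layer4 only
    # 1x1 convs on layer2/3/4, plus a 3x3 stride-2 conv on layer4 for level 4
    convs = sum(conv_params(c, HIDDEN, 1) for c in (512, 1024, 2048))
    convs += conv_params(2048, HIDDEN, 3)
    return convs + 4 * GN


def level_dependent_params(multi_scale: bool) -> int:
    levels = 4 if multi_scale else 1
    # sampling_offsets: Linear(C, M*L*K*2); attention_weights: Linear(C, M*L*K)
    offsets = (HIDDEN + 1) * HEADS * levels * POINTS * 2
    weights = (HIDDEN + 1) * HEADS * levels * POINTS
    level_embed = levels * HIDDEN
    return input_proj_params(multi_scale) + LAYERS * (offsets + weights) + level_embed


gap = level_dependent_params(True) - level_dependent_params(False)
print(gap, 39847265 - 33844193)
```

Under these assumptions both printed numbers are 6003072: about 5.1M of the gap comes from the input projection convs and about 0.9M from the larger sampling-offset and attention-weight layers. None of these shapes involve H or W, which is why n_parameters cannot reflect a change in feature map resolution.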

So I tried training the model on a face detection dataset (WIDER_FACE). The standard model gave exactly the same n_parameters and size as the provided log, but when I change the backbone feature map to a higher resolution (layer2 -> layer1), it runs out of memory during training (I'm on Colab with 12 GB RAM).
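A rough way to see why memory can blow up even though the parameter count is unchanged: the activations kept for the backward pass scale with the number of encoder tokens. This is a deliberately partial lower bound that counts only a few per-token tensors in fp32; the defaults (C = 256, 8 heads, 4 points, FFN width 1024, 6 encoder layers, batch 2) and the 100x167 vs 200x334 map sizes are assumptions, not measurements.

```python
# Partial lower bound on encoder activations stored for backward (fp32).
# All defaults are assumptions; real usage (backbone, decoder, gradients,
# optimizer state, allocator overhead) is considerably higher.
BYTES_FP32 = 4


def encoder_activation_bytes(h: int, w: int, c: int = 256, heads: int = 8,
                             points: int = 4, d_ffn: int = 1024,
                             layers: int = 6, batch: int = 2) -> int:
    n_q = h * w  # every feature-map pixel is an encoder query
    per_token = (
        c                      # value projection
        + heads * points * 2   # sampling offsets (single level)
        + heads * points       # attention weights
        + c                    # attention output
        + d_ffn                # FFN hidden activation
    )
    return n_q * per_token * layers * batch * BYTES_FP32


for name, (h, w) in {"stride 8": (100, 167), "stride 4": (200, 334)}.items():
    print(f"{name}: {encoder_activation_bytes(h, w) / 2**30:.2f} GiB")
```

Whatever the exact constants, moving from a stride-8 to a stride-4 map quadruples the token count and therefore this estimate, while n_parameters stays the same.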

So is my understanding correct, or does the feature map resolution affect the complexity?

Also, just out of curiosity: why did you choose the lowest-resolution feature map (layer4) as the default for the single-scale model in the code?
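This sketch does not answer the design question; it only shows how many encoder tokens each ResNet stage would produce. It assumes a torchvision-style ResNet in which conv1, the max-pool and the first block of layer2/3/4 each map a side of length n to ceil(n / 2), and a typical 800x1333 input.

```python
# Spatial size of each ResNet stage output, assuming every stride-2 op
# maps a side of length n to ceil(n / 2).


def halve(n: int) -> int:
    return (n + 1) // 2  # integer ceil(n / 2)


def resnet_stage_sizes(h: int, w: int) -> dict:
    h, w = halve(halve(h)), halve(halve(w))  # conv1 + max-pool: stride 4
    sizes = {"layer1": (h, w)}
    for name in ("layer2", "layer3", "layer4"):  # strides 8, 16, 32
        h, w = halve(h), halve(w)
        sizes[name] = (h, w)
    return sizes


for name, (fh, fw) in resnet_stage_sizes(800, 1333).items():
    print(f"{name}: {fh}x{fw} = {fh * fw} tokens")
```

For this input, layer1 gives 66800 tokens versus 1050 for layer4, roughly 64 times as many encoder queries.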

fundamentalvision / Deformable-DETR, issue #176