AILab-CVC / UniRepLKNet

[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
https://arxiv.org/abs/2311.15599
Apache License 2.0

input_size 384*128 tends to be nan #14

Closed. Zoesxw closed this issue 3 months ago.

Zoesxw commented 4 months ago

Thank you for your excellent work! When I use UniRepLKNet in my work (person re-ID), where the input size is usually 384x128, the features tend to become NaN after stage 3. I think this may be because the feature map after the fifth downsample (downsample3) is too small (12x4), while the kernel size in stage 3 is 13x13. Can you give me some possible reasons and suggestions? Thank you so much.
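For context, a quick way to locate where the non-finite values first appear is to attach forward hooks to every leaf module. This is only a minimal debugging sketch, not code from the repo; the `unireplknet_s` constructor mentioned in the usage comment is an assumption about the model builders in unireplknet.py, so substitute whichever builder you actually use.

```python
# Minimal NaN/Inf locator: report the first module whose output is non-finite.
# Works on any PyTorch model; nothing here is UniRepLKNet-specific.
import torch
import torch.nn as nn

def register_nan_hooks(model: nn.Module):
    """Attach forward hooks that print the first module producing NaN/Inf outputs."""
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                print(f"non-finite values after: {name} "
                      f"({module.__class__.__name__}), shape={tuple(output.shape)}")
        return hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            handles.append(module.register_forward_hook(make_hook(name)))
    return handles

# Usage (hypothetical model builder from the repo's unireplknet.py):
# model = unireplknet_s()
# handles = register_nan_hooks(model)
# model(torch.randn(2, 3, 384, 128))  # person re-ID style input
# for h in handles: h.remove()
```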

invictus717 commented 4 months ago

Thanks for your interest in UniRepLKNet! The issue you raised is very constructive! Indeed, when we ran transfer experiments on wider fields using UniRepLKNet pretrained on ImageNet-1K or ImageNet-22K, we observed a similar phenomenon of NaN features. We think it may be caused by a large distribution discrepancy (resulting in pattern inconsistency) between the target data and the pretraining data. The problem may lie in the BN layers of the LarK block. This design choice seems to be relevant to how well large-kernel ConvNets transfer to other downstream tasks.
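One way to check whether the pretrained BN statistics clash with the new data distribution is to measure how large the activations after each BatchNorm2d get on a batch of target-domain images. The sketch below is a generic diagnostic, not part of the repo; it assumes the LarK block's BN layers are standard `nn.BatchNorm2d` modules so they are picked up by the `isinstance` filter.

```python
# Report the ten BatchNorm2d layers with the largest output magnitudes,
# using the pretrained running statistics (eval mode), as in transfer settings.
import torch
import torch.nn as nn

@torch.no_grad()
def report_bn_output_scale(model: nn.Module, x: torch.Tensor):
    stats = {}

    def make_hook(name):
        def hook(module, inputs, output):
            stats[name] = output.abs().max().item()
        return hook

    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)]
    model.eval()  # keep the pretrained running mean/var fixed
    model(x)
    for h in handles:
        h.remove()
    for name, peak in sorted(stats.items(), key=lambda kv: -kv[1])[:10]:
        print(f"{name}: max |activation| = {peak:.1f}")
```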

Zoesxw commented 3 months ago

Thanks for your reply! I found that some values in the feature map after stage 3 are very large (from the BN layers of the LarK block, as you said), which results in NaN in downsample4, whereas in ConvNeXt the LN comes before the conv in each downsample layer. So I added an extra LN before downsample4, and it now trains normally.
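A rough sketch of that workaround is shown below: a channels-first LayerNorm is prepended to the affected downsample block so the oversized activations from the previous stage are normalized before the strided conv. The attribute name `downsample_layers`, the index, and the `prepend_extra_ln` helper are assumptions based on the ConvNeXt-style layout; adjust them to the actual model definition.

```python
import torch
import torch.nn as nn

class LayerNorm2d(nn.Module):
    """LayerNorm over the channel dimension of an NCHW tensor (channels_first)."""
    def __init__(self, num_channels: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(num_channels))
        self.bias = nn.Parameter(torch.zeros(num_channels))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # normalize each spatial position over its channels
        u = x.mean(1, keepdim=True)
        s = (x - u).pow(2).mean(1, keepdim=True)
        x = (x - u) / torch.sqrt(s + self.eps)
        return self.weight[:, None, None] * x + self.bias[:, None, None]

def prepend_extra_ln(model: nn.Module, stage_idx: int, num_channels: int):
    # Hypothetical helper: wrap the chosen downsample block with an extra LN in front.
    # `downsample_layers` is an assumed ModuleList attribute name (ConvNeXt-style).
    old = model.downsample_layers[stage_idx]
    model.downsample_layers[stage_idx] = nn.Sequential(LayerNorm2d(num_channels), old)
```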

invictus717 commented 3 months ago

If there are any further questions, please feel free to reach out!