Event-AHU / OpenPAR

[OpenPAR] An open-source framework for Pedestrian Attribute Recognition, based on PyTorch

MIT License

about the visual_encoder #4

Status: Closed (Dinosaurcubs closed this issue 11 months ago)

Dinosaurcubs commented 11 months ago

Nice work, congratulations! I have a question: it seems you removed the last ln_post layer before the projection layer in the visual_encoder. What is the reason for doing this?

1125178969 commented 11 months ago

We feed CLIP's visual features into the multimodal encoder for feature fusion, so we considered the LN here unnecessary. That said, our experiments show that including or omitting this LN has no effect on the results.
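
For readers who want to try the comparison themselves, here is a minimal PyTorch sketch of the two variants discussed above. It is not OpenPAR's actual code: the class `TinyVisualHead`, the `use_ln_post` flag, and all dimensions are hypothetical, and only the `ln_post` name and the projection step come from the question. It assumes the encoder's tail is a LayerNorm followed by a learned projection matrix.

```python
import torch
import torch.nn as nn


class TinyVisualHead(nn.Module):
    """Hypothetical stand-in for the tail of a CLIP-style visual encoder."""

    def __init__(self, width: int = 64, output_dim: int = 32,
                 use_ln_post: bool = True):
        super().__init__()
        # nn.Identity makes the "LN removed" variant a drop-in replacement.
        self.ln_post = nn.LayerNorm(width) if use_ln_post else nn.Identity()
        # Learned projection matrix; the init scale is an assumption.
        self.proj = nn.Parameter(width ** -0.5 * torch.randn(width, output_dim))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, width), i.e. the transformer output.
        x = self.ln_post(tokens)
        # Project every token so a later fusion module can consume them.
        return x @ self.proj


torch.manual_seed(0)
tokens = torch.randn(2, 5, 64)  # dummy features, not real CLIP output

with_ln = TinyVisualHead(use_ln_post=True)
without_ln = TinyVisualHead(use_ln_post=False)
# Share the projection weights so only the LN differs between the variants.
without_ln.proj.data.copy_(with_ln.proj.data)

out_with = with_ln(tokens)
out_without = without_ln(tokens)
print(out_with.shape, out_without.shape)
```

Both variants produce features of the same shape, so downstream code does not need to change when the LN is toggled. Whether the values differ enough to matter is an empirical question, which is what the experiments mentioned above address.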

Dinosaurcubs commented 11 months ago

Got it, thanks!