TXH-mercury / COSA

Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
https://arxiv.org/abs/2306.09085
MIT License

Where does the `bert-base-uncased-crossattn` come from? #2

Closed leexinhao closed 1 year ago

leexinhao commented 1 year ago

Thanks for your nice work! I notice that the `bert-base-uncased-crossattn` in your code seems to be different from the `bert-base-uncased` that everyone usually uses, and I find that the weights of the cross-attention layers are included in your pytorch.bin, whereas we usually initialize them randomly.

TXH-mercury commented 1 year ago

@leexinhao Thanks for your attention to COSA.
`bert-base-uncased-crossattn` adds cross-attention layers on top of `bert-base-uncased`. The parameters of the cross-attention layers are randomly initialized, while all other parameters are exactly the same as in `bert-base-uncased`. The preprocessing is shown in the comments of the following code:

https://github.com/TXH-mercury/COSA/blob/18f42d67d8fcec0f08e4d3e4741548d72781e65a/model/modeling.py#L376C7-L384C1
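For reference, a minimal sketch of how such a checkpoint could be produced with HuggingFace `transformers` (this is an illustration, not the repository's actual preprocessing script; the output path `bert-base-uncased-crossattn` is an assumption):

```python
import torch
from transformers import BertConfig, BertModel

# Start from the standard config and enable cross-attention layers.
# In HuggingFace transformers, add_cross_attention requires is_decoder=True.
config = BertConfig.from_pretrained("bert-base-uncased")
config.is_decoder = True
config.add_cross_attention = True

# Instantiate a model with cross-attention; all weights start randomly initialized.
model = BertModel(config)

# Load the pretrained weights. Keys for the cross-attention layers are absent
# from the original checkpoint, so they keep their random initialization,
# while every other parameter is overwritten with its pretrained value.
pretrained = BertModel.from_pretrained("bert-base-uncased")
missing, unexpected = model.load_state_dict(pretrained.state_dict(), strict=False)
# `missing` now lists only the crossattention.* parameters.

# Save config and weights as a reusable checkpoint.
model.save_pretrained("bert-base-uncased-crossattn")
```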

You can also load the original `bert-base-uncased` weights directly, but with a new config that enables cross-attention layers.
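Concretely, that alternative would look something like this (a sketch with HuggingFace `transformers`; `from_pretrained` will warn that the `crossattention.*` weights were newly initialized):

```python
from transformers import BertConfig, BertModel

# Build a config with cross-attention enabled (requires decoder mode in HF).
config = BertConfig.from_pretrained(
    "bert-base-uncased", is_decoder=True, add_cross_attention=True
)

# All pretrained weights are loaded as usual; the cross-attention layers,
# which have no counterpart in the checkpoint, are randomly initialized.
model = BertModel.from_pretrained("bert-base-uncased", config=config)
```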

leexinhao commented 1 year ago

Thank you!