@leexinhao Thanks for your attention to COSA.
`bert-base-uncased-crossattn` adds cross-attention layers on top of `bert-base-uncased`. The parameters of the cross-attention layers are randomly initialized, while all other parameters are exactly the same as in `bert-base-uncased`. The preprocessing is shown in the comments of the code below.
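A minimal sketch of that preprocessing, assuming the Hugging Face `transformers` API; the output path is illustrative:

```python
# Minimal sketch, assuming the Hugging Face transformers API.
# Load 'bert-base-uncased' with a config that enables cross-attention.
# The cross-attention weights do not exist in the original checkpoint,
# so they are randomly initialized, while all other weights are loaded
# from 'bert-base-uncased'. Saving then yields a checkpoint like
# 'bert-base-uncased-crossattn' (the output path here is illustrative).
from transformers import BertConfig, BertModel

config = BertConfig.from_pretrained(
    "bert-base-uncased",
    is_decoder=True,           # cross-attention in HF BERT requires decoder mode
    add_cross_attention=True,  # adds the (randomly initialized) cross-attention layers
)
model = BertModel.from_pretrained("bert-base-uncased", config=config)
model.save_pretrained("bert-base-uncased-crossattn")
```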
Alternatively, you can load the original `bert-base-uncased` with a new config that enables cross-attention layers, as in the sketch above.
Thank you!
Thanks for your nice work! I notice that the `bert-base-uncased-crossattn` used in your code seems to be different from the checkpoint everyone uses (`bert-base-uncased`), and I find that the cross-attention weights are included in your pytorch.bin, whereas we usually initialize them randomly.