Modalities / modalities

A framework for training multimodal foundation models.
MIT License

Limited and potentially incorrect weight initialization for CoCa model #165

Closed. flxst closed this issue 1 week ago.

flxst commented 2 weeks ago

In #161, the weight initialization was restructured and extended, mainly for the GPT2 model.

For CoCa, there are

Note: This issue should be addressed after #161 has been merged.

flxst commented 1 week ago

Update: For CoCa, only plain initialization with an explicitly specified standard deviation (e.g. 0.02, not auto) is now allowed and implemented; see #161.
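
To illustrate the restriction described in the update, here is a minimal sketch in plain Python. It is not the Modalities API: the function names `validate_coca_init` and `plain_init`, the `mode` and `std` parameters, and the error messages are all hypothetical, chosen only to show the idea of accepting plain initialization with an explicit numeric standard deviation and rejecting "auto".

```python
import random
from typing import List, Union


def validate_coca_init(mode: str, std: Union[float, str]) -> float:
    """Hypothetical check: accept only plain init with an explicit numeric std."""
    if mode != "plain":
        raise ValueError(f"unsupported initialization mode for CoCa: {mode!r}")
    # bool is a subclass of int, so exclude it explicitly; strings such as
    # "auto" are rejected by the same check.
    if isinstance(std, bool) or not isinstance(std, (int, float)):
        raise ValueError(f"std must be an explicit number, got {std!r}")
    if std <= 0:
        raise ValueError("std must be positive")
    return float(std)


def plain_init(num_weights: int, std: float, seed: int = 0) -> List[float]:
    """Draw weights from a zero-mean normal distribution with the given std."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, std) for _ in range(num_weights)]


if __name__ == "__main__":
    std = validate_coca_init("plain", 0.02)
    weights = plain_init(num_weights=1000, std=std)
    print(len(weights))
```

In this sketch, calling `validate_coca_init("plain", "auto")` raises a `ValueError`, which mirrors the rule that the standard deviation has to be given explicitly for CoCa. In an actual PyTorch model the sampling step would typically be done on the parameter tensors rather than on Python lists.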