For example, suppose I train an LLM using one of LanguageBind/LanguageBind_Video_FT, LanguageBind/LanguageBind_Video, or LanguageBind/LanguageBind_Video_V1.5_FT as the video encoder. Can I later swap in one of the other encoders, or would I need to retrain the LLM with the new encoder? Should the different encoders give approximately similar results?