Open hamza-dev-12 opened 1 month ago
To replace the visual model within the InternVL2 framework, it is necessary to modify the self.vision_model attribute within the models_internvl_chat.py script. Subsequent to this modification, retraining of the projection layer is required. At present, there is no automated or simplified method available for this model substitution.
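The swap-and-retrain pattern described above can be sketched in plain PyTorch. This is a minimal illustration with toy stand-in modules, not the actual InternVL2 code; the attribute names (`vision_model`, `mlp1`, `language_model`) follow the naming described in this issue but may differ in the real `modeling` script, so check them against the source before copying.

```python
import torch.nn as nn

# Hypothetical stand-ins for the real classes. The point is the pattern:
# replace the vision encoder, rebuild the projector to match its output
# width, then freeze everything except the projector for retraining.
class ToyVLM(nn.Module):
    def __init__(self, vision_dim=32, llm_dim=64):
        super().__init__()
        self.vision_model = nn.Linear(8, vision_dim)       # placeholder ViT
        self.mlp1 = nn.Linear(vision_dim, llm_dim)         # projection layer
        self.language_model = nn.Linear(llm_dim, llm_dim)  # placeholder LLM

model = ToyVLM()

# 1. Swap in a custom vision encoder with a different feature width.
new_vision_dim = 16
model.vision_model = nn.Linear(8, new_vision_dim)

# 2. Rebuild the projection layer to match the new feature width.
model.mlp1 = nn.Linear(new_vision_dim, 64)

# 3. Freeze all weights except the projector, then retrain only it.
for p in model.parameters():
    p.requires_grad = False
for p in model.mlp1.parameters():
    p.requires_grad = True

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # → ['mlp1.weight', 'mlp1.bias']
```

An optimizer built from `filter(lambda p: p.requires_grad, model.parameters())` would then update only the projector during the alignment-retraining stage.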
Hello, could you please tell me how to retrain the MLP? Which commands and files do I need to run?
Motivation
Task-specific vision models tend to perform better on their target task than a general-purpose vision model. So it would be helpful if we could simply pass our own vision_model into the InternVL model and have it extract the configuration and adjust everything dynamically.
It would be great if we could integrate a custom vision model without many changes. For now, can anyone tell me what changes would be required to integrate a custom vision model?
Related resources
Currently I am not really aware of much, but I believe Hugging Face's transformers library provides a VisionEncoderDecoderModel class that pairs a ViT encoder with a language-model decoder.
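For reference, pairing a vision encoder with a text decoder via that class can be sketched as below. This builds tiny ViT and GPT-2 configs from scratch (no pretrained weights, shrunken sizes) just to show the wiring; it is a sketch of the Hugging Face API, not of how InternVL2 does its integration.

```python
import torch
from transformers import (
    VisionEncoderDecoderConfig, VisionEncoderDecoderModel,
    ViTConfig, GPT2Config,
)

# Small randomly initialized ViT encoder + GPT-2 decoder;
# from_encoder_decoder_configs enables cross-attention in the decoder.
config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(
    ViTConfig(hidden_size=32, num_hidden_layers=2, num_attention_heads=2,
              intermediate_size=64, image_size=32, patch_size=8),
    GPT2Config(n_embd=32, n_layer=2, n_head=2),
)
model = VisionEncoderDecoderModel(config=config)

# Forward pass: image features condition the decoder via cross-attention.
pixel_values = torch.randn(1, 3, 32, 32)
decoder_input_ids = torch.tensor([[0, 1, 2]])
out = model(pixel_values=pixel_values, decoder_input_ids=decoder_input_ids)
print(out.logits.shape)  # (batch, sequence length, decoder vocab size)
```

In practice one would load pretrained checkpoints with `VisionEncoderDecoderModel.from_encoder_decoder_pretrained(...)` instead of random configs.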
Additional context
No response