OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching the performance of GPT-4o.
https://internvl.readthedocs.io/en/latest/
MIT License

[Feature] Integration Of Custom Vision-Model without altering modeling_internvl_chat.py #592

Open hamza-dev-12 opened 1 month ago

hamza-dev-12 commented 1 month ago

Motivation

A task-specific vision model would perform better on its task than a general-purpose vision model. So it would be better if we could simply pass our own vision_model to the InternVL model and have it extract the configuration and adjust everything dynamically.

It would be great if we could integrate a custom vision model without many changes. For now, can anyone tell me what changes would be required to integrate a custom vision model?

Related resources

I am not really aware of related resources at the moment, but I think Hugging Face provides a VisionEncoderDecoderModel class that couples a ViT encoder with an LLM decoder.
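
For reference, a minimal sketch of that Hugging Face API; the checkpoint names below are only illustrative examples, not a recommendation:

```python
# Tie a pretrained ViT encoder to a pretrained GPT-2 decoder.
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k",  # vision encoder (example checkpoint)
    "gpt2",                               # language decoder (example checkpoint)
)
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Note: the cross-attention layers connecting encoder and decoder are freshly
# initialized, so the combined model still needs fine-tuning before use.
```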

Additional context

No response

qishisuren123 commented 1 month ago

To replace the visual model within the InternVL2 framework, you need to modify the self.vision_model attribute in the modeling_internvl_chat.py script. After this modification, the projection layer must be retrained. At present, there is no automated or simplified method for this model substitution.
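
For illustration, a rough sketch of what that substitution could look like. This assumes the attribute names vision_model, mlp1, and downsample_ratio from the released modeling_internvl_chat.py; the CLIP checkpoint is only a stand-in for a task-specific encoder, and a structurally different encoder may require further changes (e.g. in extract_feature()):

```python
# A rough sketch, not a supported path: swap in a custom vision encoder and
# rebuild the projection layer so its input width matches the new encoder.
import torch
from torch import nn
from transformers import AutoModel, CLIPVisionModel

model = AutoModel.from_pretrained(
    "OpenGVLab/InternVL2-8B", trust_remote_code=True
)

# Placeholder for "your task-specific vision model".
custom_vit = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
model.vision_model = custom_vit  # replace the visual backbone

# Rebuild the projector (mlp1) for the new encoder's hidden size.
vit_hidden = custom_vit.config.hidden_size
scale = int(1 / model.downsample_ratio) ** 2  # pixel-shuffle factor
llm_hidden = model.language_model.config.hidden_size
model.mlp1 = nn.Sequential(
    nn.LayerNorm(vit_hidden * scale),
    nn.Linear(vit_hidden * scale, llm_hidden),
    nn.GELU(),
    nn.Linear(llm_hidden, llm_hidden),
)

# Freeze everything except the new projector before retraining it.
for p in model.parameters():
    p.requires_grad = False
for p in model.mlp1.parameters():
    p.requires_grad = True
```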

20191864218 commented 1 month ago

> To replace the visual model within the InternVL2 framework, you need to modify the self.vision_model attribute in the modeling_internvl_chat.py script. After this modification, the projection layer must be retrained. At present, there is no automated or simplified method for this model substitution.

Hello, could you please tell me how to retrain the MLP? Which commands and files do I need to run?