Open lynquantumman opened 9 months ago
Thank you for the reminder! We have added the cross-attention part and the vision encoder part on both ModelScope and Hugging Face. We have also added a script to the code that merges them into a single model, and described it in the README. We greatly appreciate your support for CodeFuse-VLM!
The models you provide on ModelScope and Hugging Face include only the LLM. The cross-attention part and the visual part are missing, so we cannot reproduce your experiments from the checkpoint. I hope you can complete the release. Also, if the model works as the PNG image suggests, it would be great work.