utterances-bot opened 1 year ago
I just read your paper, and your results look great. Your visual expert approach seems like a general-purpose technique for combining multiple modalities. There seems to be an emerging trend toward mixture-of-experts (MoE) architectures. I know that you don't call your approach MoE, but it seems similar in concept. Do you see this trend continuing, and do you consider it fertile ground for further research? It seems highly relevant to explainability, and a way to get better performance out of increasingly complex systems going forward.
Paper Review: CogVLM: Visual Expert for Pretrained Language Models – Andrey Lukyanenko
My review of the paper "CogVLM: Visual Expert for Pretrained Language Models"
https://andlukyane.com/blog/paper-review-cogvlm