xesdiny opened this issue 1 month ago (status: Open)
They are all based on Qwen2, so why not include Qwen2-VL in the benchmark comparison? That would isolate the biggest change between the models: the encoder (patchify embedding -> CLIP).
Thanks for the suggestion. We will compare against Qwen2-VL on more benchmarks in upcoming reports.