phellonchen closed 9 months ago
Hi! Thanks for preparing and open-sourcing all of these datasets. I noticed that you use the LA images in MIMIC-IT. May I ask whether you applied in-context tuning when training the model, or whether the LA images were just used in ordinary QA format?
Could you also share the sampled images for MoE training, i.e. the images for Stage II: SViT-157k, LVIS-220k, LRV-331k, and MIMIC-IT-256k?