TRI-ML / prismatic-vlms

A flexible and efficient codebase for training visually-conditioned language models (VLMs)
MIT License

Add New LLM Backbones #27

Closed siddk closed 4 months ago

siddk commented 4 months ago

Adds Llama-2 Chat, Mistral v0.1, Mistral v0.1 Instruct, and Phi-2 LLM backbones. Note that these model configs match the structure of our paper (one-off changes on top of the `One_Stage` base configuration). All of these models can be further improved by training with the "Prism" configuration (extra data, DINO + SigLIP vision backbones, etc.).
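To illustrate the "one-off changes on top of a base configuration" pattern, here is a minimal sketch using plain Python dataclass inheritance. The class and field names (`OneStageConfig`, `llm_backbone_id`, etc.) are illustrative assumptions, not the actual Prismatic internals:

```python
from dataclasses import dataclass

# Hypothetical base config; field names are illustrative only.
@dataclass
class OneStageConfig:
    model_id: str = "one-stage+7b"
    llm_backbone_id: str = "vicuna-v15-7b"
    vision_backbone_id: str = "clip-vit-l-336px"
    finetune_epochs: int = 1

# A "one-off" experiment config overrides only the fields that differ
# from the base; everything else is inherited unchanged.
@dataclass
class MistralOneStageConfig(OneStageConfig):
    model_id: str = "one-stage+mistral-v0.1-7b"
    llm_backbone_id: str = "mistral-v0.1-7b"

cfg = MistralOneStageConfig()
```

This keeps each new LLM variant to a few lines, since the shared training recipe lives in the base class.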

Evaluation Results:

| Model | VQAv2 | GQA | VizWiz | TextVQA (Pure / OCR) | RefCOCO+ | OCID-Ref | VSR | POPE | TallyQA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LLaVa v1.5 7B (Base) | 76.54 | 61.58 | 54.25 | 46.13 / 58.25 | 49.47 | 35.07 | 51.47 | 86.57 | 62.06 |
| Llama-2 Chat 7B | 76.92 | 62.11 | 56.39 | 45.3 / 56.6 | 58.5 | 46.3 | 61.8 | 86.8 | 58.1 |
| Llama-2 Chat 13B | 78.0 | 63.60 | 56.43 | 57.2 / 58.4 | 62.9 | 44.9 | 71.4 | 86.8 | 58.9 |
| Mistral v0.1 7B | 77.30 | 63.30 | 55.32 | 44.4 / 49.3 | 65.1 | 48.8 | 58.5 | 87.1 | 61.7 |
| Mistral Instruct v0.1 7B | 77.13 | 62.71 | 54.35 | 44.1 / 50.5 | 64.9 | 48.0 | 57.8 | 87.5 | 64.5 |
| Phi-2 3B | 41.47 | 33.38 | 12.18 | 6.6 / 31.0 | 5.7 | 1.5 | 48.7 | 48.2 | 20.2 |
| Llama-2 (Best LLM from Paper) | 77.08 | 62.44 | 55.98 | 44.92 / 55.24 | 59.47 | 43.89 | 63.67 | 86.74 | 59.22 |
| Prism DINOSigLIP 7B (Controlled) | 79.05 | 64.16 | 59.82 | 51.78 / 58.69 | 67.85 | 50.56 | 66.28 | 88.28 | 65.07 |

Note that the Phi-2 results are fairly poor; it would be good to dig into this (perhaps something is off with the prompting scheme).

Hopefully, this PR also serves as a template for folks looking to add their own LLMs to Prismatic -- low-hanging fruit includes adding the Gemma, Llama-3, and Phi-3 LLMs!
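For contributors using this PR as a template, the core of adding a new LLM backbone is mapping a backbone ID to the information needed to load the model and build prompts. The sketch below is a hypothetical registry; the names (`LLMBackboneSpec`, `register_llm`, the `prompt_style` strings) are illustrative assumptions rather than the actual Prismatic API, though the Hugging Face checkpoint IDs are real:

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical spec for one LLM backbone (fields are illustrative).
@dataclass
class LLMBackboneSpec:
    hf_model_id: str   # Hugging Face Hub checkpoint to load
    prompt_style: str  # which prompt builder / chat template to use

# Backbones added in this PR, keyed by an internal backbone ID.
LLM_REGISTRY: Dict[str, LLMBackboneSpec] = {
    "llama2-7b-chat": LLMBackboneSpec("meta-llama/Llama-2-7b-chat-hf", "llama2-chat"),
    "mistral-v0.1-7b": LLMBackboneSpec("mistralai/Mistral-7B-v0.1", "plain"),
    "phi-2-3b": LLMBackboneSpec("microsoft/phi-2", "phi"),
}

def register_llm(backbone_id: str, spec: LLMBackboneSpec) -> None:
    """Register a new backbone; training configs can then reference `backbone_id`."""
    if backbone_id in LLM_REGISTRY:
        raise ValueError(f"Backbone '{backbone_id}' is already registered!")
    LLM_REGISTRY[backbone_id] = spec

# e.g., a contributor adding Gemma might write:
register_llm("gemma-7b", LLMBackboneSpec("google/gemma-7b", "gemma"))
```

In this pattern, everything else (vision backbone, projector, training recipe) stays untouched, which is what keeps new-LLM PRs small.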


CC @shikhar-srivastava @zeyuanyin @Hannibal046 @RylanSchaeffer

Resolves #6 Resolves #25