evalplus / repoqa

RepoQA: Evaluating Long-Context Code Understanding
https://evalplus.github.io/repoqa.html
Apache License 2.0

[Tracking] Evaluating models on base dataset using 16k context #25

Closed ganler closed 6 months ago

ganler commented 6 months ago

OSS model 🤗

CodeLlama

🤔 marks models trained with an 8~16k context; their config.json may need to be modified.

DeepSeekCoder

Llama 3

CodeQwen

Qwen1.5

CodeGemma

Mistral

Starcoder2
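For the 🤔-marked models above, raising the usable context typically means editing the model's Hugging Face config.json to add linear RoPE scaling. A minimal sketch (the `extend_context` helper is hypothetical; the `rope_scaling` / `max_position_embeddings` fields follow the Llama-style HF config format):

```python
import json

def extend_context(cfg: dict, target_len: int) -> dict:
    """Add linear RoPE scaling to a HF config dict so it covers target_len tokens."""
    orig = cfg.get("max_position_embeddings", 8192)
    if target_len > orig:
        # Linear scaling: positions are divided by this factor at inference time.
        cfg["rope_scaling"] = {"type": "linear", "factor": target_len / orig}
        cfg["max_position_embeddings"] = target_len
    return cfg

# Example: stretch an 8k-trained config to 16k.
cfg = {"max_position_embeddings": 8192}
cfg = extend_context(cfg, 16384)
print(json.dumps(cfg, indent=2))
```

Whether quality holds up at the extended length is model-dependent; scaling the config only makes the longer window mechanically accepted.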

Private model 💲💰💸

ganler commented 6 months ago

It seems vLLM enforces a strict context-size limit for Llama-3 (trained on 8k max): any request beyond 8k is rejected. The DeepSeek series is fine, and its context size can be extended as shown in the CodeQwen report.
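When serving past the trained window, vLLM also needs the limit raised explicitly at launch. A sketch using vLLM's OpenAI-compatible server (the model name and scaling factor here are illustrative; pick the factor from the model's original max_position_embeddings):

```shell
# Serve with a 16k window; linear RoPE scaling with factor 4
# assumes a 4k-trained base (adjust to the actual model config).
python -m vllm.entrypoints.openai.api_server \
  --model deepseek-ai/deepseek-coder-6.7b-instruct \
  --max-model-len 16384 \
  --rope-scaling '{"type": "linear", "factor": 4.0}'
```

Without `--max-model-len`, vLLM caps requests at the length declared in the model's config, which is why 8k-trained Llama-3 rejects longer prompts.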


ganler commented 6 months ago

*CodeQwen TypeScript results are missing; will catch up on that soon.

ganler commented 6 months ago

CodeQwen results updated.

ganler commented 6 months ago

Running databricks/dbrx-instruct as well.

ganler commented 6 months ago

databricks/dbrx-instruct produces empty output all the time. I think I will skip it then.

ganler commented 6 months ago

Added bigcode/starcoder2-instruct-15b-v0.1.

ganler commented 6 months ago

Got rate limited by Gemini Pro and Claude....