Closed CoreyHayward closed 9 months ago
According to the DeepSeek GitHub, their model is trained for code insertion. The usual difference between base and instruct is that instruct is slightly better at following instructions, and I find that to be true when you write a comment describing the code you want and let the model complete the rest.
I used 6.7B-instruct for the screenshots and now use it daily for suggestions. This is only my recommendation; using instruct models with the extension is in no way mandatory. It just felt slightly better than base for me, and that's it.
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the base (non-instruct) checkpoint in bfloat16 and move it to the GPU.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

# FIM prompt: prefix, then the hole marker, then the suffix.
# The sentinel tokens use full-width bars (U+FF5C), not the ASCII "|".
input_text = """<｜fim▁begin｜>def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = []
    right = []
<｜fim▁hole｜>
        if arr[i] < pivot:
            left.append(arr[i])
        else:
            right.append(arr[i])
    return quick_sort(left) + [pivot] + quick_sort(right)<｜fim▁end｜>"""

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
# Strip the echoed prompt so only the generated infill is printed.
print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])
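For anyone building these prompts from an editor buffer rather than by hand, here is a minimal sketch of how the prefix and suffix around the cursor could be wrapped. The helper names `split_at_cursor` and `build_fim_prompt` are hypothetical (they are not part of DeepSeek's code or of the extension), and the sentinel strings are written the way I understand the DeepSeek Coder README to spell them, with full-width bars.

```python
from typing import Tuple

# Sentinel strings as I understand them from the DeepSeek Coder README.
# Note the full-width bars (U+FF5C) and the U+2581 separator.
FIM_BEGIN = "<｜fim▁begin｜>"
FIM_HOLE = "<｜fim▁hole｜>"
FIM_END = "<｜fim▁end｜>"


def split_at_cursor(source: str, cursor: int) -> Tuple[str, str]:
    """Split a buffer into (prefix, suffix) at a character offset.

    Hypothetical helper for illustration only.
    """
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor is outside the buffer")
    return source[:cursor], source[cursor:]


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap the text before and after the cursor in FIM sentinels.

    Hypothetical helper for illustration only.
    """
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}"


if __name__ == "__main__":
    buffer = "def add(a, b):\n    return \n\nprint(add(1, 2))\n"
    # Place the cursor right after "return " so the model fills in the expression.
    cursor = buffer.index("return ") + len("return ")
    prefix, suffix = split_at_cursor(buffer, cursor)
    print(build_fim_prompt(prefix, suffix))
```

The resulting string can be passed to the tokenizer in place of `input_text` in the snippet above.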
Interesting. The code snippet copied from their GitHub uses the base model, and on Hugging Face the "How to Use" section for the instruct version only covers chat, while the one for the base version includes FIM. Which model options are you using to get the best results, if you don't mind sharing?
As I mentioned, for me it feels slightly better than the base version.
KoboldCpp is left on its default settings (except for context length): CuBLAS, mmq, and ContextShift enabled.
For inference I pass:
"rep_pen": 1,
"rep_pen_range": 256,
"rep_pen_slope": 1,
"temperature": 1,
"tfs": 1,
"top_a": 0,
"top_k": 100,
"top_p": 0.3,
"typical": 1,
With these settings the predictions are quite accurate, and in my experience they come out the same regardless of the seed.
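As a rough sketch of how those sampler values could be sent to a local KoboldCpp instance: the snippet below assumes the KoboldAI-style `/api/v1/generate` endpoint on KoboldCpp's default port 5001 and a response shaped like `{"results": [{"text": ...}]}`. The `max_length` value and the function names are my own placeholders, not something taken from the extension.

```python
import json
import urllib.request

# Sampler settings copied from the list above.
SAMPLER_SETTINGS = {
    "rep_pen": 1,
    "rep_pen_range": 256,
    "rep_pen_slope": 1,
    "temperature": 1,
    "tfs": 1,
    "top_a": 0,
    "top_k": 100,
    "top_p": 0.3,
    "typical": 1,
}


def build_payload(prompt: str, max_length: int = 64) -> dict:
    """Combine the prompt with the sampler settings.

    max_length (tokens to generate) is an assumed placeholder value.
    """
    payload = {"prompt": prompt, "max_length": max_length}
    payload.update(SAMPLER_SETTINGS)
    return payload


def generate(prompt: str, base_url: str = "http://localhost:5001") -> str:
    """POST the payload to a running KoboldCpp server and return the text.

    Assumes the KoboldAI-compatible endpoint and response layout.
    """
    request = urllib.request.Request(
        f"{base_url}/api/v1/generate",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["results"][0]["text"]
```

With `top_p` at 0.3 the sampler only draws from the few most likely tokens, which is probably why the output barely changes between seeds.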
Thanks for your help!
Your README recommends using the instruct model of DeepSeek-coder, but isn't that model specifically not trained on FIM, which you mention you are using?