Closed SinanAkkoyun closed 9 months ago
import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from tqdm import tqdm
import torch.nn.functional as F
import json
#CUDA_VISIBLE_DEVICES=0
from transformers import LlamaConfig,LlamaForCausalLM,LlamaTokenizer
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base/", trust_remote_code=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base/", trust_remote_code=True, padding_side='right')
prompt = """<|fim▁begin|>#!/usr/bin/env python3
# get openai completion API in python
<|fim▁hole|>
<|fim▁end|>"""
ids = tokenizer(prompt, return_tensors='pt').input_ids.to(model.device)  # move inputs onto the model's device
out = model.generate(ids, max_new_tokens=100, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1)
print(tokenizer.decode(out[0][len(ids[0]):], skip_special_tokens=True))
I use this code to run the FIM mode, and it outputs:
import openai
openai.api_key = "sk-..."
response = openai.Completion.create(
model="text-davinci-003",
prompt="Say this is a test",
temperature=0,
max_tokens=7,
top_p=1.0,
frequency_penalty=0.0,
presence_penalty
Do you have the correct version of the transformers library?
Thank you for testing! I really don't know why, but when I try it now (I was and am using ExLlamaV2 with GPTQ), the 6.7B model gets the prediction right... I believe that after copying your top_p it works wonders.
May I ask what sampling parameters work best for the model? I have seen that a repetition penalty of 1.15, for example, strongly degrades output quality for whatever reason. It would be awesome to know the best settings :)
Oh, and I have also seen the 33B model produce
<|end▁of▁sentence|> and <|EOT|>; should either of these be used in a specific way for instruction and/or code completion?
Actually, you can just use do_sample=False, repetition_penalty=1.0. With greedy search, top_p has no effect.
For code completion, you can use <|end▁of▁sentence|> as the EOS token; for the instruct model, you can use <|EOT|>.
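To see why top_p has no effect under greedy search: with do_sample=False, decoding just takes the argmax of the logits at every step, so the (possibly top_p-truncated) sampling distribution is never drawn from. A toy sketch in plain Python, no model required:

```python
import math

# Toy logits over a 5-token vocabulary at one decoding step.
logits = [1.0, 3.0, 0.5, 2.0, -1.0]

# Greedy search (do_sample=False): pick the argmax; top_p/top_k never apply.
greedy_token = max(range(len(logits)), key=lambda i: logits[i])

# Sampling (do_sample=True) would instead draw from the softmax distribution,
# optionally truncated by top_k/top_p before drawing.
total = sum(math.exp(x) for x in logits)
probs = [math.exp(x) / total for x in logits]

print(greedy_token)  # 1 (index of the highest logit)
```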
Oh, greedy decoding is recommended for the model? Regarding the <|begin▁of▁sentence|> token (sorry for the typo before), should I use it at every start of a conversation or at every new instruction/response turn?
No! You just need to add the BOS token before the first turn.
Oh so
"System prompt"
bos
### Instruction:
...
### Response:
...
etc
?
And also, no sampling is recommended, even for 33B?
Thank you so much :)
bos"System prompt"
...
... etc.

Yeah, you can use greedy search for the models.
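The layout described above, with the BOS token prepended once before the system prompt, could be sketched like this. Note that build_prompt and the literal BOS string here are illustrative assumptions for this sketch, not part of transformers:

```python
# Hypothetical prompt builder for the turn layout discussed above:
# BOS once, then the system prompt, then ### Instruction / ### Response turns.
BOS = "<|begin▁of▁sentence|>"  # assumed BOS token string for this sketch

def build_prompt(system_prompt, turns):
    """turns: list of (instruction, response) pairs; response may be ""
    for the turn the model should complete."""
    parts = [BOS + system_prompt]
    for instruction, response in turns:
        parts.append("### Instruction:\n" + instruction)
        parts.append("### Response:\n" + response)
    return "\n".join(parts)

prompt = build_prompt(
    "You are a helpful coding assistant.",
    [("Write hello world in Python.", "")],
)
print(prompt.startswith(BOS))  # True: BOS appears exactly once, at the start
```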
OK, last two questions: is the BOS token recommended for the base models for FIM? And was the instruct model also finetuned on FIM tasks?
Thank you very much!
Thanks a lot!
You are welcome!
- You need to add the BOS token for the base models.
- No, we did not use FIM finetuning tasks. However, the instruct model can still perform FIM.
If it was not FIM-finetuned, how did the instruct model get the FIM ability?
You can directly use this code:

import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from tqdm import tqdm
import torch.nn.functional as F
import json
from transformers import LlamaConfig, LlamaForCausalLM, LlamaTokenizer

model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base/", trust_remote_code=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base/", trust_remote_code=True, padding_side='right')

prompt = """<|fim▁begin|>#!/usr/bin/env python3
<|fim▁hole|>
<|fim▁end|>"""
ids = tokenizer(prompt, return_tensors='pt').input_ids
out = model.generate(ids, max_new_tokens=100, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1)
print(tokenizer.decode(out[0][len(ids[0]):], skip_special_tokens=True))
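The FIM prompt layout used in that snippet can be expressed as a small helper. fim_prompt is a hypothetical convenience function, not part of any library; it only assumes the three sentinel tokens shown in this thread:

```python
# Hypothetical helper assembling a DeepSeek-Coder FIM prompt: the model is
# asked to fill in the text that belongs at <|fim▁hole|>, i.e. between the
# given prefix and suffix.
def fim_prompt(prefix, suffix):
    return "<|fim▁begin|>" + prefix + "<|fim▁hole|>" + suffix + "<|fim▁end|>"

p = fim_prompt("#!/usr/bin/env python3\n# get openai completion API in python\n", "\n")
print(p.startswith("<|fim▁begin|>"))  # True
```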
Fill-in-the-middle is performing very badly for me.
prompt:
Response:
However, without FIM tokens:
This is just an example; it doesn't work for larger code either, neither for the 1.3B, 6.7B, nor the 33B. I also already checked that the special tokens get encoded correctly. Am I doing something wrong?