SinanAkkoyun commented 10 months ago

Fill in the middle is performing very badly for me

prompt:

<｜fim▁begin｜>#!/usr/bin/env python3
# get openai completion API in python
<｜fim▁hole｜>
<｜fim▁end｜>

Response:

                                                                           <aiParent�ianBaidC                                            <                                                              <Au7AgentX                                                                      9   <pK9pKpf8+25yK6Y1ZbK+nSdS9b3vb/K1R2t9f47XaZ8NV3c2vDZf2eH3f781/7e2mznFc+e9zfEeV/e5N3v

However, without FIM tokens:

#!/usr/bin/env python3
# get openai completion

import requests

def main():
    API_KEY = "YOUR OPENAI API KEY"
    headers = {'Authorization': f'Bearer {API_KEY}'}
    data = {
        'model': 'text-davinci-002',
        'prompt': 'The current temperature in Stockholm is ...',
        'max_tokens': 9,
        'temperature': 0.1,
        'top_p': 1,
        'frequency_penalty': 0,
        'presence_penalty': 0,
        'stop': ['\n','\r']

This is just an example, it doesn't work for large code either, neither for the 1.3B, 6.7B nor 33B I also already checked, the special tokens get encoded right Am I doing something wrong?

pkuzqh commented 9 months ago

import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from tqdm import tqdm
import torch.nn.functional as F
import json
#CUDA_VISIBLE_DEVICES=0
from transformers import LlamaConfig,LlamaForCausalLM,LlamaTokenizer
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base/", trust_remote_code=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base/", trust_remote_code=True, padding_side='right')
prompt = """<｜fim▁begin｜>#!/usr/bin/env python3
# get openai completion API in python
<｜fim▁hole｜>
<｜fim▁end｜>"""
ids = tokenizer(prompt, return_tensors='pt').input_ids
out = model.generate(ids, max_new_tokens=100, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1)
print(tokenizer.decode(out[0][len(ids[0]):], skip_special_tokens=True))

I use this code to run the fim mode, and it outputs

import openai

openai.api_key = "sk-..."

response = openai.Completion.create(
  model="text-davinci-003",
  prompt="Say this is a test",
  temperature=0,
  max_tokens=7,
  top_p=1.0,
  frequency_penalty=0.0,
  presence_penalty

Does your library transformers have the correct version?

SinanAkkoyun commented 9 months ago

Thank you for testing! I really don't know why, but when I try it out now (I was and am using ExllamaV2 with GPTQ), the 6.7B model gets the prediction right... I believe now that I copied your top_p it works wonders

May I ask what sampling parameters work best for the model? I have seen that having a rep. penalty of 1.15 for example strongly degrades output quality for whatever reason It would be awesome to know the best settings :)

SinanAkkoyun commented 9 months ago

Oh and also I have seen the 33B model produce

<｜end▁of▁sentence｜> and <｜end▁of▁sentence｜>, should any of these be used in a specific way for instruction and or code completion?

pkuzqh commented 9 months ago

actually, you can just use do_sample=False, repetition_penalty=1. Here top_p is not worked.

pkuzqh commented 9 months ago

for code completion, you can use <｜end▁of▁sentence｜> as eos, for instruction model, you can use <|EOT|>

SinanAkkoyun commented 9 months ago

Oh, greedy sampling is recommended for the model? Regarding the <｜begin▁of▁sentence｜> (sorry for the typo before), should I use that at every start of conversation or at every new instruction/response turn?

pkuzqh commented 9 months ago

No! You just need add the bos token before the first turn

SinanAkkoyun commented 9 months ago

Oh so

"System prompt"
bos
### Instruction:
...
### Response:
...
etc

?

And also no sampling in recommended, even for 33B?

Thank you so much :)

pkuzqh commented 9 months ago

bos"System prompt"

Instruction:

...

Response:

... etc yeah, you can use greedy search for models.

SinanAkkoyun commented 9 months ago

Ok last two questions, is the bos recommended for the base models for FIM? And did the instruct model also get FIM finetuned tasks?

Thank you very much!

pkuzqh commented 9 months ago

you need to add the bos token for the base models.
No. We do not use FIM finetuned tasks. However, the instruction model still can work for FIM.

SinanAkkoyun commented 9 months ago

Thanks a lot!

pkuzqh commented 9 months ago

You are welcome!

yiyepiaoling0715 commented 7 months ago

you need to add the bos token for the base models.

No. We do not use FIM finetuned tasks. However, the instruction model still can work for FIM.

is not use FIM fintuned, how instrction model get the FIM ability?

pkuzqh commented 7 months ago

you need to add the bos token for the base models.

No. We do not use FIM finetuned tasks. However, the instruction model still can work for FIM.

is not use FIM fintuned, how instrction model get the FIM ability?

you can directly use this code import os import torch from transformers import AutoTokenizer, AutoModelForCausalLM from tqdm import tqdm import torch.nn.functional as F import json

CUDA_VISIBLE_DEVICES=0

from transformers import LlamaConfig,LlamaForCausalLM,LlamaTokenizer model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base/", trust_remote_code=True, device_map="auto") tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base/", trust_remote_code=True, padding_side='right') prompt = """<｜fim▁begin｜>#!/usr/bin/env python3 <｜fim▁hole｜> <｜fim▁end｜>""" ids = tokenizer(prompt, return_tensors='pt').input_ids out = model.generate(ids, max_new_tokens=100, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1) print(tokenizer.decode(out[0][len(ids[0]):], skip_special_tokens=True))

deepseek-ai / DeepSeek-Coder

FIM doesn't work #71

Instruction:

Response:

CUDA_VISIBLE_DEVICES=0