sheepymeh closed this issue 7 months ago
Hi, you can refer to this example: https://github.com/QwenLM/CodeQwen1.5/blob/main/examples/CodeQwen1.5-base-fim.py
Thank you very much! I was looking at the special tokens and didn't spot it because it is marked as special: false. Is this intentional? I expected the <fim_*> tokens to be labeled as special.
Don't worry about it. <fim_*> will still be treated as a separate token by the model, and special: false is a configurable parameter that can be ignored.
I see, thank you!
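I'm new to large models. In the fill-in-the-middle scenario, what do the special tokens fim_prefix, fim_suffix, and fim_middle do? Which one makes the model generate code? Are there other special tokens besides these, and where can I look them up?
By default the model generates the code that follows the input. In FIM mode it can instead generate the code between two halves, for example: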
<fim_prefix>def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    <fim_suffix>
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)<fim_middle>
<fim_prefix> marks the first half and <fim_suffix> marks the second half. <fim_middle> at the end of the input then prompts the model to generate the code in between.
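The prompt layout described above can be sketched as a small helper. This is only a minimal sketch: it assumes the literal token strings <fim_prefix>, <fim_suffix>, and <fim_middle> quoted in this thread, and build_fim_prompt is a hypothetical name, not a function from any library.

```python
# Token strings as quoted in this thread (assumed to match the tokenizer).
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Hypothetical helper: arrange prefix and suffix in the FIM layout.

    The model is expected to continue after <fim_middle> with the code
    that belongs between `prefix` and `suffix`.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Code before the gap.
prefix = (
    "def quicksort(arr):\n"
    "    if len(arr) <= 1:\n"
    "        return arr\n"
    "    pivot = arr[len(arr) // 2]\n"
    "    "
)

# Code after the gap. The line defining `left` is what the model should fill in.
suffix = (
    "\n"
    "    middle = [x for x in arr if x == pivot]\n"
    "    right = [x for x in arr if x > pivot]\n"
    "    return quicksort(left) + middle + quicksort(right)"
)

prompt = build_fim_prompt(prefix, suffix)
print(prompt)
```

The resulting string is what you would tokenize and pass to the model. The example script linked earlier in the thread shows the full generation call.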
You can find the other special tokens in the tokenizer.json file.
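For anyone who wants to list those tokens programmatically, here is a rough sketch. It assumes the tokenizer JSON file follows the usual Hugging Face tokenizers layout, with a top-level "added_tokens" list whose entries carry "content" and "special" fields. The ids in the sample data are made up for illustration, and list_added_tokens is a hypothetical helper.

```python
import json

# Illustrative sample only: the ids are invented, and a real file contains
# many more fields and tokens.
SAMPLE_TOKENIZER_JSON = """
{
  "added_tokens": [
    {"id": 1, "content": "<fim_prefix>", "special": false},
    {"id": 2, "content": "<fim_middle>", "special": false},
    {"id": 3, "content": "<fim_suffix>", "special": false}
  ]
}
"""


def list_added_tokens(tokenizer_data: dict) -> list:
    """Hypothetical helper: return (content, special) for each added token."""
    return [
        (entry["content"], entry.get("special", False))
        for entry in tokenizer_data.get("added_tokens", [])
    ]


tokens = list_added_tokens(json.loads(SAMPLE_TOKENIZER_JSON))
for content, special in tokens:
    print(f"{content}\tspecial={special}")
```

With a real model you would load the downloaded file with json.load instead of the sample string. As noted earlier in the thread, a token listed with special: false is still handled as a single token.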
Hi, I'd like to ask whether CodeQwen has a token for fill-in-the-middle generation.