leonardtang opened 9 months ago
Can you try again using the framework we used for evaluation, https://github.com/bigcode-project/bigcode-evaluation-harness? There's an argument for adding a prefix. In your code it's not clear whether you stripped the prompts (which impacts performance), and we also use more stop words.
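For reference, an invocation of the harness along those lines might look like the sketch below. The flag names are assumptions based on the harness README and may have changed; check the repo for the current argument names.

```shell
# Sketch only: argument names (--prefix, --n_samples, etc.) are taken from the
# bigcode-evaluation-harness README and should be verified against the repo.
accelerate launch main.py \
  --model bigcode/starcoder \
  --tasks humaneval \
  --prefix "<filename>solutions/solution_1.py\n# Here is the correct implementation of the code exercise\n" \
  --temperature 0.2 \
  --n_samples 20 \
  --allow_code_execution
```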
I'm having a similar problem - lots of empty generations on a straightforward prompt from HumanEval. For example, this code:
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "bigcode/starcoder"
device = "cuda" # for GPU usage or "cpu" for CPU usage
prompt = """\
def has_close_elements(numbers: List[float], threshold: float) -> bool:
    \""" Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    \"""
"""
tokenizer = AutoTokenizer.from_pretrained(checkpoint, use_auth_token="<auth_token>")
# device_map="cuda" already places the model on the GPU; a trailing .to(device) is redundant
model = AutoModelForCausalLM.from_pretrained(checkpoint, use_auth_token="<auth_token>", device_map="cuda")
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
Just generates this output:
Loading checkpoint shards: 100%|██████████| 7/7 [00:32<00:00, 4.63s/it]
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
def has_close_elements(numbers: List[float], threshold: float) -> bool:
""" Check if in given list of numbers, are any two numbers closer to each other than
given threshold.
"""
def has_close_elements_v2(numbers: List[float], threshold: float) -> bool:
""" Check if in given list of numbers, are any two numbers closer to each other than
given threshold.
"""
def has_close_elements_v3(numbers: List[float], threshold: float) -> bool:
""" Check if in given list of numbers, are any two numbers closer to each other than
given threshold.
"""
Hi, this prompt is not stripped; you need to remove the trailing `\n` for it to work properly. I also just ran the code from the harness, and it reproduces the reported numbers.
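Concretely, the fix is just string handling before tokenization. A minimal sketch (the model call itself is elided; indentation of the HumanEval prompt restored for readability):

```python
# Raw HumanEval-style prompt: note it ends with a newline.
prompt = '''\
def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    """
'''

# StarCoder was evaluated on stripped prompts, so remove the trailing
# whitespace before tokenizing; pass `stripped` to tokenizer.encode instead.
stripped = prompt.rstrip()
assert not stripped.endswith("\n")
```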
Hi all, I've set up StarCoder as follows:
The stop tokens I'm using are a subset of those found in the Codex paper: `STOP_SEQS = ["\nclass", "\ndef"]`. Somehow, I'm consistently getting empty generations regardless, i.e. just an EOS token. Concretely, around 20% of my generations on HumanEval are empty.
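For context, the truncation step that usually accompanies these stop sequences can be sketched as follows (the helper name is mine, not from the harness):

```python
STOP_SEQS = ["\nclass", "\ndef"]  # subset of the Codex-paper stop words

def truncate_at_stop(completion: str, stop_seqs=STOP_SEQS) -> str:
    """Cut the generated completion at the earliest stop sequence, if any."""
    cut = len(completion)
    for stop in stop_seqs:
        idx = completion.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut]
```

If the model emits a stop sequence immediately (or only an EOS token), the truncated completion is empty, which is consistent with the behavior described above.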
I'm using the suggested prompt as well, i.e.
"<filename>solutions/solution_1.py\n# Here is the correct implementation of the code exercise\n"
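One subtlety with that prefix: it has to be prepended before tokenization and then dropped from the decoded text before evaluation. A minimal sketch, with helper names of my own choosing:

```python
PREFIX = "<filename>solutions/solution_1.py\n# Here is the correct implementation of the code exercise\n"

def build_input(prompt: str) -> str:
    # Prepend the prefix, then strip trailing whitespace
    # (StarCoder expects stripped prompts).
    return (PREFIX + prompt).rstrip()

def remove_prefix(decoded: str) -> str:
    # The decoded generation echoes the full input; drop the
    # prefix before running the HumanEval checker.
    return decoded[len(PREFIX):] if decoded.startswith(PREFIX) else decoded
```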
I'm getting around 15% on HumanEval, not 40% as stated in the paper. I'm setting `TEMP = 0.2` and `NEW_TOKENS = 128`. Would somebody be able to point out what might be going wrong?