Closed tangzhy closed 1 year ago
I believe you have to run with `--use_auth_token` as well as `--trust_remote_code` for models like StarCoder, since you need to agree to the terms to use them. I do believe it would be better for the evaluation to throw an error instead of running with these erroneous generations.
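For reference, an invocation with both flags might look like the sketch below. Only `--use_auth_token` and `--trust_remote_code` come from this thread; the other flag names, the task name, and the launch command are assumptions about the harness CLI and may differ in your version.

```shell
# Sketch only: build the command string so the flags are visible.
# --model / --tasks / --max_length_generation values are assumed, not from this thread.
CMD="accelerate launch main.py \
  --model bigcode/starcoder \
  --tasks gsm8k \
  --max_length_generation 2048 \
  --use_auth_token \
  --trust_remote_code"
echo "$CMD"
```

Without `--use_auth_token` the gated checkpoint cannot be downloaded, and without `--trust_remote_code` custom model code will not be loaded, which is consistent with the silent failure described above.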
I've tried these, but the problem remains the same. I think it may result from parallel generation, and this reproduction failure should be addressed by the official repo team.
It turns out to be a max_length issue: instead of 512, we should choose 2048 for this task. Newcomers may not know how to choose an appropriate max_length; maybe the documentation should provide a sensible default.
@tangzhy, thank you for your input. The prompts for GSM8K take up ~1500 tokens, so the max_length has to be greater than that. We will update the docs to make this clear.
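The arithmetic behind the fix can be sketched as follows. The ~1500-token prompt size is the approximate figure quoted above; the helper function is purely illustrative, not part of the harness.

```python
def generation_budget(max_length: int, prompt_tokens: int) -> int:
    """Tokens left for the model's answer after the prompt is consumed."""
    return max(0, max_length - prompt_tokens)

prompt_tokens = 1500  # approximate GSM8K few-shot prompt size (from this thread)

# With max_length=512 the prompt alone overflows the window: no room to generate.
print(generation_budget(512, prompt_tokens))   # 0

# With max_length=2048 there is headroom for the answer.
print(generation_budget(2048, prompt_tokens))  # 548
```

This is why generations look truncated or empty at 512 but succeed at 2048: max_length bounds prompt plus completion, not the completion alone.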
I tried to run your code in a Docker container from `ghcr.io/bigcode-project/evaluation-harness`. The exact bash command is:
However, it returns the following:
where the saved generation contents are like:
Any solutions?