bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

prepend and parse prefix cli arg correctly when doing FIM #70

Closed benlipkin closed 1 year ago

benlipkin commented 1 year ago

Previously, the value passed via the --prefix CLI arg was prepended to the whole FIM prompt. This works fine for models like incoder, which don't explicitly mark FIM mode with a token at the start of the prompt, but with the bigcode models it puts text ahead of the <fim_prefix> special token and leads to weird behavior. I've adjusted this so that the CLI prefix is inserted after the <fim_prefix> special token but before the FIM prefix code. With this change, accuracy on DS-1000 is comparable with and without --prefix being specified at the CLI.
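To make the ordering concrete, here is a minimal sketch of the two prompt layouts. It is not the harness's actual code: the function names `build_fim_prompt` and `build_fim_prompt_old` are hypothetical, and the `<fim_suffix>` / `<fim_middle>` token strings and the prefix-suffix-middle layout are assumptions based on the usual bigcode FIM format.

```python
# Assumed special tokens (only <fim_prefix> is named in the description above).
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(code_prefix: str, code_suffix: str, cli_prefix: str = "") -> str:
    """Hypothetical helper: place the CLI prefix after <fim_prefix>
    but before the code that precedes the hole."""
    return f"{FIM_PREFIX}{cli_prefix}{code_prefix}{FIM_SUFFIX}{code_suffix}{FIM_MIDDLE}"


def build_fim_prompt_old(code_prefix: str, code_suffix: str, cli_prefix: str = "") -> str:
    """Hypothetical helper showing the earlier behavior: the CLI prefix
    lands in front of the <fim_prefix> token."""
    return cli_prefix + build_fim_prompt(code_prefix, code_suffix)


if __name__ == "__main__":
    # Illustrative inputs only.
    print(repr(build_fim_prompt("def add(a, b):\n", "\n", cli_prefix="# solution\n")))
    print(repr(build_fim_prompt_old("def add(a, b):\n", "\n", cli_prefix="# solution\n")))
```

In the first layout the prompt still begins with the FIM token, so the model sees the CLI prefix as part of the code before the hole; in the second, the prompt no longer starts with the token that signals FIM mode.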