- Add an option for HumanEval to not strip the prompts.
- Add the Recode benchmark applied to HumanEval.
Tested with CodeGen-2B-mono and CodeGen-16B-mono. Results are within 3% of the paper's results (Table 13) with:
--tasks perturbed-humaneval-func_name-num_seeds_5 --max_length_generation 1024 --n_samples 1 --do_sample False --batch_size 1
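For reference, a full invocation might look like the sketch below. This assumes the harness's usual `accelerate launch main.py` entrypoint with `--model` and `--allow_code_execution`; only the flags listed above are from the tested run, so adapt the rest to your setup:

```shell
# Hypothetical full command; the --tasks and generation flags are the
# tested settings from above, the entrypoint and model path are assumptions.
accelerate launch main.py \
  --model Salesforce/codegen-2B-mono \
  --tasks perturbed-humaneval-func_name-num_seeds_5 \
  --max_length_generation 1024 \
  --n_samples 1 \
  --do_sample False \
  --batch_size 1 \
  --allow_code_execution
```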