Vipitis closed this pull request 10 months ago
Converted to draft, as development of the next tasks has started on this branch. I will try to add the other tasks when they are ready. I don't plan to change task1, but I might improve the implementation.
Closed due to cleaning up a bunch of stuff. I will open a new draft PR in a few weeks that will hopefully provide a better implementation as well as documentation. The current plan is to have the data and implementation finished by the end of this year.
Hey, first PR for me here:
Adding ShaderEval task1. This essentially just implements the task as it is in the EvaluationSuite: return completion. The task is very much just meant as a "proof of concept", as there are several issues with it. I do plan on introducing more tasks to this benchmark soon and also making them generally better. I do have several questions too.
some differences that should not impact the results:
- `postprocess_generation`: `";"`, not the list of all tokens containing the semicolon (does the `EndOfFunctionCriteria` handle this?)
- `--do_sample False` to use greedy search (temperature can't be set to 0?)

concerns I hope to address:
- `import fcntl`, as that module does not exist on my home machine (Windows), so running tests wasn't possible

`gpt2`, `bigscience/bloom-560m` and `Vipitis/santacoder-finetuned-Shadertoys-fine` when running just 10 samples. Additionally, I did a single run with 300 samples (this snippet is used throughout the paper) and got a matching result of 0.566.

Run parameters were the following:

(not having the last two doesn't throw any error; the run still goes through, even slower, and returns erroneous outputs)
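To make the semicolon handling from the list of differences concrete, here is a minimal sketch of post-processing that cuts a generation at the first literal `;`. The function name `truncate_at_first_semicolon` is hypothetical, and both the prompt stripping and keeping the `;` in the output are assumptions for illustration; this is not the code from the PR or from the harness.

```python
def truncate_at_first_semicolon(generation: str, prompt: str = "") -> str:
    """Hypothetical helper: return the completion up to and including the first ';'.

    Assumption: the raw generation may start with the prompt, which is stripped first.
    If no ';' is present, the whole completion is returned unchanged.
    """
    # Drop the prompt prefix if the model output echoes it.
    completion = generation[len(prompt):] if generation.startswith(prompt) else generation
    # partition splits on the first occurrence only; sep is "" when ';' is absent.
    head, sep, _rest = completion.partition(";")
    return head + sep


if __name__ == "__main__":
    prompt = "vec3 shade() { return"
    raw = prompt + " vec3(1.0); } void main() {}"
    print(repr(truncate_at_first_semicolon(raw, prompt)))
```

A plain string match like this only sees the `;` character in the decoded text, so it does not depend on which tokens in the vocabulary contain a semicolon; whether that matches the stopping-criteria behavior is the open question raised above.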