[WIP] adding ShaderEval tasks

Vipitis commented 1 year ago

Hey, first PR for me here:

This adds ShaderEval task1, which essentially implements the task as it is in the EvaluationSuite: return completion. The task is meant only as a proof of concept, as there are several issues with it. I plan to introduce more tasks to this benchmark soon and to make them better in general. I also have several questions.

some differences that should not impact the results:

- generation returns the prompt, so it is removed again in `postprocess_generation`
- the stop word is set to ";" rather than the list of all tokens containing a semicolon (does the `EndOfFunctionCriteria` handle this?)
- generation parameters can't be set, so the user has to pass `--do_sample False` to use greedy search (can temperature not be set to 0?)
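The first two points can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name `strip_prompt_and_truncate` is hypothetical, and keeping the ";" in the returned completion is an assumption.

```python
def strip_prompt_and_truncate(generation: str, prompt: str, stop_word: str = ";") -> str:
    """Hypothetical helper: drop the echoed prompt, then cut at the first stop word."""
    # The generation starts with the prompt, so remove that prefix when present.
    if generation.startswith(prompt):
        completion = generation[len(prompt):]
    else:
        completion = generation
    # Keep everything up to and including the first stop word (assumption:
    # the stop word itself is part of the completion). If the stop word is
    # absent, the whole completion is returned unchanged.
    head, sep, _ = completion.partition(stop_word)
    return head + sep


prompt = "float d = "
generation = "float d = length(p) - 1.0; return d;"
print(strip_prompt_and_truncate(generation, prompt))
```

Because the cut happens on the decoded string, it does not matter which token carried the semicolon, which is one way around listing every token that contains ";".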

concerns I hope to address:

- my term paper on the project isn't published yet, so I have no reference to explain the tasks
- my naming convention doesn't seem to fit
- I have not added any documentation yet
- pin the dataset revision to 0.0.2 for this task specifically
- I had to comment out every `import fcntl`, as that module does not exist on my home machine (Windows), so running the tests wasn't possible
- it's really slow on my home machine (Intel GPUs are not supported by accelerate on Windows), so I have only been able to run a few models with really short snippets. I got matching scores for gpt2, bigscience/bloom-560m and Vipitis/santacoder-finetuned-Shadertoys-fine when running just 10 samples. Additionally, I did a single run with 300 samples (this snippet is used throughout the paper) and got a matching score of 0.566.

Run parameters were the following:
```
accelerate launch main.py \
--model Vipitis/santacoder-finetuned-Shadertoys-fine \
--tasks ShaderEval \
--limit 10 \
--do_sample False \
--save_generations \
--save_generations_path generations_py.json \
--use_auth_token \
--trust_remote_code
```
(Leaving out the last two flags doesn't throw an error, but the run is even slower and returns erroneous outputs.)
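On the `fcntl` concern above: instead of commenting the imports out, a guarded import might keep the code importable on Windows. The sketch below is only an illustration; `lock_file` is a hypothetical helper, and how the harness actually uses `fcntl` is not covered here.

```python
try:
    import fcntl  # POSIX-only module; not available on Windows
except ImportError:
    fcntl = None


def lock_file(handle) -> bool:
    """Hypothetical helper: take an exclusive lock if fcntl is available.

    Returns True when a lock was taken and False when locking was skipped.
    """
    if fcntl is None:
        # No fcntl on this platform: continue without locking.
        return False
    fcntl.flock(handle.fileno(), fcntl.LOCK_EX)
    return True
```

Skipping the lock is only safe when nothing else touches the same file at the same time, so this is a convenience for local runs on Windows rather than a general fix.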

Vipitis commented 1 year ago

converted to draft as development of the next tasks has started on this branch. Will try to add the other tasks when they are ready. I don't plan to change task1 but might improve the implementation.

Vipitis commented 10 months ago

Closed as part of cleaning up a bunch of stuff. I will open a new draft PR in a few weeks that will hopefully provide a better implementation as well as documentation. The current plan is to have the data and implementation finished by the end of this year.

bigcode-project / bigcode-evaluation-harness

[WIP] adding ShaderEval tasks #97