bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

Use trust_remote_code for dataset load #176

Closed Vipitis closed 5 months ago

Vipitis commented 6 months ago

Following https://github.com/huggingface/datasets/issues/6400 and the 2.16 release of datasets, trust_remote_code=True is now required for datasets that use a custom builder script.

If I read the code correctly, the --trust_remote_code flag already gets passed to the model and tokenizer, but it does not reach the load_dataset call here.
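For illustration, a minimal sketch of what forwarding the flag could look like (the helper name and signature are hypothetical, not the harness's actual code):

```python
from datasets import load_dataset

def load_task_dataset(dataset_path, trust_remote_code=False):
    # Forward the same --trust_remote_code flag that already reaches the model
    # and tokenizer; datasets>=2.16 needs it for script-based datasets.
    return load_dataset(dataset_path, trust_remote_code=trust_remote_code)
```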

That seems like an easy fix, and I can open a PR next week. Addressing it in the fine-tuning examples would require more edits.

loubnabnl commented 5 months ago

If I'm not mistaken, the only dataset in the harness that currently has arbitrary code and could need this flag in the future is MultiPL-E?

It's a bit different from models, where users can pass any model they want; here we have a limited number of carefully selected tasks, so I'm not sure we need the flag for all tasks (we can add it just for MultiPL-E when it becomes a requirement, if the authors don't update their loading script by then).

Vipitis commented 5 months ago

I haven't checked myself. Moving all datasets to a Parquet export with no builder script is the better idea anyway. Perhaps adding a notice to the new-task template would be an easier "fix".
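For reference, one way to do that conversion is to re-upload the data as plain Parquet via push_to_hub (the repo names below are placeholders), roughly:

```python
from datasets import load_dataset

# Load the script-based dataset once (this still needs trust_remote_code)...
ds = load_dataset("org/script-based-dataset", trust_remote_code=True)
# ...then push it back as Parquet shards, which load without any custom code.
ds.push_to_hub("org/parquet-only-dataset")
```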

If there is just a single case, it can be bypassed more easily using the environment variable.
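For example, assuming the HF_DATASETS_TRUST_REMOTE_CODE environment variable that datasets introduced around 2.16, something like:

```python
import os

# Opt in globally before `datasets` is imported, so no per-call
# trust_remote_code argument is needed.
os.environ["HF_DATASETS_TRUST_REMOTE_CODE"] = "1"

from datasets import load_dataset
ds = load_dataset("org/script-based-dataset")  # placeholder repo name
```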

I additionally came across this with the task I am developing, but moving to Parquet is on my to-do list either way. Feel free to close this issue and reject the draft PR.

loubnabnl commented 5 months ago

Sounds good. I also talked to the MultiPL-E authors and they will remove the Python code from the dataset, so it won't require the flag.