bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.

Evaluating a Model with a Local Dataset in an Offline Environment #271

Open ankush13r opened 2 months ago

ankush13r commented 2 months ago

Hello, is there currently a way to evaluate a model using a dataset from a local path, instead of fetching it directly from the Hugging Face Hub? We're working in a cluster environment without internet access, and we need to evaluate the model locally.

If this feature isn't available yet, it would be a great enhancement to consider. Accepting a local dataset would allow evaluations to run fully offline. One approach could be a new script argument, such as --datasets-path, so the dataset can be loaded directly from the specified location; a rough sketch follows below.
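For illustration, something along these lines (the --datasets-path flag and the override hook are hypothetical, not existing harness code):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--datasets-path",
    type=str,
    default=None,
    help="Optional local directory to load task datasets from instead of the Hub.",
)
args = parser.parse_args()

# If provided, the path would override the task's DATASET_PATH constant
# before the base class calls load_dataset(), e.g.:
#   if args.datasets_path:
#       task.DATASET_PATH = args.datasets_path
```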

Vipitis commented 2 months ago

Theoretically, it should be possible to set HF_HUB_OFFLINE=1 and load from the local cache or a local path (if it matches the dataset checkpoint dir), since the base class uses datasets.load_dataset() here: https://github.com/bigcode-project/bigcode-evaluation-harness/blob/f0b81a9d079289881bd42f509811d42fe73e58cf/bigcode_eval/base.py#L28
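Roughly this idea (a minimal sketch, assuming the mbpp dataset is already in the local cache):

```python
import os

# Must be set before `datasets` is imported, e.g. `export HF_HUB_OFFLINE=1`.
os.environ["HF_HUB_OFFLINE"] = "1"

from datasets import load_dataset

# With offline mode on, load_dataset() resolves the checkpoint name against
# the local cache (HF_HOME / HF_DATASETS_CACHE) instead of the Hub. This is
# essentially what the harness's base class does with each task's DATASET_PATH.
dataset = load_dataset("mbpp")
```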

ankush13r commented 2 months ago

But I couldn't find any way to pass a path for the dataset. As you can see here https://github.com/search?q=repo%3Abigcode-project%2Fbigcode-evaluation-harness%20DATASET_PATH&type=code, the dataset path is a constant defined directly in the code.

Vipitis commented 2 months ago

Those are the checkpoint directories from the Hugging Face Hub, so clone the dataset repo to that exact path locally and the load_dataset function will try the local copy first; see the sketch below.
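Something like this (the directory layout is a hypothetical example):

```python
from datasets import load_dataset

# If a directory matching the checkpoint name exists relative to the working
# directory, load_dataset() tries it before contacting the Hub. Hypothetical
# layout for the "mbpp" task (whose DATASET_PATH constant is "mbpp"):
#
#   bigcode-evaluation-harness/
#   ├── main.py
#   └── mbpp/    <- local clone of https://huggingface.co/datasets/mbpp
dataset = load_dataset("mbpp")
```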

ankush13r commented 2 months ago

Hello, thanks for your response. I tried what you suggested, but it hasn't worked for me. Here is an example of the command I used to run the evaluation. I have also downloaded the dataset to /home/user/dataset.

export HF_DATASETS_CACHE=/home/user/dataset
export HF_HUB_OFFLINE=1

accelerate launch main.py \
  --model /path/to/the/model \
  --tasks mbpp \
  --max_length_generation 1500 \
  --temperature 1.2 \
  --do_sample True \
  --n_samples 100 \
  --batch_size 10 \
  --allow_code_execution \
  --save_generations

The error I'm getting is:

AttributeError: 'MBPP' object has no attribute 'dataset'
/gpfs/home/bsc/bigcode-evaluation-harness/bigcode_eval/base.py:30: UserWarning: Loading the dataset failed with Couldn't reach the Hugging Face Hub for dataset 'mbpp': Offline mode is enabled.. This task will use a locally downloaded dataset, not from the HF hub. This is expected behavior for the DS-1000 benchmark but not for other benchmarks!
Vipitis commented 2 months ago

That seems to be an issue with the actual task in this case. MBPP used to have a vanity dataset name on the Hub, so there is no org prefix. So maybe it works if you put the /mbpp/ dataset folder on the same level as main.py.

The error message is actually misleading, since the warning itself doesn't change anything afterwards: it is written for the specific DS-1000 benchmark and just means the dataset couldn't be loaded. It sort of suppresses the real error message, which would be more helpful.
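For context, the loading logic in base.py has roughly this shape (paraphrased from the linked source, not an exact copy), which also explains the later AttributeError:

```python
import warnings
from datasets import load_dataset


class Task:
    DATASET_PATH = "mbpp"  # Hub checkpoint name, a class constant on each task
    DATASET_NAME = None

    def __init__(self):
        try:
            self.dataset = load_dataset(path=self.DATASET_PATH, name=self.DATASET_NAME)
        except Exception as e:
            # The original exception is swallowed and replaced by this warning,
            # so self.dataset is never set, hence the later
            # AttributeError: 'MBPP' object has no attribute 'dataset'.
            warnings.warn(
                f"Loading the dataset failed with {e}. This task will use a locally"
                " downloaded dataset, not from the HF hub. This is expected behavior"
                " for the DS-1000 benchmark but not for other benchmarks!"
            )
```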

ankush13r commented 2 months ago

Thanks, it worked. I think it will work for all kinds of tasks, as long as the datasets are on the local machine. I would like to know if there is a way to change the path for these datasets, since we need to store them in a different folder.

Vipitis commented 2 months ago

Maybe symlinks? But I am not too familiar with how the load_dataset() function resolves those. Perhaps there is a way to use the HF Hub cache instead, since that can be pointed anywhere; something like the sketch below.
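For example (the paths here are hypothetical, and whether load_dataset() follows the symlink is untested):

```python
import os

# Option 1: symlink the real storage location to where the harness looks,
# i.e. a ./mbpp directory next to main.py.
os.symlink("/home/user/datasets/mbpp", "mbpp")

# Option 2: relocate the whole Hugging Face cache via environment variables,
# set before launching (e.g. in the job script):
#   export HF_HOME=/home/user/hf_home
#   export HF_DATASETS_CACHE=/home/user/hf_home/datasets
```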

ankush13r commented 1 month ago

Perfect, I'll figure it out. Thanks again!