Lightning-AI / litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
https://lightning.ai
Apache License 2.0

How to change the dataset path or download URL when evaluating #1556

Open lzd-1230 opened 4 months ago

lzd-1230 commented 4 months ago

Hi guys, I’m having trouble downloading Hugging Face datasets directly from code, so I need to get them either (1) through a mirror, or (2) offline, after downloading them through a proxy.

However, with a command like

litgpt evaluate /work/4/zd/gpu-apac/litgpt/Meta-Llama-3-8B \
        --device 4  \
        --batch_size 4 \
        --tasks "gsm8k" \
        --out_dir eval_math/

I couldn't find a way to specify a mirror URL or to point to a local copy of the dataset that I have already downloaded. Is there any way to do this? I'd appreciate any help!

rasbt commented 4 months ago

That's a good question. LitGPT uses the simple_evaluate function from the LM evaluation harness under the hood:

https://github.com/EleutherAI/lm-evaluation-harness/blob/058cfd0eeb022c0bc4862651a3ae08e4e046a106/lm_eval/evaluator.py#L48-L77

I currently don't see how one could override the paths for the tasks. There may be a way to do it, but I am not sure at this point.
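
One possible workaround, not confirmed in this thread, is to leave the task paths alone and instead set Hugging Face environment variables before running the evaluation. This is a minimal sketch that assumes the harness loads task data through the Hugging Face datasets library, which reads HF_ENDPOINT, HF_DATASETS_OFFLINE, HF_HUB_OFFLINE and HF_DATASETS_CACHE. The helper function names, the mirror URL and the cache path are hypothetical placeholders, and offline mode only works if the dataset is already in the datasets cache format (for example, from an earlier load on a machine with network access).

```shell
#!/usr/bin/env bash
# Sketch only: assumes task data is fetched via the Hugging Face `datasets`
# library, which honors the environment variables set below.

# Option 1: route Hugging Face Hub downloads through a mirror.
# $1 is the mirror base URL (hypothetical placeholder in the call below).
use_hf_mirror() {
    export HF_ENDPOINT="$1"
}

# Option 2: use only data that is already in the local cache.
# $1 (optional) is the datasets cache directory to read from.
use_hf_offline() {
    export HF_DATASETS_OFFLINE=1
    export HF_HUB_OFFLINE=1
    if [ -n "${1:-}" ]; then
        export HF_DATASETS_CACHE="$1"
    fi
}

# Pick one option before evaluating; the URL here is a placeholder.
use_hf_mirror "https://hf-mirror.example.com"
# use_hf_offline "/path/to/hf_datasets_cache"   # hypothetical path
```

After sourcing this in the current shell, the litgpt evaluate command from the question can be run unchanged, since the variables are inherited by the child process.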