Closed: SingL3 closed this issue 1 year ago
Hi! Can you share the command that causes this?
This can be addressed by adding the --no_cache flag. If you want to keep caching on, though, then for now this error is known to arise when running multiple processes at once.
@haileyschoelkopf I have several evaluations to run, so I wrote a shell script to run them:
python main.py \
--model hf-causal-experimental \
--model_args pretrained=/mnt/data/llm/pythia/pythia-6.9b-deduped/ \
--tasks boolq,cb,copa,multirc,record,rte,wic,wsc \
--device cuda \
--output_path ./new_outputs/baseline_69.json
python main.py \
--model hf-causal-experimental \
--model_args pretrained=/mnt/data/llm/pythia/dolly_dbfs_69 \
--tasks boolq,cb,copa,multirc,record,rte,wic,wsc \
--device cuda \
--output_path ./new_outputs/dolly_69.json
python main.py \
--model hf-causal-experimental \
--model_args pretrained=/mnt/data/llm/pythia/alpaca_dbfs_69 \
--tasks boolq,cb,copa,multirc,record,rte,wic,wsc \
--device cuda \
--output_path ./new_outputs/alpaca_69.json
python main.py \
--model hf-causal-experimental \
--model_args pretrained=/mnt/data/llm/pythia/open_instruct_dbfs_69 \
--tasks boolq,cb,copa,multirc,record,rte,wic,wsc \
--device cuda \
--output_path ./new_outputs/open_instruct_69.json
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64 python main.py \
--model hf-causal-experimental \
--model_args pretrained=/mnt/data/llm/pythia/pythia-6.9b-deduped/,peft=/mnt/data/llm/pythia/pythia-lora-dolly_69 \
--tasks boolq,cb,copa,multirc,record,rte,wic,wsc \
--device cuda \
--output_path ./new_outputs/dolly-lora_69.json
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64 python main.py \
--model hf-causal-experimental \
--model_args pretrained=/mnt/data/llm/pythia/pythia-6.9b-deduped/,peft=/mnt/data/llm/pythia/pythia-lora-alpaca_69 \
--tasks boolq,cb,copa,multirc,record,rte,wic,wsc \
--device cuda \
--output_path ./new_outputs/alpaca-lora_69.json
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64 python main.py \
--model hf-causal-experimental \
--model_args pretrained=/mnt/data/llm/pythia/pythia-6.9b-deduped/,peft=/mnt/data/llm/pythia/pythia-lora-open-instruct_69 \
--tasks boolq,cb,copa,multirc,record,rte,wic,wsc \
--device cuda \
--output_path ./new_outputs/open-instruct-lora_69.json
And this error arose at the last command; the commands are run in order, one after another.
That’s strange. I’ve done things like this before without issue…
The cause of this and related errors is frequently two processes trying to access the same SQLite db at the same time.
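The contention described above can be reproduced in a few lines with Python's stdlib `sqlite3` module. This is a minimal sketch, not the harness's actual caching code: two connections stand in for two eval processes, and the file path is illustrative.

```python
import sqlite3, tempfile, os

# Minimal reproduction of the failure mode described above: two
# connections (standing in for two eval processes) share one
# SQLite cache DB file. The path is illustrative.
path = os.path.join(tempfile.mkdtemp(), "eval_cache.db")

proc1 = sqlite3.connect(path, isolation_level=None)  # autocommit mode
proc1.execute("CREATE TABLE cache (key TEXT, value TEXT)")
proc1.execute("BEGIN EXCLUSIVE")  # "process 1" holds the write lock

proc2 = sqlite3.connect(path, timeout=0)  # "process 2": fail fast, no lock retry
locked = False
try:
    proc2.execute("INSERT INTO cache VALUES ('request', 'response')")
except sqlite3.OperationalError:  # raised as "database is locked"
    locked = True
```

With the default `timeout` (5 seconds), the second process would instead retry for a while before raising the same error, which is why the failure can look intermittent.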
This is being worked around for our next major release, in this PR:
https://github.com/EleutherAI/lm-evaluation-harness/pull/613
which will allow running multiple eval processes/scripts at once while caching their results, by specifying a different DB filepath via --use_cache /path/to/file/basename for each script being run.
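The idea behind that workaround can be sketched with stdlib `sqlite3` as well: give each run its own DB file and the locks never collide. This is only an illustration of the scheme, not the PR's implementation; the basenames mirror the script above but are otherwise arbitrary.

```python
import sqlite3, tempfile, os

# Sketch of the per-script cache idea: each run writes to its own
# DB file (as distinct --use_cache basenames would provide), so the
# SQLite locks of concurrent runs never touch the same file.
cache_dir = tempfile.mkdtemp()

def open_cache(basename):
    conn = sqlite3.connect(os.path.join(cache_dir, basename + ".db"),
                           isolation_level=None)  # autocommit mode
    conn.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT, value TEXT)")
    return conn

run_a = open_cache("dolly_69")   # one eval script
run_b = open_cache("alpaca_69")  # another, running at the same time

run_a.execute("BEGIN EXCLUSIVE")  # even an exclusive lock on run_a's DB...
run_b.execute("INSERT INTO cache VALUES ('request', 'response')")  # ...cannot block run_b
```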
I am running several evaluations. Most of them have succeeded, but the last process raised an error.