EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

sqlite3.OperationalError: unable to open database file #538

SingL3 closed 1 year ago

SingL3 commented 1 year ago

I am running several evaluations. Many of them have succeeded, but the last process raises an error:

Inner exception:
  File "/mnt/data/conda/envs/lora/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()

  File "/mnt/data/conda/envs/lora/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()

  File "/mnt/data/conda/envs/lora/lib/python3.10/site-packages/sqlitedict.py", line 526, in run
    inner_stack = traceback.extract_stack()

sqlite3.OperationalError: unable to open database file

Outer stack:
  File "/mnt/home/llm/lm-evaluation-harness/main.py", line 108, in <module>
    main()

  File "/mnt/home/llm/lm-evaluation-harness/main.py", line 79, in main
    results = evaluator.simple_evaluate(

  File "/mnt/home/llm/lm-evaluation-harness/lm_eval/utils.py", line 182, in _wrapper
    return fn(*args, **kwargs)

  File "/mnt/home/llm/lm-evaluation-harness/lm_eval/evaluator.py", line 72, in simple_evaluate
    lm = lm_eval.base.CachingLM(

  File "/mnt/home/llm/lm-evaluation-harness/lm_eval/base.py", line 845, in __init__
    self.dbdict = SqliteDict(cache_db, autocommit=True)

  File "/mnt/data/conda/envs/lora/lib/python3.10/site-packages/sqlitedict.py", line 229, in __init__
    self.conn.execute(MAKE_TABLE)

Exception will be re-raised at next call.
An exception occurred from a previous statement, view the logging namespace "sqlitedict" for outer stack.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /mnt/home/llm/lm-evaluation-harness/main.py:108 in <module>                           │
│                                                                                                  │
│   105                                                                                            │
│   106                                                                                            │
│   107 if __name__ == "__main__":                                                                 │
│ ❱ 108 │   main()                                                                                 │
│   109                                                                                            │
│                                                                                                  │
│ /mnt/home/llm/lm-evaluation-harness/main.py:79 in main                                │
│                                                                                                  │
│    76 │   │   with open(args.description_dict_path, "r") as f:                                   │
│    77 │   │   │   description_dict = json.load(f)                                                │
│    78 │                                                                                          │
│ ❱  79 │   results = evaluator.simple_evaluate(                                                   │
│    80 │   │   model=args.model,                                                                  │
│    81 │   │   model_args=args.model_args,                                                        │
│    82 │   │   tasks=task_names,                                                                  │
│                                                                                                  │
│ /mnt/home/llm/lm-evaluation-harness/lm_eval/utils.py:182 in _wrapper                  │
│                                                                                                  │
│   179 │   │   │   │   "deprecated and will be disallowed in a future version of "                │
│   180 │   │   │   │   "lm-evaluation-harness!"                                                   │
│   181 │   │   │   )                                                                              │
│ ❱ 182 │   │   return fn(*args, **kwargs)                                                         │
│   183 │                                                                                          │
│   184 │   return _wrapper                                                                        │
│   185                                                                                            │
│                                                                                                  │
│ /mnt/home/llm/lm-evaluation-harness/lm_eval/evaluator.py:72 in simple_evaluate        │
│                                                                                                  │
│    69 │   │   lm = model                                                                         │
│    70 │                                                                                          │
│    71 │   if not no_cache:                                                                       │
│ ❱  72 │   │   lm = lm_eval.base.CachingLM(                                                       │
│    73 │   │   │   lm,                                                                            │
│    74 │   │   │   "lm_cache/"                                                                    │
│    75 │   │   │   + model                                                                        │
│                                                                                                  │
│ /mnt/home/llm/lm-evaluation-harness/lm_eval/base.py:845 in __init__                   │
│                                                                                                  │
│   842 │   │   self.cache_db = cache_db                                                           │
│   843 │   │   if os.path.dirname(cache_db):                                                      │
│   844 │   │   │   os.makedirs(os.path.dirname(cache_db), exist_ok=True)                          │
│ ❱ 845 │   │   self.dbdict = SqliteDict(cache_db, autocommit=True)                                │
│   846 │   │                                                                                      │
│   847 │   │   # add hook to lm                                                                   │
│   848 │   │   lm.set_cache_hook(self.get_cache_hook())                                           │
│                                                                                                  │
│ /mnt/data/conda/envs/lora/lib/python3.10/site-packages/sqlitedict.py:230 in __init__             │
│                                                                                                  │
│   227 │   │   else:                                                                              │
│   228 │   │   │   MAKE_TABLE = 'CREATE TABLE IF NOT EXISTS "%s" (key TEXT PRIMARY KEY, value B   │
│   229 │   │   │   self.conn.execute(MAKE_TABLE)                                                  │
│ ❱ 230 │   │   │   self.conn.commit()                                                             │
│   231 │   │   if flag == 'w':                                                                    │
│   232 │   │   │   self.clear()                                                                   │
│   233                                                                                            │
│                                                                                                  │
│ /mnt/data/conda/envs/lora/lib/python3.10/site-packages/sqlitedict.py:672 in commit               │
│                                                                                                  │
│   669 │   │   │   # blocking=False.  This ensures any available exceptions for any               │
│   670 │   │   │   # previous statement are thrown before returning, and that the                 │
│   671 │   │   │   # data has actually persisted to disk!                                         │
│ ❱ 672 │   │   │   self.select_one(_REQUEST_COMMIT)                                               │
│   673 │   │   else:                                                                              │
│   674 │   │   │   # otherwise, we fire and forget as usual.                                      │
│   675 │   │   │   self.execute(_REQUEST_COMMIT)                                                  │
│                                                                                                  │
│ /mnt/data/conda/envs/lora/lib/python3.10/site-packages/sqlitedict.py:662 in select_one           │
│                                                                                                  │
│   659 │   def select_one(self, req, arg=None):                                                   │
│   660 │   │   """Return only the first row of the SELECT, or None if there are no matching row   │
│   661 │   │   try:                                                                               │
│ ❱ 662 │   │   │   return next(iter(self.select(req, arg)))                                       │
│   663 │   │   except StopIteration:                                                              │
│   664 │   │   │   return None                                                                    │
│   665                                                                                            │
│                                                                                                  │
│ /mnt/data/conda/envs/lora/lib/python3.10/site-packages/sqlitedict.py:654 in select               │
│                                                                                                  │
│   651 │   │   self.execute(req, arg, res)                                                        │
│   652 │   │   while True:                                                                        │
│   653 │   │   │   rec = res.get()                                                                │
│ ❱ 654 │   │   │   self.check_raise_error()                                                       │
│   655 │   │   │   if rec == _RESPONSE_NO_MORE:                                                   │
│   656 │   │   │   │   break                                                                      │
│   657 │   │   │   yield rec                                                                      │
│                                                                                                  │
│ /mnt/data/conda/envs/lora/lib/python3.10/site-packages/sqlitedict.py:606 in check_raise_error    │
│                                                                                                  │
│   603 │   │   │   │   # as `pdb', or simply evaluating the naturally raised traceback, we        │
│   604 │   │   │   │   # retain the original (inner) location of where the exception              │
│   605 │   │   │   │   # occurred.                                                                │
│ ❱ 606 │   │   │   │   reraise(e_type, e_value, e_tb)                                             │
│   607 │                                                                                          │
│   608 │   def execute(self, req, arg=None, res=None):                                            │
│   609 │   │   """                                                                                │
│                                                                                                  │
│ /mnt/data/conda/envs/lora/lib/python3.10/site-packages/sqlitedict.py:47 in reraise               │
│                                                                                                  │
│    44 │   │   value = tp()                                                                       │
│    45 │   if value.__traceback__ is not tb:                                                      │
│    46 │   │   raise value.with_traceback(tb)                                                     │
│ ❱  47 │   raise value                                                                            │
│    48                                                                                            │
│    49                                                                                            │
│    50 try:                                                                                       │
│                                                                                                  │
│ /mnt/data/conda/envs/lora/lib/python3.10/site-packages/sqlitedict.py:521 in run                  │
│                                                                                                  │
│   518 │   │   │   │   _put(res_ref, _RESPONSE_NO_MORE)                                           │
│   519 │   │   │   else:                                                                          │
│   520 │   │   │   │   try:                                                                       │
│ ❱ 521 │   │   │   │   │   cursor.execute(req, arg)                                               │
│   522 │   │   │   │   except Exception:                                                          │
│   523 │   │   │   │   │   with self._lock:                                                       │
│   524 │   │   │   │   │   │   self.exception = (e_type, e_value, e_tb) = sys.exc_info()          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OperationalError: unable to open database file

haileyschoelkopf commented 1 year ago

Hi! Can you share the command that causes this?

You can avoid this by adding the --no_cache flag. If you want to keep caching on, note that for now this error is known to arise when running multiple processes at once.
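
To check whether a given cache file can be opened at all, independently of the harness, a probe along these lines may help. It is a minimal sketch that uses only the standard-library sqlite3 module rather than sqlitedict. The helper name `can_open_sqlite` and the paths in the demo are hypothetical, not the harness's real cache filenames.

```python
import os
import sqlite3
import tempfile


def can_open_sqlite(db_path):
    """Try to open (or create) an SQLite file and return (ok, error_message).

    Hypothetical diagnostic helper; not part of lm-evaluation-harness.
    """
    try:
        conn = sqlite3.connect(db_path)
    except sqlite3.OperationalError as exc:
        # Raised, for example, when the parent directory does not exist.
        return False, str(exc)
    try:
        # Force SQLite to actually write to the file.
        conn.execute("CREATE TABLE IF NOT EXISTS probe (key TEXT PRIMARY KEY)")
        conn.commit()
        return True, ""
    except sqlite3.OperationalError as exc:
        return False, str(exc)
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A path inside an existing directory.
        print(can_open_sqlite(os.path.join(tmp, "ok.db")))
        # A path whose parent directory is missing.
        print(can_open_sqlite(os.path.join(tmp, "missing_dir", "cache.db")))
```

A missing parent directory is only one way to get "unable to open database file". The probe does not show which cause applies to the run above.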

SingL3 commented 1 year ago

@haileyschoelkopf I have several evaluations to run, so I wrote a shell script to do it:

python main.py \
    --model hf-causal-experimental \
    --model_args pretrained=/mnt/data/llm/pythia/pythia-6.9b-deduped/ \
    --tasks boolq,cb,copa,multirc,record,rte,wic,wsc \
    --device cuda \
    --output_path ./new_outputs/baseline_69.json

python main.py \
    --model hf-causal-experimental \
    --model_args pretrained=/mnt/data/llm/pythia/dolly_dbfs_69 \
    --tasks boolq,cb,copa,multirc,record,rte,wic,wsc \
    --device cuda \
    --output_path ./new_outputs/dolly_69.json

python main.py \
    --model hf-causal-experimental \
    --model_args pretrained=/mnt/data/llm/pythia/alpaca_dbfs_69 \
    --tasks boolq,cb,copa,multirc,record,rte,wic,wsc \
    --device cuda \
    --output_path ./new_outputs/alpaca_69.json

python main.py \
    --model hf-causal-experimental \
    --model_args pretrained=/mnt/data/llm/pythia/open_instruct_dbfs_69 \
    --tasks boolq,cb,copa,multirc,record,rte,wic,wsc \
    --device cuda \
    --output_path ./new_outputs/open_instruct_69.json

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64 python main.py \
    --model hf-causal-experimental \
    --model_args pretrained=/mnt/data/llm/pythia/pythia-6.9b-deduped/,peft=/mnt/data/llm/pythia/pythia-lora-dolly_69 \
    --tasks boolq,cb,copa,multirc,record,rte,wic,wsc \
    --device cuda \
    --output_path ./new_outputs/dolly-lora_69.json

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64 python main.py \
    --model hf-causal-experimental \
    --model_args pretrained=/mnt/data/llm/pythia/pythia-6.9b-deduped/,peft=/mnt/data/llm/pythia/pythia-lora-alpaca_69 \
    --tasks boolq,cb,copa,multirc,record,rte,wic,wsc \
    --device cuda \
    --output_path ./new_outputs/alpaca-lora_69.json

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64 python main.py \
    --model hf-causal-experimental \
    --model_args pretrained=/mnt/data/llm/pythia/pythia-6.9b-deduped/,peft=/mnt/data/llm/pythia/pythia-lora-open-instruct_69 \
    --tasks boolq,cb,copa,multirc,record,rte,wic,wsc \
    --device cuda \
    --output_path ./new_outputs/open-instruct-lora_69.json

The error arose at the last command. These commands should run in order, one after another.
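
For reference, the same sequence can be driven from Python so that each evaluation starts only after the previous one has exited, and a failure stops the batch instead of moving on to the next command. This is a sketch, not part of the harness: `build_command` and `run_in_order` are hypothetical helper names, and the flags are the ones used in the script above plus --no_cache.

```python
import subprocess

TASKS = "boolq,cb,copa,multirc,record,rte,wic,wsc"


def build_command(model_args, output_path, no_cache=False):
    """Build one main.py invocation as an argument list (hypothetical helper)."""
    cmd = [
        "python", "main.py",
        "--model", "hf-causal-experimental",
        "--model_args", model_args,
        "--tasks", TASKS,
        "--device", "cuda",
        "--output_path", output_path,
    ]
    if no_cache:
        # Skip the SQLite cache entirely.
        cmd.append("--no_cache")
    return cmd


def run_in_order(commands, runner=subprocess.run):
    """Run commands one at a time; check=True raises on a non-zero exit code."""
    for cmd in commands:
        runner(cmd, check=True)


if __name__ == "__main__":
    commands = [
        build_command(
            "pretrained=/mnt/data/llm/pythia/pythia-6.9b-deduped/",
            "./new_outputs/baseline_69.json",
        ),
        build_command(
            "pretrained=/mnt/data/llm/pythia/dolly_dbfs_69",
            "./new_outputs/dolly_69.json",
        ),
    ]
    # Dry run: print the commands. Call run_in_order(commands) to execute them.
    for cmd in commands:
        print(" ".join(cmd))
```

Because `subprocess.run` blocks until the child process exits, no two evaluations from this list touch the cache at the same time.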

StellaAthena commented 1 year ago

That’s strange. I’ve done things like this before without issue…

haileyschoelkopf commented 1 year ago

The cause of this and related errors is frequently two processes trying to access the same SQLite db at the same time.

A workaround is in progress for our next major release, in this PR:

https://github.com/EleutherAI/lm-evaluation-harness/pull/613

It would allow running multiple eval processes/scripts at once while caching their results, by specifying a different DB filepath via --use_cache /path/to/file/basename for each script being run.
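
The idea of one cache database per script can be sketched as follows. This uses the standard-library sqlite3 module as a stand-in for the harness's SqliteDict cache. The helper names `cache_db_for` and `open_cache`, the `.db` suffix, and the table layout are assumptions for illustration, not what the PR implements.

```python
import os
import sqlite3
import tempfile


def cache_db_for(basename, run_name):
    """Derive a per-run DB path from a shared basename (hypothetical scheme)."""
    return f"{basename}_{run_name}.db"


def open_cache(db_path):
    """Create the parent directory if needed, then open the SQLite file."""
    parent = os.path.dirname(db_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value BLOB)"
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "lm_cache", "eval")
        # Each run gets its own file, so the runs never share a database.
        for run_name in ("dolly", "alpaca"):
            path = cache_db_for(base, run_name)
            conn = open_cache(path)
            conn.execute(
                "INSERT INTO cache VALUES (?, ?)", ("example-key", b"example-value")
            )
            conn.commit()
            conn.close()
            print(path, os.path.exists(path))
```

With separate files, concurrent scripts no longer contend for one database, at the cost of not sharing cached results between them.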