EleutherAI / elk

Keeping language models honest by directly eliciting knowledge encoded in their activations.
MIT License
178 stars, 33 forks

Generalizable multi gpu to run e.g. Llama 65b #238

Open thejaminator opened 1 year ago

Try it out with, e.g., 2 GPUs:

elk elicit huggyllama/llama-65b imdb --num_gpus 2 --gpus_per_model 2 --int8 true
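A rough, weights-only sketch of what this command asks of each GPU. It assumes (not confirmed from the elk source) that `--gpus_per_model 2` shards one copy of the model across two GPUs, so `--num_gpus 2` gives a single replica, and that `--int8 true` stores one byte per parameter. `num_replicas` and `int8_weight_gib_per_gpu` are hypothetical helpers for illustration only.

```python
def num_replicas(num_gpus: int, gpus_per_model: int) -> int:
    """Assumed semantics: each model copy is sharded over gpus_per_model GPUs."""
    if gpus_per_model < 1 or num_gpus < gpus_per_model:
        raise ValueError("num_gpus must be at least gpus_per_model (>= 1)")
    return num_gpus // gpus_per_model


def int8_weight_gib_per_gpu(n_params: float, gpus_per_model: int) -> float:
    """Weights-only estimate: int8 = 1 byte/param, split evenly across GPUs.

    Ignores activations, CUDA context and any layers kept in higher
    precision, so real usage will be higher.
    """
    bytes_per_gpu = n_params * 1 / gpus_per_model
    return bytes_per_gpu / 1024**3


print(num_replicas(2, 2))  # 1 model copy spread over both GPUs
print(round(int8_weight_gib_per_gpu(65e9, 2), 1))  # 30.3
```

Treat the per-GPU figure as a lower bound on the memory each device needs, not a measurement.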

If you want to say "only use GPUs that have 30 GB available", you can pass --min_gpu_mem as usual (the value is in bytes: 32212254720 = 30 GiB):

elk elicit huggyllama/llama-65b imdb --num_gpus 2 --gpus_per_model 2 --int8 true --min_gpu_mem 32212254720
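To pick a different threshold, here is a minimal sketch that converts GiB into the byte count for `--min_gpu_mem`, assuming (as the example value suggests, since 32212254720 is exactly 30 × 1024³) that the flag expects bytes. `gib_to_bytes` is a hypothetical helper, not part of elk.

```python
def gib_to_bytes(gib: float) -> int:
    """Convert gibibytes (1 GiB = 1024**3 bytes) to a whole number of bytes."""
    return int(gib * 1024**3)


# Build the --min_gpu_mem argument for a 30 GiB threshold.
print(f"--min_gpu_mem {gib_to_bytes(30)}")  # --min_gpu_mem 32212254720
```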

On the cluster you may get the following PermissionError on the Hugging Face datasets cache lock file (I don't have permission to delete the lock file):

Traceback (most recent call last):
  File "/home/james/.conda/envs/elk/bin/elk", line 8, in <module>
    sys.exit(run())
  File "/mnt/ssd-2/spar/james/elk/elk/__main__.py", line 27, in run
    run.execute()
  File "/mnt/ssd-2/spar/james/elk/elk/__main__.py", line 19, in execute
    return self.command.execute()
  File "/mnt/ssd-2/spar/james/elk/elk/run.py", line 59, in execute
    self.datasets = [
  File "/mnt/ssd-2/spar/james/elk/elk/run.py", line 60, in <listcomp>
    extract(
  File "/mnt/ssd-2/spar/james/elk/elk/extraction/extraction.py", line 479, in extract
    builder.download_and_prepare(
  File "/home/james/.conda/envs/elk/lib/python3.10/site-packages/datasets/builder.py", line 811, in download_and_prepare
    with FileLock(lock_path) if is_local else contextlib.nullcontext():
  File "/home/james/.conda/envs/elk/lib/python3.10/site-packages/datasets/utils/filelock.py", line 320, in __enter__
    self.acquire()
  File "/home/james/.conda/envs/elk/lib/python3.10/site-packages/datasets/utils/filelock.py", line 270, in acquire
    self._acquire()
  File "/home/james/.conda/envs/elk/lib/python3.10/site-packages/datasets/utils/filelock.py", line 404, in _acquire
    fd = os.open(self._lock_file, open_mode)
PermissionError: [Errno 13] Permission denied: '/mnt/ssd-2/hf_cache/generator/default-2e014cbd8695f82d/0.0.0_builder.lock'

You can still try it out by passing different --max_examples values to bypass the cache:

elk elicit huggyllama/llama-65b imdb --num_gpus 2 --gpus_per_model 2 --int8 true --max_examples 100 100
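To check ahead of time whether you would hit this PermissionError, a small helper can roughly predict whether the current user can open a given lock file for writing. `can_create_lock` is a hypothetical helper, not part of elk or datasets; it only approximates what the `os.open` call at the bottom of the traceback needs, and `os.access` checks the real UID/GID, so ACLs or network filesystems may behave differently.

```python
import os
import tempfile


def can_create_lock(lock_path: str) -> bool:
    """Roughly predict whether lock_path can be opened read/write by us.

    Hypothetical helper (not part of elk or datasets):
    - if the file already exists, it must be readable and writable by us;
    - otherwise its parent directory must exist and be writable/searchable.
    """
    if os.path.exists(lock_path):
        return os.access(lock_path, os.R_OK | os.W_OK)
    parent = os.path.dirname(lock_path) or "."
    return os.path.isdir(parent) and os.access(parent, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # A fresh temporary directory is owned by us, so this should print True.
    with tempfile.TemporaryDirectory() as d:
        print(can_create_lock(os.path.join(d, "demo.lock")))
```

For example, `can_create_lock('/mnt/ssd-2/hf_cache/generator/default-2e014cbd8695f82d/0.0.0_builder.lock')` returning False would suggest the run will fail before extraction starts.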