FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0
9.18k stars, 548 forks

CPU and M1/M2 GPU platform support #71

Closed by xiezhq-hermann 1 year ago

xiezhq-hermann commented 1 year ago

Minimal modification to extend FlexGen to CPU and M1/M2 GPU platforms. Not fully tested with various offloading settings. @Ying1123 @merrymercy
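
A minimal sketch of the kind of platform dispatch such a change needs, assuming a PyTorch build with MPS support (>= 1.12); the helper name get_torch_device is illustrative, not the actual code in this PR:

    import torch

    def get_torch_device(platform: str) -> torch.device:
        # Map a --platform string such as "cuda:0", "mps:0", or "cpu"
        # to a torch.device, guarding the backend-specific paths.
        if platform.startswith("cuda"):
            assert torch.cuda.is_available(), "CUDA requested but not available"
            return torch.device(platform)
        if platform.startswith("mps"):
            # torch.device("mps") targets the M1/M2 GPU via Metal Performance Shaders.
            assert torch.backends.mps.is_available(), "MPS requested but not available"
            return torch.device("mps")
        return torch.device("cpu")

CUDA-only calls such as torch.cuda.set_device and torch.cuda.Stream then have to be skipped (or replaced) whenever the selected device is not a CUDA device, which is exactly what the tracebacks below run into.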

HIRANO-Satoshi commented 1 year ago

I tried it. My checkout looks like this:

0af9051 *   main Merge branch 'xiezhq-hermann/main'
        |\  
18482fa | * xiezhq-hermann/main update CPU and m1/m2
980ca74 | *   merge latest main
        | |\  
332849a | * | enable CPU and M1/M2 platform
fea8321 * | | origin/main update version
896e1e0 * | | Update README.md
50ae8ad * | | Delete README.md
9d888e5 * | Move apps into flexgen package (#70)

Something seems wrong:

ppa-hirano:FlexGen hirano-s$ python3 -m flexgen.flex_opt --model facebook/opt-1.3b
model size: 2.443 GB, cache size: 0.398 GB, hidden size (prefill): 0.008 GB
init weight...

Each of the four copy-worker threads (Thread-1 through Thread-4) fails with the same error:

Exception in thread Thread-N (copy_worker_func):
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/hirano-s/dev/FlexGen/flexgen/pytorch_backend.py", line 917, in copy_worker_func
    torch.cuda.set_device(device_id)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/__init__.py", line 326, in set_device
    torch._C._cuda_setDevice(device)
AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'

The main thread then fails as well:

Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/hirano-s/dev/FlexGen/flexgen/flex_opt.py", line 1334, in <module>
    run_flexgen(args)
  File "/Users/hirano-s/dev/FlexGen/flexgen/flex_opt.py", line 1218, in run_flexgen
    model = OptLM(opt_config, env, args.path, policy)
  File "/Users/hirano-s/dev/FlexGen/flexgen/flex_opt.py", line 617, in __init__
    self.load_weight_stream = torch.cuda.Stream()
  File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/streams.py", line 34, in __new__
    return super(Stream, cls).__new__(cls, priority=priority, **kwargs)
TypeError: object.__new__() takes exactly one argument (the type to instantiate)

Exception ignored in: <function OptLM.__del__ at 0x11b250040>
Traceback (most recent call last):
  File "/Users/hirano-s/dev/FlexGen/flexgen/flex_opt.py", line 1148, in __del__
    self.delete_all_weights()
  File "/Users/hirano-s/dev/FlexGen/flexgen/flex_opt.py", line 803, in delete_all_weights
    self.delete_weight(j, 0)
  File "/Users/hirano-s/dev/FlexGen/flexgen/flex_opt.py", line 669, in delete_weight
    for x in self.weight_home[j].pop():
AttributeError: 'OptLM' object has no attribute 'weight_home'
ppa-hirano:FlexGen hirano-s$ 
xiezhq-hermann commented 1 year ago

@HIRANO-Satoshi Did you run the code on your Mac? If so, you should add --platform "mps:0" to the command. If you tested it on a machine with an NVIDIA GPU, can you try the latest commit (I merged it for you) and rebuild FlexGen? I'm not sure which code you just ran.
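
For reference, the two invocations discussed in this thread look like this (the --platform values come from the comments in this issue):

    # Apple Silicon GPU via MPS:
    python3 -m flexgen.flex_opt --model facebook/opt-1.3b --platform "mps:0"

    # CPU only:
    python3 -m flexgen.flex_opt --model facebook/opt-1.3b --platform cpu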

HIRANO-Satoshi commented 1 year ago

A quick workaround:

    def delete_weight(self, j, k):
        if k == 0 and getattr(self, 'weight_home', None):
            for x in self.weight_home[j].pop():
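                # ... loop body unchanged from the original delete_weight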

But another error remains.

The copy-worker threads still fail:

  File "/Users/hirano-s/dev/FlexGen/flexgen/pytorch_backend.py", line 917, in copy_worker_func
    torch.cuda.set_device(device_id)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/__init__.py", line 326, in set_device
    torch._C._cuda_setDevice(device)
AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'

and the main thread still hits the same TypeError when creating the CUDA stream:

    run_flexgen(args)
  File "/Users/hirano-s/dev/FlexGen/flexgen/flex_opt.py", line 1218, in run_flexgen
    model = OptLM(opt_config, env, args.path, policy)
  File "/Users/hirano-s/dev/FlexGen/flexgen/flex_opt.py", line 617, in __init__
    self.load_weight_stream = torch.cuda.Stream()
  File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/streams.py", line 34, in __new__
    return super(Stream, cls).__new__(cls, priority=priority, **kwargs)
TypeError: object.__new__() takes exactly one argument (the type to instantiate)
HIRANO-Satoshi commented 1 year ago

I don't have an NVIDIA GPU. With --platform cpu, it starts working. Thanks very much!

Maybe apps/completion.py needs the --platform option, too.

ppa-hirano:FlexGen hirano-s$ python3 -m flexgen.apps.completion --model facebook/opt-1.3b
...
 File "/opt/homebrew/lib/python3.10/site-packages/torch/cuda/__init__.py", line 326, in set_device
    torch._C._cuda_setDevice(device)
AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'

ppa-hirano:FlexGen hirano-s$ python3 -m flexgen.apps.completion --model facebook/opt-1.3b --platform cpu
usage: completion.py [-h] [--model MODEL] [--path PATH] [--offload-dir OFFLOAD_DIR] [--percent PERCENT [PERCENT ...]] [--pin-weight [PIN_WEIGHT]] [--compress-weight] [--compress-cache]
completion.py: error: unrecognized arguments: --platform cpu

HIRANO-Satoshi commented 1 year ago

A proper default without an explicit option would be better.
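
As a sketch of one way that default could be derived (the helper name default_platform and the argparse wiring are assumptions, not existing FlexGen code):

    import torch

    def default_platform() -> str:
        # Prefer CUDA when present, then Apple-silicon MPS, then plain CPU.
        if torch.cuda.is_available():
            return "cuda:0"
        if getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
            return "mps:0"
        return "cpu"

    # e.g. in flexgen/apps/completion.py:
    # parser.add_argument("--platform", type=str, default=default_platform())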

I'm curious how fast the Apple Neural Engine is.