FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

NotImplementedError on --percent 50 50 50 50 50 50 #114

Open SeekPoint opened 1 year ago

SeekPoint commented 1 year ago

```
(base) ub2004@ub2004-B85M-A0:~/nndev/FlexGen_yk$ python3 -m flexgen.flex_opt --model facebook/opt-1.3b --gpu-batch-size 1 --percent 50 50 50 50 50 50

: args.model: facebook/opt-1.3b
get_opt_config is: model size: 2.443 GB, cache size: 0.100 GB, hidden size (prefill): 0.002 GB
init weight...
Traceback (most recent call last):
  File "/home/ub2004/anaconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ub2004/anaconda3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ub2004/nndev/FlexGen_yk/flexgen/flex_opt.py", line 1328, in <module>
    run_flexgen(args)
  File "/home/ub2004/nndev/FlexGen_yk/flexgen/flex_opt.py", line 1220, in run_flexgen
    model = OptLM(opt_config, env, args.path, policy)
  File "/home/ub2004/nndev/FlexGen_yk/flexgen/flex_opt.py", line 615, in __init__
    raise NotImplementedError()
NotImplementedError
```
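A likely cause, based on how FlexGen's `OptLM.__init__` chooses a home device for activations: the six `--percent` values are (weight GPU%, weight CPU%, cache GPU%, cache CPU%, activation GPU%, activation CPU%), with the remainder of each pair implicitly placed on disk. Weights and KV cache can be split across devices, but activations must live entirely on a single device; a 50/50 activation split falls through every branch and raises `NotImplementedError`. The sketch below is a simplified reconstruction of that dispatch (the function name `choose_act_home` and the string return values are illustrative, not FlexGen's actual API):

```python
def choose_act_home(act_gpu_percent: float, act_cpu_percent: float) -> str:
    """Pick a single home device for activations.

    Illustrative sketch of the policy check that appears to raise at
    flex_opt.py line 615: activations cannot be split across devices,
    so exactly one of the three percentages must be 100.
    """
    # Whatever is not on GPU or CPU is implicitly offloaded to disk.
    act_disk_percent = 100 - act_gpu_percent - act_cpu_percent

    if act_gpu_percent == 100:
        return "gpu"
    elif act_cpu_percent == 100:
        return "cpu"
    elif act_disk_percent == 100:
        return "disk"
    else:
        # Mixed-device activation placement is not implemented.
        raise NotImplementedError()
```

Under this reading, `--percent 50 50 50 50 50 50` fails because the last two values split activations 50/50, while something like `--percent 50 50 50 50 100 0` (activations fully on GPU) should get past this check.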