Closed: puppetm4st3r closed this issue 3 weeks ago
Hi, unfortunately not. This is a hard limitation of Tensor Parallelism. The only way we could overcome this would be using Pipeline Parallelism, but that's not implemented as of now.
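For context, here is a minimal sketch of the constraint being described, assuming the standard tensor-parallel scheme of splitting attention heads evenly across ranks (illustrative only, not Aphrodite's actual code):

```python
# Minimal sketch of tensor-parallel head partitioning (illustrative,
# not Aphrodite's implementation). Each GPU gets an equal contiguous
# slice of the attention heads, so the head count must be divisible
# by the number of GPUs; otherwise ranks would end up with mismatched
# tensor shapes.
def partition_heads(num_heads: int, tp_size: int) -> list[range]:
    if num_heads % tp_size != 0:
        raise ValueError(
            f"num_heads ({num_heads}) must be divisible by "
            f"tensor parallel size ({tp_size})"
        )
    per_rank = num_heads // tp_size
    return [range(r * per_rank, (r + 1) * per_rank) for r in range(tp_size)]

print(partition_heads(32, 2))  # [range(0, 16), range(16, 32)]
# partition_heads(32, 3) raises ValueError: 32 heads cannot be split
# evenly across 3 GPUs, which is the error the reporter keeps hitting.
```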
Thanks, I'll have to get another 3090 for the server 🥲
Your current environment
For some reason, when executing the script on a fresh new server, I got:
```
Collecting environment information...
Traceback (most recent call last):
  File "/home/dario/work/dm/Dolf/server/env.py", line 623, in <module>
    main()
  File "/home/dario/work/dm/Dolf/server/env.py", line 600, in main
    output = get_pretty_env_info()
  File "/home/dario/work/dm/Dolf/server/env.py", line 595, in get_pretty_env_info
    return pretty_str(get_env_info())
  File "/home/dario/work/dm/Dolf/server/env.py", line 404, in get_env_info
    pip_version, pip_list_output = get_pip_packages(run_lambda)
  File "/home/dario/work/dm/Dolf/server/env.py", line 374, in get_pip_packages
    out = run_with_pip([sys.executable, '-mpip'])
  File "/home/dario/work/dm/Dolf/server/env.py", line 370, in run_with_pip
    return "\n".join(line for line in out.splitlines()
AttributeError: 'NoneType' object has no attribute 'splitlines'
```
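The crash itself looks unrelated to the GPU question: the script appears to get `None` back from its pip subprocess (pip missing or failing on the fresh server) and then calls `.splitlines()` on it. A minimal defensive sketch, assuming the script shells out to `pip list` the way PyTorch's `collect_env.py` does (the function below is illustrative, not the actual `env.py` code):

```python
import subprocess
import sys

# Hedged sketch: collect `pip list` output but tolerate a failing or
# missing pip instead of calling .splitlines() on None, which is what
# produced the AttributeError above.
def get_pip_packages():
    result = subprocess.run(
        [sys.executable, "-m", "pip", "list", "--format=freeze"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return None  # pip is broken or missing on the fresh server
    return result.stdout

packages = get_pip_packages()
print(packages if packages is not None else "Could not collect pip packages")
```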
How would you like to use Aphrodite?
I want to put a model on 3 GPUs, but many models have attention head counts that are multiples of 2, so I constantly get this stack trace:
Is there a way to shard a model with an asymmetric layer distribution in order to use 3, 5, or 7 GPUs? Best regards
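For what it's worth, given the divisibility constraint the maintainer describes, the usable tensor-parallel sizes for a model are exactly the divisors of its attention head count. A small sketch to check a model up front (the head count of 32 is just an example, common for ~7B models):

```python
# The GPU count must evenly divide the attention head count, so the
# valid tensor-parallel sizes are exactly the divisors of num_heads.
def valid_tp_sizes(num_heads: int) -> list[int]:
    return [n for n in range(1, num_heads + 1) if num_heads % n == 0]

print(valid_tp_sizes(32))  # [1, 2, 4, 8, 16, 32] -- so 3, 5, or 7 GPUs won't work
```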