Open future-xy opened 9 months ago
I got an error when trying to use MoE for google/switch-base-16
:
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
return await self.app(scope, receive, send)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/routing.py", line 758, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/routing.py", line 778, in app
await route.handle(scope, receive, send)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/routing.py", line 299, in handle
await self.app(scope, receive, send)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/routing.py", line 79, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/routing.py", line 74, in app
response = await func(request)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "/mnt/data/fy/Desktop/MoE-Infinity/moe_infinity/entrypoints/openai/api_server.py", line 226, in completion
_ = model.generate(
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/moe_infinity/entrypoints/big_modeling.py", line 161, in generate
return self.model.generate(input_ids, **kwargs)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/generation/utils.py", line 1413, in generate
model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/generation/utils.py", line 518, in _prepare_encoder_decoder_kwargs_for_generation
model_kwargs["encoder_outputs"]: ModelOutput = encoder(**encoder_kwargs)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/models/switch_transformers/modeling_switch_transformers.py", line 1043, in forward
layer_outputs = layer_module(
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/models/switch_transformers/modeling_switch_transformers.py", line 767, in forward
hidden_states = self.layer[-1](hidden_states, output_router_logits)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/models/switch_transformers/modeling_switch_transformers.py", line 345, in forward
forwarded_states = self.mlp(forwarded_states)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/moe_infinity/models/switch_transformers.py", line 75, in forward
router_mask, router_probs, router_logits = self.router(hidden_states)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/models/switch_transformers/modeling_switch_transformers.py", line 208, in forward
router_probs, router_logits = self._compute_router_probabilities(hidden_states)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/models/switch_transformers/modeling_switch_transformers.py", line 177, in _compute_router_probabilities
router_logits = self.classifier(hidden_states)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1540, in _call_impl
args_kwargs_result = hook(self, args, kwargs) # type: ignore[misc]
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/moe_infinity/runtime/model_offload.py", line 882, in _pre_forward_module_hook
self.offload_set.remove(param.data.data_ptr())
KeyError: 889163264
@future-xy Fixed in the latest version available on TestPyPI, please feel free to give it another try.
Pull Request: Local Server Beta for OpenAI-Compatible APIs
This PR introduces a beta version of a local server that provides OpenAI-compatible APIs, specifically
v1/chat/completions
andv1/completions
. This initial version supports serving a single model and recognizes only two required fields in requests:messages
/prompt
andmodel
. Please note that other fields may not have an effect at this stage. For detailed information, refer to theREADME.md
and the./tests/
directory. It's important to mention that, in this beta version, we utilize vanilla HuggingFace Transformers models instead of the more advanced MoE-Infinity architecture.Known Limitations and Todos
finish reason
.Your feedback and contributions are welcome to help evolve this project into a more robust solution. Please refer to the
README.md
for guidelines on contributing and testing.