TorchMoE / MoE-Infinity

PyTorch library for cost-effective, fast and easy serving of MoE models.
Apache License 2.0
107 stars 8 forks source link

Introduce Local Server for OpenAI-Compatible APIs (Beta) #4

Open future-xy opened 9 months ago

future-xy commented 9 months ago

Pull Request: Local Server Beta for OpenAI-Compatible APIs

This PR introduces a beta version of a local server that provides OpenAI-compatible APIs, specifically v1/chat/completions and v1/completions. This initial version supports serving a single model and recognizes only two required fields in requests: messages/prompt and model. Please note that other fields may not have an effect at this stage. For detailed information, refer to the README.md and the ./tests/ directory. It's important to mention that, in this beta version, we utilize vanilla HuggingFace Transformers models instead of the more advanced MoE-Infinity architecture.

Known Limitations and Todos

Your feedback and contributions are welcome to help evolve this project into a more robust solution. Please refer to the README.md for guidelines on contributing and testing.

future-xy commented 8 months ago

I got an error when trying to use MoE for google/switch-base-16:

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/mnt/data/fy/Desktop/MoE-Infinity/moe_infinity/entrypoints/openai/api_server.py", line 226, in completion
    _ = model.generate(
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/moe_infinity/entrypoints/big_modeling.py", line 161, in generate
    return self.model.generate(input_ids, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/generation/utils.py", line 1413, in generate
    model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/generation/utils.py", line 518, in _prepare_encoder_decoder_kwargs_for_generation
    model_kwargs["encoder_outputs"]: ModelOutput = encoder(**encoder_kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/models/switch_transformers/modeling_switch_transformers.py", line 1043, in forward
    layer_outputs = layer_module(
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/models/switch_transformers/modeling_switch_transformers.py", line 767, in forward
    hidden_states = self.layer[-1](hidden_states, output_router_logits)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/models/switch_transformers/modeling_switch_transformers.py", line 345, in forward
    forwarded_states = self.mlp(forwarded_states)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/moe_infinity/models/switch_transformers.py", line 75, in forward
    router_mask, router_probs, router_logits = self.router(hidden_states)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/models/switch_transformers/modeling_switch_transformers.py", line 208, in forward
    router_probs, router_logits = self._compute_router_probabilities(hidden_states)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/models/switch_transformers/modeling_switch_transformers.py", line 177, in _compute_router_probabilities
    router_logits = self.classifier(hidden_states)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1540, in _call_impl
    args_kwargs_result = hook(self, args, kwargs)  # type: ignore[misc]
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/moe_infinity/runtime/model_offload.py", line 882, in _pre_forward_module_hook
    self.offload_set.remove(param.data.data_ptr())
KeyError: 889163264
lausannel commented 8 months ago

@future-xy Fixed in the latest version available on TestPyPI, please feel free to give it another try.