Feature Description
Phi-3 Mini is currently one of the most capable small language models (SLMs). Could it be "ReLUfied", i.e. its activation functions replaced with ReLU to induce the activation sparsity that PowerInfer exploits, so that a single Xeon server could serve hundreds of concurrent users?
Motivation
A ReLU-sparse Phi-3 Mini would let PowerInfer users deploy a state-of-the-art SLM for high-concurrency serving on commodity CPU hardware (e.g. a single Xeon server), rather than being limited to the ReLU-based model families PowerInfer already supports.
Possible Implementation
Convert Phi-3 Mini into a ReLU-activated variant: replace its gated-MLP activations (SiLU) with ReLU and fine-tune to recover quality (the "ReLUfication" approach used for models such as ReluLLaMA), then add a PowerInfer conversion path and activation-predictor training for the resulting model.
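A minimal sketch of the activation-swap step, assuming a PyTorch model whose MLP blocks use `nn.SiLU` (as Phi-3's do). This only performs the module replacement; recovering model quality would still require fine-tuning, and the toy `mlp` below stands in for the real model (loading Phi-3 via `transformers` is omitted to keep the sketch self-contained):

```python
import torch
import torch.nn as nn

def relufy(module: nn.Module) -> int:
    """Recursively replace every nn.SiLU child with nn.ReLU; return swap count."""
    swapped = 0
    for name, child in module.named_children():
        if isinstance(child, nn.SiLU):
            setattr(module, name, nn.ReLU())
            swapped += 1
        else:
            swapped += relufy(child)
    return swapped

# Toy stand-in for one transformer MLP block (hypothetical shapes):
mlp = nn.Sequential(nn.Linear(8, 32), nn.SiLU(), nn.Linear(32, 8))
n = relufy(mlp)
print(n)  # → 1: the single SiLU was replaced with ReLU
out = mlp(torch.randn(2, 8))  # the converted block still runs
```

After a swap like this, the ReLU outputs are exactly zero for inactive neurons, which is the sparsity signal PowerInfer's predictors rely on; without the subsequent fine-tuning pass, however, the converted model's accuracy would degrade substantially.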