SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
MIT License

How to show the output results on a web service? Or how can I get the inference result for other applications? #126

Open xujiangyu opened 9 months ago

xujiangyu commented 9 months ago

Prerequisites

Before submitting your question, please ensure the following:

Question Details

Please provide a clear and concise description of your question. If applicable, include steps to reproduce the issue or behaviors you've observed.

Additional Context

Please provide any additional information that may be relevant to your question, such as specific system configurations, environment details, or any other context that could be helpful in addressing your inquiry.

hodlen commented 9 months ago

Hi @xujiangyu ! If you are referring to the examples/server application, you can access it by entering the server address (e.g., 127.0.0.1:8080) in your browser. This allows you to interact with the model via a simple UI and see the outputs. For more details, please refer to the server documentation. Additionally, all inference outputs from the server are also printed to stdout.

As for other applications, most of them print the inference results to the command line. You can find usage instructions in the examples/[application] directory, where each application's README and source code are available.
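
If you want to consume the results from another application rather than the browser or stdout, here is a minimal sketch of querying the server over HTTP. It assumes the server is running on 127.0.0.1:8080 and exposes the llama.cpp-style `/completion` endpoint; field names may differ across builds, so please check the server README for the exact API.

```python
# Minimal sketch: fetch an inference result from examples/server over HTTP.
# Assumes a llama.cpp-style /completion endpoint on 127.0.0.1:8080;
# consult the server README of your build for the exact request/response fields.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={
        "prompt": "Explain what PowerInfer does in one sentence.",
        "n_predict": 128,  # cap on the number of generated tokens
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["content"])  # generated text returned by the server
```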

xujiangyu commented 9 months ago

Thank you for your reply. I wonder how to add background knowledge via the parameters, such as for a RAG flow. I checked the parameters of the main function and didn't find such a parameter.

hodlen commented 9 months ago

Adding background knowledge is very much an application-layer concept and amounts to nothing more than injecting information into the prompt. This project focuses on LLM inference and doesn't provide dedicated support for that.
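
To illustrate, here is a minimal sketch of what "adding background knowledge" looks like at the application layer: retrieved passages are simply concatenated into the prompt before it is handed to the model. The `passages` list below is a placeholder for whatever retrieval backend a RAG flow would use; nothing here touches PowerInfer itself.

```python
# Minimal sketch: "background knowledge" is just extra text prepended to the prompt.
# The passages list is a placeholder for whatever retrieval step your RAG flow uses.
def build_rag_prompt(question: str, passages: list[str]) -> str:
    context = "\n\n".join(passages)
    return (
        "Use the following background information to answer the question.\n\n"
        f"Background:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

passages = ["PowerInfer serves large language models on consumer-grade GPUs."]
prompt = build_rag_prompt("What is PowerInfer?", passages)
# `prompt` is then passed to the model like any other prompt, e.g. via the
# --prompt flag of examples/main or the body of a /completion request.
```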

I suggest using a wrapper such as the llama-cpp-python library (you can use our forked version here), or the server endpoint. Then you can use any mainstream orchestration framework like LangChain to easily build a RAG workflow.
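
As a rough sketch of the wrapper route (not an official recipe), the snippet below runs such a prompt through the upstream llama-cpp-python API. The model path is a placeholder and the forked version may differ in minor details, so treat it as a starting point.

```python
# Rough sketch using the llama-cpp-python wrapper; the model path is a placeholder
# and the forked build may differ slightly from upstream llama-cpp-python.
from llama_cpp import Llama

llm = Llama(model_path="./models/your-model.gguf", n_ctx=2048)

prompt = (
    "Background:\nPowerInfer serves large language models on consumer-grade GPUs.\n\n"
    "Question: What is PowerInfer?\nAnswer:"
)
out = llm(prompt, max_tokens=256, stop=["Question:"])
print(out["choices"][0]["text"])
```

From there, an orchestration framework like LangChain can own the retrieval and prompt assembly while the wrapper (or the server endpoint) handles inference.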