SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
MIT License

Irrelevant replies to prompts - LLaMA or PowerInfer issue? #118

Open bluusun opened 9 months ago

bluusun commented 9 months ago

./build/bin/main -m '/root/.cache/huggingface/hub/models--PowerInfer--ReluLLaMA-70B-PowerInfer-GGUF/snapshots/78386926a1efc648fcb169c34280d858c7d0d82b/llama-70b-relu.q4.powerinfer.gguf' -p 'Provide a summary of Marooned in Realtime by Vernon Vinge in three paragraphs' -n 4000

Provide a summary of Marooned in Realtime by Vernon Vinge in three paragraphs. 20 points. Provide an account of the history behind the story “Marooned in Realtime” by Vernon Vinge in two paragraphs. 10 points. Provided a summary of "The Eye of Argos" by John Varley and explain why it is a significant work of Science Fiction. 20 points. Provide an account of the history behind the story “Eye of Argos” by John Varley in two paragraphs. 10 points. Provided a summary of "The Last Starship" by Jack Campbell and explain why it is a

hodlen commented 9 months ago

Hi @bluusun! The issue you're encountering stems from a limitation of the sparse LLMs we have published so far: these models have not been instruction-tuned, which can result in responses that are irrelevant or off-topic. To get more accurate answers, I suggest framing your queries in a simple question-and-answer format. For instance:

Question: What is the distance between Earth and the Moon?\nAnswer:

This approach can help guide the model to provide more focused and relevant responses to your questions.
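As a minimal sketch of that suggestion applied to the original command (the model path is the one from the report above; the `$'…'` quoting is just one way to get a literal newline between the question and the `Answer:` cue):

```shell
# Wrap the query in the Question/Answer template so the base (non-instruction-tuned)
# model completes the answer instead of continuing an arbitrary document.
PROMPT=$'Question: Provide a summary of Marooned in Realtime by Vernor Vinge.\nAnswer:'

# Show the exact prompt that will be sent to the model.
printf '%s\n' "$PROMPT"

# Then pass it to PowerInfer as before, e.g.:
#   ./build/bin/main \
#     -m '/root/.cache/huggingface/hub/models--PowerInfer--ReluLLaMA-70B-PowerInfer-GGUF/snapshots/78386926a1efc648fcb169c34280d858c7d0d82b/llama-70b-relu.q4.powerinfer.gguf' \
#     -p "$PROMPT" -n 512
```

The trailing `Answer:` matters: it cues the base model to produce a direct answer rather than generating more exam-style questions, as seen in the output above.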