0Xiaohei0 / LocalAIVtuber

A tool for hosting AI vtubers that runs fully locally and offline.

Support for LM Studio or oobabooga webui #8

Closed · netixc closed this issue 5 months ago

netixc commented 5 months ago

Hello, I've been trying to make this work with localhost:1234/v1. I'm not really a coder, so I used GPT-4 to help me, but that also wasn't enough. GPT-4 made this for me, but I don't get the response back, even though the LM Studio log shows the response.

from pluginInterface import LLMPluginInterface
import json
import requests

class LocalLLM(LLMPluginInterface):
    context_length = 4096
    base_url = "http://localhost:1234/v1"

    def predict(self, message, history, system_prompt):
        # Construct the payload for the API request
        messages = [{"role": "system", "content": system_prompt}]
        for user, ai in history:
            messages.append({"role": "user", "content": user})
            messages.append({"role": "assistant", "content": ai})
        messages.append({"role": "user", "content": message})

        # Construct prompt based on the system's requirements
        prompt = " ".join([msg['content'] for msg in messages])

        # Prepare the API request payload
        payload = {
            "prompt": prompt,  # Now including the 'prompt' field
            "stream": True,  # Assuming your server supports streaming responses
            "temperature": 0.95  # Example parameter, adjust as necessary
        }

        # Debugging output to trace HTTP request details
        print("Sending request to:", self.base_url + "/completions")
        print("Payload:", json.dumps(payload))

        # Send the POST request to the server
        response = requests.post(f"{self.base_url}/completions", json=payload)

        # Debugging output to trace HTTP response details
        print("Received status code:", response.status_code)
        print("Response headers:", response.headers)
        print("Response body:", response.text)

        # Handle the response
        try:
            data = response.json()
            # Assuming the server returns a list of text completions
            for completion in data.get('completions', []):
                yield completion
        except json.JSONDecodeError as e:
            print("Failed to decode JSON from response:")
            print("Error:", e)
            yield "Error in generating response."
0Xiaohei0 commented 5 months ago

Here is the working plugin:

import json
import requests
from pluginInterface import LLMPluginInterface

class LocalLLM(LLMPluginInterface):
    context_length = 4096
    base_url = "http://localhost:1234/v1"

    def predict(self, message, history, system_prompt):
        # Construct the payload for the API request
        messages = [{"role": "system", "content": system_prompt}]
        for user, ai in history:
            messages.append({"role": "user", "content": user})
            messages.append({"role": "assistant", "content": ai})
        messages.append({"role": "user", "content": message})

        # Construct prompt based on the system's requirements
        prompt = " ".join([msg['content'] for msg in messages])

        # Prepare the API request payload
        payload = {
            "prompt": prompt,
            "stream": True,
            "temperature": 0.95
        }

        # Debugging output to trace HTTP request details
        print("Sending request to:", self.base_url + "/completions")
        print("Payload:", json.dumps(payload))

        # Send the POST request to the server
        response = requests.post(
            f"{self.base_url}/completions", json=payload, stream=True
        )

        # Check for errors in the response status code
        if response.status_code != 200:
            yield "Error in generating response."
            return

        # Initialize a buffer to accumulate the full response progressively
        accumulated_text = ""

        # Process the streaming response line-by-line
        for line in response.iter_lines(decode_unicode=True):
            # Process only lines that start with "data:"
            if line.startswith("data:"):
                data_str = line[5:].strip()

                # Stop processing if [DONE] is received
                if data_str == "[DONE]":
                    break

                # Try to decode each chunk of JSON data
                try:
                    data = json.loads(data_str)
                    # Extract and accumulate text from each choice
                    for choice in data.get("choices", []):
                        accumulated_text += choice["text"]
                        yield accumulated_text
                except json.JSONDecodeError as e:
                    yield "Error in generating response."

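For reference, predict is a generator that yields progressively longer strings, so a caller sees the reply grow as the stream comes in. A hypothetical consumer (the real caller is the LocalAIVtuber UI, which is not shown in this thread) would use it like this:

# Hypothetical consumer of the streaming generator above.
plugin = LocalLLM()
for partial_text in plugin.predict(
    message="Hello!",
    history=[("Hi", "Hello there!")],  # list of (user, assistant) pairs
    system_prompt="You are a friendly VTuber.",
):
    print(partial_text)  # each yield is the full response accumulated so far
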
Also, if you have a GGUF model, you can just copy the local LLM plugin and change the model path.
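
As a rough illustration, and assuming the bundled local LLM plugin loads GGUF files with llama-cpp-python (an assumption; the actual plugin may use a different backend and different attribute names), a copied plugin pointing at your own model could look roughly like this:

from llama_cpp import Llama
from pluginInterface import LLMPluginInterface

class MyGGUFLLM(LLMPluginInterface):
    # Hypothetical plugin sketch: in practice only the model path (and maybe
    # the context length) should need to change for a different GGUF file.
    context_length = 4096
    model_path = "models/dolphin-2.8-mistral-7b-v02-Q8_0.gguf"
    llm = None

    def predict(self, message, history, system_prompt):
        # Lazily load the GGUF model on first use.
        if self.llm is None:
            self.llm = Llama(model_path=self.model_path, n_ctx=self.context_length)

        # Simple prompt assembly; a real plugin would follow the model's chat template.
        prompt = system_prompt + "\n"
        for user, ai in history:
            prompt += f"User: {user}\nAssistant: {ai}\n"
        prompt += f"User: {message}\nAssistant:"

        accumulated_text = ""
        # Stream tokens and yield the accumulated text, like the plugin above.
        for chunk in self.llm.create_completion(prompt, stream=True, temperature=0.95):
            accumulated_text += chunk["choices"][0]["text"]
            yield accumulated_text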

netixc commented 5 months ago

[screenshot: localvtube]

First of all, thank you for the code. I tried it with LM Studio and the screenshot above shows the output I get. I used dolphin-2.8-mistral-7b-v02-Q8_0.gguf with LM Studio and ChatML as the template. Any idea how I can make it so the output is only the response?

And lastly, can you give me your PayPal so I can buy you a coffee?
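
For what it's worth, extra template text in the output from a ChatML-formatted model is often just the model's end-of-turn token leaking through the raw /v1/completions endpoint. A hedged workaround, assuming the server honors the standard OpenAI stop parameter, is to add a stop sequence to the payload in the plugin above:

        payload = {
            "prompt": prompt,
            "stream": True,
            "temperature": 0.95,
            # ChatML models end each turn with <|im_end|>; sending it as a stop
            # sequence keeps the template token out of the generated text.
            "stop": ["<|im_end|>"],
        }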

0Xiaohei0 commented 5 months ago

[screenshot]

I could not reproduce this issue; maybe try a fresh install of LM Studio. https://www.paypal.me/xiaoheiOvO Thanks for the support <3

netixc commented 5 months ago

I tried another model and another template and it worked. Thank you. Is there much to change to run it on a MacBook as well?

0Xiaohei0 commented 5 months ago

Haven't tested on Mac, but you shouldn't need to change the code; you just need to set up the environment yourself.