go-skynet / go-llama.cpp

LLama.cpp golang bindings
MIT License

Predict callback isn't fired & output is ######### #325

Closed RobinHeitz closed 3 months ago

RobinHeitz commented 3 months ago

Hi, I'm relatively new to go-llama. I tried to replicate the example code, but instead of reading console input, the prompt comes from a POST request.

I'm using Llama 2 13B Chat. I used the convert.py script to convert the two consolidated.0x.pth files into a .gguf (f16) model. Now the problem while using the Predict method:

It only returns `#`. The callback for newly generated tokens also only receives `#`. Do I need to do some conversion? As far as I can see in the ./example/main.go file, nothing extra is needed.

func (l *LLamaModel) MakePredict(prompt string) string {
    prediction, err := l.Predict(prompt,
        llama.SetTokenCallback(func(token string) bool {
            fmt.Println(token)
            return true
        }),
        // llama.Debug,
        llama.SetTokens(tokens),
        llama.SetThreads(threads),
        llama.SetTopP(0.86),
        llama.SetStopWords("llama"),
        llama.SetSeed(seed),
    )

    if err != nil {
        log.Fatal("Fatal err in prediction: " + err.Error())
    }

    embeds, err := l.Embeddings(prompt)
    if err != nil {
        log.Println("Err in Embeddings: " + err.Error())
    }
    log.Println("Embeddings :", embeds)
    log.Println("\n\n")
    log.Print("Prediction:" + prediction)
    return prediction
}
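For what it's worth, the callback passed to llama.SetTokenCallback keeps generation going while it returns true; returning false aborts. A minimal sketch of a guard that could stop early when the output degenerates into a run of a single character, as in this issue (looksDegenerate and its threshold are my own, not part of the bindings):

```go
package main

import (
	"fmt"
	"strings"
)

// looksDegenerate reports whether the generated text so far has collapsed
// into a long run of one character such as "#". The run length of 8 is an
// arbitrary choice for this sketch.
func looksDegenerate(sofar string) bool {
	const limit = 8
	trimmed := strings.TrimSpace(sofar)
	if len(trimmed) < limit {
		return false
	}
	tail := trimmed[len(trimmed)-limit:]
	return strings.Count(tail, string(tail[0])) == limit
}

func main() {
	fmt.Println(looksDegenerate("Hello, world")) // false
	fmt.Println(looksDegenerate("############")) // true
}
```

Inside the callback you would accumulate the tokens into a buffer and return !looksDegenerate(buffer), so a broken model aborts after a few tokens instead of burning through the whole budget.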

And here the instantiation of the model:

type LLamaModel struct {
    *llama.LLama
}

func New(modelPath string) LLamaModel {
    l, err := llama.New(modelPath, llama.EnableF16Memory, llama.SetContext(context), 
        llama.EnableEmbeddings, llama.SetGPULayers(gpulayers))
    if err != nil {
        log.Println("Loading the model failed." + err.Error())
    }

    return LLamaModel{
        LLama: l,
    }
}
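A side note on the constructor above: log.Println swallows the load error, so the returned LLamaModel wraps a nil *llama.LLama and any later Predict call will crash. A sketch of an error-propagating variant, with loadModel standing in for llama.New so the example is self-contained:

```go
package main

import (
	"errors"
	"fmt"
)

// loadModel stands in for llama.New in this sketch.
func loadModel(path string) (string, error) {
	if path == "" {
		return "", errors.New("empty model path")
	}
	return "model@" + path, nil
}

type LLamaModel struct{ handle string }

// New propagates the load error instead of logging it and returning a
// zero-value model, so callers never hold an unusable handle.
func New(modelPath string) (*LLamaModel, error) {
	h, err := loadModel(modelPath)
	if err != nil {
		return nil, fmt.Errorf("loading the model failed: %w", err)
	}
	return &LLamaModel{handle: h}, nil
}

func main() {
	if _, err := New(""); err != nil {
		fmt.Println("caught:", err)
	}
	m, _ := New("/models/llama-2-13b-chat.gguf")
	fmt.Println(m.handle)
}
```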

I think it's trivial. Thanks

RobinHeitz commented 3 months ago

Got the answer.

Something had gone wrong with the model file. I created a new one and now it outputs words.