Closed johndpope closed 7 months ago
sorry, I have 20k notifications in my GitHub inbox and didn't see your message
I have since forked the old version of guidance and updated the OpenAI API code
https://github.com/fullstackwebdev/handlebars-guidance
Unfortunately, in the new guidance I cannot figure out how to do streaming. It's been a week, and it's a holiday week, so I'll give them more time.
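For anyone else stuck on this: the OpenAI API streams chat completions as server-sent events, and the client side is basically just accumulating the `delta.content` fields until `[DONE]`. A minimal stdlib sketch of that parsing step (the `collect_stream` helper is hypothetical, not from either repo):

```python
import json

def collect_stream(sse_lines):
    """Accumulate text deltas from OpenAI-style server-sent events.

    Each event line looks like 'data: {json}'; the stream ends
    with the sentinel line 'data: [DONE]'.
    """
    pieces = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            pieces.append(delta["content"])
    return "".join(pieces)


# example with fabricated event lines, shaped like the chat API's stream
events = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: [DONE]',
]
print(collect_stream(events))
```

The hard part in guidance isn't this parsing; it's threading the partial text back through the template engine as it arrives, which is what the fork's streaming code does.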
oh whoops, the updated guidance code is in the 'master' branch on my repo
they fixed code upstream https://github.com/guidance-ai/guidance/issues/328
I found your repo digging through github https://github.com/microsoft/guidance/issues/328
I'll have a play with this repo soon - but I thought I'd share the above, since the local-LLM-with-guidance repo was a great way to showcase things.
It seems to need some of the code you're working on.
UPDATE
I started playing with your code - I like the streaming code. I'm really wanting to plug a llama2 model into guidance.
I attempted to use this, but no joy - it's looking for safetensors.
I also have TheBloke_Llama-2-13B-chat-GGML (116gb) in ./models/, but that doesn't work either.
Looking at the directory structure, I wonder if I can use pytorch_model-00002-of-00002.bin instead of the .safetensors files.
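Partly answering my own question: transformers-style loaders will fall back to the `pytorch_model-*.bin` shards when no `.safetensors` files are present, but a GGML checkpoint is a completely different binary format and won't load through that path at all - it needs a llama.cpp-based backend. A rough sketch of that lookup logic (`pick_weights` is a hypothetical helper for illustration, not guidance's actual code):

```python
from pathlib import Path

def pick_weights(model_dir):
    """Guess which weight files a HF-style loader can use.

    Preference order mirrors the usual transformers behaviour:
    .safetensors shards first, then pytorch_model*.bin shards.
    GGML/GGUF blobs are rejected outright - they are llama.cpp
    formats, not something transformers can read.
    """
    model_dir = Path(model_dir)
    safetensors = sorted(model_dir.glob("*.safetensors"))
    if safetensors:
        return safetensors
    bins = sorted(model_dir.glob("pytorch_model*.bin"))
    if bins:
        return bins
    if list(model_dir.glob("*.ggml*")) or list(model_dir.glob("*.gguf")):
        raise ValueError("GGML/GGUF weights need llama.cpp, not transformers")
    raise FileNotFoundError(f"no usable weights in {model_dir}")
```

So the `.bin` shards should be fine for the safetensors case, but the 13B GGML model would need a different loader entirely.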
UPDATE: I hacked some code to force guidance to load a specific model, but it blows up. I guess I need a wrapper around this.