Closed: shiffman closed this 5 months ago
This is ready for an initial review. It works! However, there is a lot to determine: which model to use (code-llama, for example). Would love any initial thoughts or advice on what I have so far!
@dipamsen are you able to run ollama or no b/c you are on PC?
@shiffman I can run ollama (there's a preview version for windows), but it runs very slowly...
Regarding distinguishing between code and narration, this needs to be done by prompt engineering, such that the model uses specific markers (e.g. [SPEECH] and [CODE]) to separate them. Alternatively, responding with JSON could also work. Using two models probably won't be ideal, as it may cause incoherence between the speech and the code.
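A minimal sketch of the marker idea, assuming the model is prompted to emit [SPEECH] and [CODE] tags (the function name and the sample reply here are hypothetical, not from the project):

```javascript
// Hypothetical parser for a complete (non-streamed) model reply that uses
// [SPEECH] and [CODE] markers to separate narration from code.
function parseMarkedResponse(text) {
  const segments = [];
  // Split on the markers, keeping them (capture group) so we know each
  // segment's type as we walk the pieces.
  const parts = text.split(/(\[SPEECH\]|\[CODE\])/);
  let type = null;
  for (const part of parts) {
    if (part === "[SPEECH]") type = "speech";
    else if (part === "[CODE]") type = "code";
    else if (type && part.trim()) segments.push({ type, text: part.trim() });
  }
  return segments;
}

const reply =
  "[SPEECH] Let's draw a circle. [CODE] circle(200, 200, 50); [SPEECH] Nice!";
console.log(parseMarkedResponse(reply));
```

The JSON alternative would skip the parsing but, in my experience, can be harder to handle mid-stream since the object isn't valid until it's complete.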
Agreed! I've had success with this kind of prompt engineering before (ShiffBot does similar kinds of things), but one thing I'm unsure how to approach is parsing the results while "streaming" the response from the model. (Without streaming there is a lot of latency before a reply comes in.)
If the reply always starts with the type of response it's going to produce (e.g. [SPEECH]), you can keep a buffer that stores, for example, the last 10 characters received, then check on each new character whether the buffer contains one of those markers and switch what you do with the following output accordingly.
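The buffer idea above could be sketched like this, holding back a small tail so a marker split across two chunks is still caught (the API shape and handler names here are assumptions, not from the project):

```javascript
// Sketch of marker detection while streaming. Text is emitted to the
// current handler as it arrives; the last few characters are held back
// in case a marker like "[CODE]" arrives split across two chunks.
function createStreamParser(handlers) {
  let buffer = "";
  let mode = "speech"; // assume narration until a marker says otherwise
  const MARKER = /\[(SPEECH|CODE)\]/;
  const TAIL = 8; // longest marker is 8 chars, so this is enough to hold back

  function feed(chunk) {
    buffer += chunk;
    let m;
    while ((m = buffer.match(MARKER))) {
      const before = buffer.slice(0, m.index);
      if (before) handlers[mode](before); // flush text preceding the marker
      mode = m[1].toLowerCase(); // switch to "speech" or "code"
      buffer = buffer.slice(m.index + m[0].length);
    }
    // Flush everything except the tail, which might be a partial marker.
    if (buffer.length > TAIL) {
      handlers[mode](buffer.slice(0, -TAIL));
      buffer = buffer.slice(-TAIL);
    }
  }

  function end() {
    if (buffer) handlers[mode](buffer); // nothing more is coming; flush all
    buffer = "";
  }

  return { feed, end };
}

// Demo: a marker split across chunk boundaries is still detected.
const out = { speech: "", code: "" };
const parser = createStreamParser({
  speech: (t) => (out.speech += t),
  code: (t) => (out.code += t),
});
parser.feed("[SPEECH] Let's draw. [CO");
parser.feed("DE] circle(200, 200, 50);");
parser.end();
console.log(out);
```

One trade-off: holding back a tail means speech/code output lags the stream by a few characters, which should be imperceptible in practice.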
I am going to first try running the agent with a local version of llama running via https://ollama.com/.
However, an ideal solution might be to use transformers.js and run the model directly in node.js. One reason to use llama is that I know how to fine-tune the model, which I may ultimately want to do.
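For the Ollama route, a minimal node.js sketch might look like this, assuming `ollama serve` is running locally on its default port and a model (here "codellama", as an example) has been pulled:

```javascript
// Minimal sketch of calling a local Ollama server from node.js (node 18+,
// which has fetch built in). The model name is an assumption; swap in
// whatever has been pulled locally.
async function generate(prompt) {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "codellama",
      prompt,
      stream: false, // set to true to receive newline-delimited JSON chunks
    }),
  });
  const data = await res.json();
  return data.response; // the generated text
}

// Example usage (requires a running Ollama server):
// generate("Write a p5.js sketch that draws a circle").then(console.log);
```

With `stream: true`, each chunk is a JSON line with its own `response` fragment, which is where a marker-aware stream parser would plug in.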