ccappetta / bidirectional_streaming_ai_voice

Python scripts to handle a two way voice conversation with Anthropic Claude, using ElevenLabs, Faster-Whisper, and Pygame.
MIT License
118 stars 28 forks source link

system message is a hard-coded summary #9

Closed danx0r closed 5 months ago

danx0r commented 5 months ago

For this project to be usable by multiple users, we will need a way to personalize the system prompt, which is presently hard-coded based on several hours of discussion with Claude (Quill?)

Meta-question for ccappetta: are you interested in maintaining & improving this project with an eye towards multiple users and contributors? If so, it would make sense to discuss how issues like persistence, summaries, and customization are going to be implemented.

I'm here because I was basically coding the same thing but yours is better and farther along. I am interested in seeing where this goes.

ccappetta commented 5 months ago

I appreciate the note! I'd definitely be down to be in a sort of contributor role. With a toddler, a full time job, and a baby on the way I don't think I have the capacity to do much in the way of running a project. I'm more than happy to accept pull requests though, and would be down to give you or others the ability to merge.

Actually the version I'm running currently has already diverged a decent bit from the currently published one, and touches on a couple of the topics you raised.

The version I'm using isn't summarizing the previous conversations, instead I added a mechanism to let me pick up from a transcript and carry on that conversation (or I can still launch from scratch). So my system message is just the instructions and no summary at this point. I'll plan to share that version one I can align it with these couple pull requests already in the mix.

I think a broader sort of memory structure is very much of interest, but from a fair bit of fiddling with semantic search I don't think a simple RAG is the answer there. I'd probably look to do some sort of named entity extraction and keyword extraction to fetch relevant topics.

danx0r commented 5 months ago

I understand about work/life/hobby contention for resources! I definitely can't put unlimited energy into this but I will be fiddling with it when I get the time.

I do hope you post your changes so we don't diverge significantly. You might want to put leading-edge breaking changes in a branch ("dev" perhaps?) so contributors can see what's coming but casual users can still pull from main. Poor man's CI :|

ccappetta commented 5 months ago

just made some updates to he structure and i'm using branches for the different variations. e.g. i pulled out the linux and enter-keypress support that I believe you contributed into a specific branch named to indicate that functionality. and I also added a branch for a version where a colleague needed to run this on a less powerful PC so we swapped in a cloud transcription API in place of the local faster-whisper model. I'm thinking this will be a useful, if unorthodox(?), way to manage a bunch of different potential variations on this, and I'm optimistic it'll open me up now to try some of the other transcription and streaming models we've been looking at in other issues

ccappetta commented 5 months ago

big round of closing comments for my own organizational sanity, please shout to continue this convo! newest version in main isn't referencing summaries anymore in the instructions and instead has the option to launch a conversation history from transcript