LagPixelLOL / ChatGPTCLIBot

ChatGPT Bot in CLI with long term memory support using Embeddings.
MIT License
340 stars, 38 forks

How can I pre-load a large document to generate embeddings? #2

Closed mrmachine closed 1 year ago

mrmachine commented 1 year ago

I have a 4-hour transcript of about 5,000 lines and 172,000 characters, created with the Whisper API. I would like to preload this transcript to generate embeddings and then ask ChatGPT questions about it. The bot should automatically find the relevant sections of the transcript via the embeddings and include them in the prompt, so that it can answer my questions.

I tried to just paste a small section of the document with a prompt like:

Remember the following document. I will ask you questions about it later.

###

first line of transcript

second line of transcript

etc.

But the bot immediately responded to the first line (only) as if it was the entire prompt:

The following conversation is set to:
Me: is the prefix of the user, texts start with it are the user input
You: is the prefix of your response, texts start with it are your response
You are an AI chat bot named Sapphire
You are friendly and intelligent

Me: Remember the following document. I will ask you questions about it later.
You: I'm sorry, but I am not capable of remembering a specific document without being provided with it. Can you please provide me with the necessary information or send me the document in question?
Me:

How can I provide a multi-line prompt to the bot, to progressively load in data to then be saved as embeddings?

Or, how can I preload a document to generate embeddings before I start the chat session?
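Editor's sketch: a transcript of this size has to be split into smaller pieces before embeddings can be generated for it. The snippet below is a minimal illustration of one way to do that, grouping consecutive lines into chunks under a character budget. The function name `chunk_lines` and the `max_chars` parameter are hypothetical and are not part of ChatGPTCLIBot.

```python
def chunk_lines(lines, max_chars=1000):
    """Group consecutive transcript lines into chunks of at most max_chars characters.

    Hypothetical helper for illustration only. A single line longer than
    max_chars is kept whole and becomes its own oversized chunk.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between transcript entries
        # +1 accounts for the newline that joins lines inside a chunk
        added = len(line) + (1 if current else 0)
        if current and size + added > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
            added = len(line)
        current.append(line)
        size += added
    if current:
        chunks.append("\n".join(current))
    return chunks
```

With a 1,000 character budget, a 172,000 character transcript would come out as roughly 170 to 200 chunks, each of which could then be embedded separately.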

LagPixelLOL commented 1 year ago

Currently this is not supported, but I might add a Q&A mode in the near future. I can't do it now because I'm not at home and I don't have a C++ dev environment set up on my laptop. For now, you would need to build your own automatic embedding tool and match its output to my chat history JSON format for it to work, or just wait until I add this feature to the program.
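Editor's sketch: the retrieval half of such a custom tool could look roughly like the following. It ranks transcript chunks by cosine similarity to the question and builds a prompt from the best matches. Everything here is an assumption for illustration: `toy_embed` is a hypothetical word-count stand-in for a real embedding model, `top_k` and `build_prompt` are made-up names, and the chat history JSON format of this repo is not shown in this thread, so it is not reproduced.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a real embedding model: word counts as a sparse vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question, chunks, k=2, embed=toy_embed):
    """Return the k chunks most similar to the question, best match first."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question, chunks, k=2):
    """Put the most relevant chunks in front of the question."""
    context = "\n---\n".join(top_k(question, chunks, k))
    return f"Transcript excerpts:\n{context}\n\nQuestion: {question}"
```

In a real tool, the embeddings for each chunk would be computed once and stored, and only the question would be embedded at query time. The `embed` parameter is there so the toy function can be swapped for a real model.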

stephenlstrange2 commented 1 year ago

Please, if I can help with this, it would be awesome for a similar project I'm working on right now.

LagPixelLOL commented 1 year ago

> Please, if I can help with this, it would be awesome for a similar project I'm working on right now.

Hmm, I found out there's already a ChatGPT plugin made by OpenAI for Q&A purposes, so now my motivation is destroyed 😢 but maybe I'll still make it 🙂

mrmachine commented 1 year ago

Can you preload it with a bunch of embeddings?

LagPixelLOL commented 1 year ago

> Can you preload it with a bunch of embeddings?

If you are talking about the ChatGPT plugin, I don't know, because it's not public yet. If you are talking about this repo, you kind of can, but it's not supported currently, so you would need to do it yourself and probably modify some code too.

MikeMcMahon commented 1 year ago

Is this possible yet? I definitely need this feature.

LagPixelLOL commented 1 year ago

> Is this possible yet? I definitely need this feature.

As of this comment (2023/4/19), no, it's not possible yet. I'm currently focusing on improving other things, but I will work on this ASAP!

Edit: Feature added in v1.2.0.

LagPixelLOL commented 1 year ago

Progress update: I'm now working on it, planning to release it in the next 3 days, but no guarantees!

LagPixelLOL commented 1 year ago

It's mostly done and will be released in 2 hours. Closing as completed.