exoascension / vault-chat

A ChatGPT bot trained on your vault notes. Ask your AI questions about your own thoughts and ideas!
GNU General Public License v3.0

Potentially repeated embedding of old notes #13

Open wenlzhang opened 1 year ago

wenlzhang commented 1 year ago

Question 1

I updated to a newer version of the plugin today and suspect that some notes were embedded again, based on the following observations.

In the vault I use for testing Vault Chat, I have only one note, which was converted from a PDF article. The note has 6336 words in total, as indicated by Obsidian.

Yesterday, I asked several questions. In the OpenAI usage records, the following entry has the highest token usage; each of the other entries consumed fewer than 500 tokens. I assume this entry corresponds to embedding the entire note.

text-embedding-ada-002-v2, 4 requests
5,685 prompt + 0 completion = 5,685 tokens

Today, after updating the plugin, I again asked a few questions. The following entry has nearly the same token usage as the previous one, so I wonder whether the note was embedded again.

text-embedding-ada-002-v2, 2 requests
5,682 prompt + 0 completion = 5,682 tokens

Question 2

A related question: when a note is moved to a different folder within Obsidian, is it embedded again? Or is some information cached/saved to avoid this?

I ask because the file database2.json includes the full path of the note. I assume the stored path is either meant to prevent re-embedding or is what causes it.

kristenbrann commented 1 year ago

Will look into Question 1.

On Question 2, the way it is set up currently, it does run embedding on the note again if you move it or rename it. The thought process here was that the path your note is in could be contextually relevant. For example:

Very much open to suggestions and insights here, but that was the reasoning we used originally!

wenlzhang commented 1 year ago

The thought process here was that the path your note is in could be contextually relevant.

I understand the reasoning here.

On the other hand, I think whether the path is contextually relevant may also depend on how the vault is structured and used. For example, I may have a note in the Inbox folder and, at a later stage, move it to another folder. This change of folder does not change the note's contextual meaning, so it should not be necessary to embed the note again.
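To make the trade-off concrete, here is a minimal sketch of an embedding cache whose key can optionally include the note's path. It is not the plugin's actual code: `cache_key`, `EmbeddingCache`, the `include_path` switch, and the `embed` callback are all hypothetical names used for illustration. With `include_path=False`, moving or renaming a note leaves the key unchanged, so no new embedding request is made.

```python
import hashlib


def cache_key(path: str, content: str, include_path: bool) -> str:
    """Hash the note content, optionally mixing in the path (hypothetical helper)."""
    h = hashlib.sha256()
    if include_path:
        h.update(path.encode("utf-8"))
        h.update(b"\0")  # separator so path and content cannot run together
    h.update(content.encode("utf-8"))
    return h.hexdigest()


class EmbeddingCache:
    """Calls `embed` only when no embedding is stored for the computed key."""

    def __init__(self, embed, include_path: bool = False):
        self.embed = embed  # assumed callback: text -> embedding vector
        self.include_path = include_path
        self.store = {}

    def get(self, path: str, content: str):
        key = cache_key(path, content, self.include_path)
        if key not in self.store:
            # When the path is treated as context, it is part of the embedded text.
            text = f"{path}\n{content}" if self.include_path else content
            self.store[key] = self.embed(text)
        return self.store[key]
```

A vault where folders carry meaning could set `include_path=True` and accept re-embedding on moves; a vault with an Inbox-style workflow could leave it off.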

To address this, maybe there can be the following measures:

I guess this is also related to another issue, i.e., whether to re-embed a note when its content is updated. This is especially important for large notes, as in my case. To address this, there may be the following measures:

Of course, these two options can be combined in some way.
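For the content-update case, one possible approach (an assumption for illustration, not something the plugin is known to do) is to hash a note chunk by chunk and re-embed only the chunks whose hash is new. The sketch below splits on blank lines; `split_chunks`, `chunk_hashes`, and `chunks_to_embed` are hypothetical helpers, and a real implementation would need to pick a chunking rule that matches how the embeddings are generated.

```python
import hashlib


def split_chunks(text: str) -> list[str]:
    """Split a note into paragraph chunks on blank lines (assumed chunking rule)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def chunk_hashes(text: str) -> dict[str, str]:
    """Map SHA-256 digest -> chunk text, preserving the order of first appearance."""
    return {
        hashlib.sha256(chunk.encode("utf-8")).hexdigest(): chunk
        for chunk in split_chunks(text)
    }


def chunks_to_embed(known_hashes: set[str], new_text: str) -> list[str]:
    """Return only the chunks whose hash has not been embedded before."""
    return [
        chunk
        for digest, chunk in chunk_hashes(new_text).items()
        if digest not in known_hashes
    ]


# Usage: edit one paragraph of a three-paragraph note.
old_note = "Intro paragraph.\n\nMethod paragraph.\n\nConclusion paragraph."
new_note = "Intro paragraph.\n\nMethod paragraph, revised.\n\nConclusion paragraph."
known = set(chunk_hashes(old_note))
pending = chunks_to_embed(known, new_note)
```

Here only the revised paragraph ends up in `pending`, so a small edit to a large note would cost a small embedding request rather than one for the whole note.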