SillyTavern / SillyTavern

LLM Frontend for Power Users.
https://sillytavern.app
GNU Affero General Public License v3.0

Technologicat's TODO list #1739

Closed. Technologicat closed this issue 1 month ago.

Technologicat commented 8 months ago

I have way too many TODO notes stored in various places, especially given that I'm a ~drive-by contributor~ guest developer? here, so it's better to manage them all in one central place.

The list below is mostly what I'm personally interested in at the moment, either for work or hobby reasons. Suggestions are welcome, but I won't promise which ones I'll accept, if any - quite simply, my plate is already full, at least for a while.


@Cohee1207: Is it possible to assign this issue to me? I'll keep it updated, and close it when it's no longer relevant.

Technologicat commented 8 months ago

EDIT: Main post updated. Thanks Cohee.


Could a seasoned user or dev chime in on these points, which I couldn't find any documentation on? This is something that could be included in the user manual.

Bonus unrelated question:

Cohee1207 commented 8 months ago

A checkpoint is a named branch. A branch is created with a single click and gets an auto-generated name. Both do the same thing: they clone the chat file at the designated point, hence they can be managed from the chat management interface. Bookmarks are the older name for checkpoints; Timelines (I call it that, in plural) wasn't updated to use the new name. The wand menu is used for extensions, even if they are built-in (but they can still be disabled). The burger menu is for core features, mostly chat or text generation related.

Technologicat commented 8 months ago

Thanks! This helps a lot. I updated the section on tooltips in my TODO list.

To avoid confusion, it might be good to change the terminology in the Timelines extension. I suppose the only updates needed right now are changing a few instances of "bookmark" to say "checkpoint" instead, changing the title of the settings panel to "Timelines" (in plural), and mentioning the /tl (open timeline view) and /tl r (refresh timeline graph) slash commands in the README - I discovered those too by skimming the source code.

In particular, /tl makes for a nice Quick Reply button, opening the timeline view with a single click.
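For example, a Quick Reply along these lines does the trick (the label is arbitrary, and the field names below are only approximate):

    Label:   Timeline
    Command: /tl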

One remaining thing: I still don't understand why the details panel for a chat message in Timelines has the '→' button, since you can just open the message normally and swipe / continue chatting to achieve the same effect.

~I'll see if I find the time to submit a PR for Timelines, too. :)~ EDIT: Sent, https://github.com/SillyTavern/SillyTavern-Timelines/pull/15. EDIT: As of 1 February 2024, merged.

Technologicat commented 8 months ago

EDIT: Turned out to be a red herring.


Wait, what?

SillyTavern/public/scripts/extensions/third-party/SillyTavern-Timelines/tl_node_data.js has this check:

        if (message.is_system && message.mes.includes('Bookmark created! Click here to open the bookmark chat')) return true;

But in SillyTavern/public/script.js (e627e897), the actual message format is:

            mes: 'Checkpoint created! Click here to open the checkpoint chat: <a class="bookmark_link" file_name="{0}" href="javascript:void(null);">{1}</a>',

so the check can't trigger.

I'll fix this while I'm at it.
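For reference, a minimal sketch of what that fix could look like: simply match both the legacy and the current wording (as the comments below note, this turned out to be unnecessary).

    // Match both the legacy "bookmark" wording and the current "checkpoint" wording from script.js.
    const checkpointTexts = [
        'Bookmark created! Click here to open the bookmark chat',
        'Checkpoint created! Click here to open the checkpoint chat',
    ];
    if (message.is_system && checkpointTexts.some(text => message.mes.includes(text))) return true;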

Cohee1207 commented 8 months ago

I'll fix this while I'm at it.

This system message has been unused for a long time, even before the rename. There should be no chats that contain it.

Technologicat commented 8 months ago

For those following along at home: fix reverted. The terminology fixes have been merged.

Technologicat commented 8 months ago

Regarding https://github.com/SillyTavern/SillyTavern/issues/1671, there is now a draft skeleton of the RAG documentation in the first post.

Comments are welcome, as well as suggestions on where to put it and in what format. I suppose the user manual would eventually be the place, and Markdown the format?

Technologicat commented 8 months ago

Should we add a mention of the new embeddings Extras module as an optional module under Vector Storage in the Extensions ⊳ Manage Extensions view?

(This UI is so large that I keep missing things.)

Technologicat commented 8 months ago

One more thing I don't understand at the moment: if I decrease the max context length in AI Response Configuration from 8192 to 4096 [with no other changes anywhere], then Vector Storage silently breaks - no more RAG injections. In an initially empty chat, ST then also cuts off the prompt before the user message (the one that had the file attachment).

Then ST either hangs (nothing is generated), or the LLM produces a random reply (on an unrelated topic, sometimes even writing a question on the user's behalf). The latter is easy to understand: my character description instructs the LLM to roleplay an assistant, but it didn't get a user message to respond to.

After increasing max context back to 8192, sure enough, Vector Storage works again.

Nothing useful in the web browser console. Likewise in the ST server console.

Could some function be failing silently and prematurely terminating prompt generation?

Have to debug this, but there's a lot of code to look at.

I don't yet know which parts are relevant, but processFiles in SillyTavern/public/scripts/extensions/vectors/index.js could tell me whether it actually got file chunks, and Generate in SillyTavern/public/script.js might be a good place to look in general.
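As a first step, something like the following temporary logging inside processFiles would confirm whether any chunks arrive (the variable name is a placeholder; I haven't checked what the function actually calls it):

    // Temporary debug logging; "chunks" is a placeholder for whatever processFiles actually receives.
    console.log(`[Vector Storage] processFiles got ${chunks?.length ?? 0} file chunks`, chunks);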

I'd like to get this working at smaller context sizes too (while acknowledging the limitations), because 8192 is already pushing the limits of an 8 GB GPU.

EDIT: Some clarifications.


Tracing the code...

EDIT: Some investigation.

Cohee1207 commented 8 months ago

I think the inserted chunks just overflow your context, so it can't proceed with the generation.

Technologicat commented 8 months ago

That is my hunch, too, but why doesn't it build a complete prompt, or produce any kind of error or warning message anywhere?

I'd understand the RAG chunks going missing if they couldn't fit into the context length - though I'd like an explicit warning when that happens.

But the latest user message - a simple "How would you summarize the episodic memory theory of consciousness?" (I'm using Budson et al., 2022 for testing) - is missing too. It's as if the latest turn in the conversation never started.

(Related to #1741, this particular PDF, with that particular question, is also prone to returning two identical copies of the abstract as the RAG chunks.)

I'll look into this a bit and see if we could at least add an explicit warning somewhere (maybe a toastr, or at least a console.log).
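A minimal sketch of such a warning, with placeholder variable names; where exactly to hook it in is still the open question:

    // Illustrative only: warn instead of failing silently when retrieved chunks get dropped.
    if (retrievedChunks.length > 0 && injectedChunks.length === 0) {
        const msg = 'Retrieved file chunks did not fit into the context and were dropped.';
        console.warn(`Vector Storage: ${msg}`);
        toastr.warning(msg, 'Vector Storage');
    }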

Technologicat commented 8 months ago

EDIT: This comment is outdated.


Also, looking at Timelines:

EDIT: PR submitted: https://github.com/SillyTavern/SillyTavern-Timelines/pull/16. Didn't touch Lock nodes for now, but added and improved tooltips, and added a Refresh Graph GUI button (same effect as /tl r). EDIT: As of 1 February 2024, https://github.com/SillyTavern/SillyTavern-Timelines/pull/16 has been merged.

valden80 commented 8 months ago

@Technologicat I have an idea/question about the RAG feature: what if it were combined with WI? It would work like this:

  1. A file can be attached to a WI entry and ingested into vector storage.
  2. When this WI entry is triggered, it is processed as normal, but after the normal WI entry text it also attaches the result of a vector storage query over the attached file, based on the current context.

This gives the flexibility of WI with the additional power of RAG. The WI entry text in this case works as part of the instruction - it tells the AI what is in the storage and what to do with the information from it. (Example - WI entry text: "This is a description of the old forest:", and the attached file may contain a description of the forest and what can be found in it: flora, fauna, places, etc.) Do you think it's possible to add this?

Technologicat commented 8 months ago

@Cohee1207: What is your opinion about the suggestion by @valden80 above?

At first glance:

As for the query, I think it would be best to use the same query as for all other RAG sources (i.e. the latest chat messages), because the dynamically changing query is where the power of RAG comes from (i.e. it always looks for things relevant to the current context).
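To make the idea concrete, here is a rough sketch of the flow being discussed. Every function and field name in it is hypothetical, not an existing ST API:

    // Hypothetical sketch only; none of these names exist in ST as-is.
    const injectIntoPrompt = (text) => console.log('[inject]', text);  // stand-in for the real injection
    const queryCollection = async (collectionId, query) => [];         // stand-in for the vector backend

    async function injectTriggeredWorldInfo(triggeredEntries, latestChatMessages) {
        for (const entry of triggeredEntries) {
            injectIntoPrompt(entry.content);  // normal WI behavior: the entry text itself
            if (entry.attachedFileCollectionId) {
                // Same query as for the other RAG sources, i.e. the latest chat messages.
                const chunks = await queryCollection(entry.attachedFileCollectionId, latestChatMessages);
                if (chunks.length) injectIntoPrompt(chunks.join('\n'));  // appended after the entry text
            }
        }
    }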

Cohee1207 commented 8 months ago

I like it when new features are implemented with the minimal possible effort, but this seems a bit more elaborate, with a lot of moving parts behind the scenes. However, you could add a new metadata field to store the source of the RAGed part and use the same chat collection both for messages and all the connected lore (global, character, chat).

Attaching files to WI is just an extra step - files are converted into a text representation first anyway, so you could theoretically add the ability to attach files by converting them into the entry's textual content first, while maintaining perfect backward compatibility with older versions and other frontends.

However, I don't know if that would actually be helpful, as we're still far away from the land of the unicorns where you could just drop a wiki link and the LLM would understand everything, but you're free to try it anyway.
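For illustration, an item in the shared collection could then carry a source tag roughly like this (the field names are just a guess, not the actual Vector Storage format):

    // Purely illustrative item shape; only the idea of a "source" field comes from the comment above.
    const exampleItem = {
        text: 'This is a description of the old forest: ...',
        source: 'world_info',          // vs. 'chat' or 'file'
        worldInfoEntry: 'Old Forest',  // which WI entry the chunk came from (hypothetical field)
    };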

Technologicat commented 8 months ago

Thanks for the input!

Yes, that is why I asked - it sounds like this could require nontrivial changes. It's also not aligned with my current priorities, but on the other hand, it sounds like it would have exciting potential for expanding the capabilities of the world info system.

It's also a logical extension of existing functionality. AFAIK, world info entries are similar to RAG, but use exact keyword search instead of semantic search. This observation suggests two alternative ways to expand WI with RAG:

Stashing the info into the chat collection sounds like a promising approach, but I'll have to take a closer look at the chat collection. So far I have no idea how anything except the actual messages is treated. I suppose I could start by tracing what Generate does.

(To avoid confusing Timelines, it might be best to at least refrain from injecting "messages" that are not actually messages.)

Yeah, tools like RAG aren't perfect. Good search terms make a huge difference, whether it's a keyword or semantic search.

So... I suppose I won't be doing this now, but this could be interesting later. As usual, no promises.

github-actions[bot] commented 1 month ago

This issue has gone 6 months without an update. To keep the ticket open, please indicate that it is still relevant in a comment below. Otherwise it will be closed in 7 days.

github-actions[bot] commented 1 month ago

This issue was automatically closed because it has been stalled for over 6 months with no activity.