LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

The longer the chat log, the slower the program. The problem is in the "Edit" feature, but all modes are affected. #492

Open · WingFoxie opened this issue 12 months ago

WingFoxie commented 12 months ago

Every time KoboldCPP "finishes an edit to the chat log", everything hangs for a bit. Every 500KB increase in save file size introduces roughly another second of additional delay, which adds up pretty quickly. Since I've been a heavy user recently, my save file grows by about 1000KB each week, so the delay quickly becomes unbearable.

To reproduce: simply make a very long chat log, with a save file size of a few MB, make the simplest edit, and click away for the edit to register. Or just click the "Settings" button and then the "Cancel" button without changing anything at all! (But why?)

Oh, and if you notice not just a 1-second delay per 500KB but a 10-second delay per 500KB, that's because you have the NoScript plugin installed. (But why??) I had to create a separate browser profile without that plugin, dedicated to this WebUI, to make the pain at least somewhat manageable.

Also, this bug doesn't only affect edit mode, since all modes require KoboldCPP to "finish an edit to the chat log" at some point, e.g. for token streaming or end-sequence trimming. You can notice that token streaming updates much less frequently with huge chat logs. It eventually creates a soft bottleneck that lowers GPU utilization as well, simply because everything else ends up waiting for the edit to finish!

(Above is the second draft.)

(First draft below, in case anything in it is still helpful.)

Title: Every time I finish editing the text, the UI hangs for more than 10 seconds, with a save file as large as 1MB

How to reproduce:
1. Have a very long "chat" history. For example, a save file exceeding 1MB in size, with over 3000 turns.
2. Enter edit mode and edit a bit (add an "a" at the end or something).
3. Click away to register the edit. Now you should see the UI hang for a decent amount of time: >1 second, or >10 seconds if you have the NoScript plugin installed. (Using the Firefox browser here.)

I use edit all the time, and the hanging only gets longer as the chat history gets longer. (I also notice that clicking the "OK" button in the settings UI causes a very long delay at this point. You don't even need to change any settings, just click "Settings" -> click "OK" -> it hangs for a second or two.)

(If you are already reproducing the bug, you may stop reading the rest. The following is just additional info which gets increasingly less useful as it goes on. I just wanted to make sure I provided everything that might be related.)

Workarounds:
1. Uninstall NoScript and browse the whole Internet without it, just for this app. (Scratch this.) Instead: create a separate Firefox browser profile without NoScript installed, just for this app. This reduces the hang time to a sort of acceptable level.
2. Also, just click the "Submit" button whenever you want, despite it being unavailable; the click will register after the hanging is over. You can even alt-tab away before that happens, and the click will still register.

Tried these, but they don't help:
1. Choose story mode instead of chat mode. (Doesn't seem to work, and it creates its own problems.)
2. Stop using the Stylus script that recolors the "User:" and "AI:" strings to the colors I want. (That script doesn't slow things down.)
3. Stop using the Dark Reader plugin to recolor the UI. (That doesn't slow things down either.)

Things I didn't try:
1. Use Chrome instead of Firefox. (Don't want to.)
2. Use Edge, Opera, Brave... instead of Firefox. (Don't want to.)
3. Use a good offline web browser that's specifically made for local URLs and will definitely not connect to the Internet by itself. (Does such a thing even exist?)
4. Just don't edit. (No way.)
5. Cut off the first 80% of the chat history, save it elsewhere, and use the rest. (Will definitely work, but... I was jumping ship from "KoboldAI/KoboldAI-Client" to "LostRuins/koboldcpp", and with "KoboldAI/KoboldAI-Client" the editing feature worked without any delay at all. So there has to be an optimization problem somewhere.)

What version am I currently using? How do I check that? I can't see a version number anywhere. Checked the WebUI: nothing. Right-clicked koboldcpp.exe -> Properties: nothing. It's 1.45.2, by the way. I only know this because I installed the "latest version" 3 days ago, so it must be that version. I could check the console output, but I've been running it for so long, and input so much, that the very first output has already been washed away by all the other output, even with the 9001-line history on the Windows console. And I don't want to launch it again, otherwise I'll have to click "Allow Editing" one more time (#190). Then I realized I can just run koboldcpp.exe without using the .bat and .kcpp files, jump straight to the launch settings UI, and see the version number there, without having to close the existing instance first and without having to relaunch my already running session and click "Allow Editing" one more time.
LostRuins commented 10 months ago

Right now, if a sync autosave is ever called while an async one is in progress, it will complete the sync save and then set pending_storyjson_autosave to null to clear any queued save. However, if there is an already ongoing async save, it is not interrupted.

So I guess in some rare cases this can lead to the async save completing after the sync one is done, and overwriting the newest savefile with an older one.

I think I am overcomplicating the system by having both async and sync saves. I don't think it's needed. I will just ditch the sync save and do it all async. Hopefully that will be fine. Can you please review the latest version now?
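(For reference, a minimal sketch of what a single, all-async autosave path with coalescing could look like. This is not the actual Lite code: apart from pending_storyjson_autosave, which is mentioned above, the variable and function names, including write_save_async, are hypothetical.)

```js
// Hypothetical sketch: one async save path, where the newest snapshot always wins.
var pending_storyjson_autosave = null; // queued snapshot, as discussed above
var autosave_in_progress = false;

function request_autosave(storyjson) {
    pending_storyjson_autosave = storyjson; // later requests overwrite earlier ones
    if (autosave_in_progress) {
        return; // the running save will pick up the newest snapshot when it finishes
    }
    run_autosave();
}

function run_autosave() {
    autosave_in_progress = true;
    var snapshot = pending_storyjson_autosave;
    pending_storyjson_autosave = null;
    write_save_async(snapshot, function onDone() { // placeholder for the real async writer
        autosave_in_progress = false;
        if (pending_storyjson_autosave) {
            run_autosave(); // a newer snapshot arrived while writing; save it too
        }
    });
}
```

With a single path like this, an older snapshot can never finish after (and overwrite) a newer one.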

aleksusklim commented 10 months ago

Yes, the new autosave is working perfectly! On reasonably sized stories it completes instantly, even while being asynchronous.

By the way, I tried your new raw-html noscript editor, and I found out that it produces \r\n instead of \n for line breaks. I think you should replace it server-side, otherwise pasting the same text as in the normal editor re-evaluates the context! (And it will cause a lot of other problems, especially if \r really gets into the model.)
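(For illustration, a minimal sketch of that kind of normalization; normalize_linebreaks is a hypothetical helper, not existing KoboldCpp or Lite code, and it would run wherever the submitted text is assembled:)

```js
// Hypothetical sketch: collapse CRLF (and any stray CR) into plain LF before the text
// is stored or sent to the model, so the raw-html editor matches the normal editor.
function normalize_linebreaks(text) {
    return text.replace(/\r\n/g, "\n").replace(/\r/g, "\n");
}
```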

Now, as for pasting stuff into the long history… I've tried implementing HTML cleanup using a MutationObserver; here is my attempt:

Replacing your addEventListener("paste"):

```js
var my_mo = null;
var my_map = null;
document.getElementById("gametext").addEventListener("paste", function(e) {
  if (!my_mo) {
    my_mo = new MutationObserver(function(mutationList, observer) {
      mutationList.forEach(function(mutation) {
        //console.log(mutation);
        if (!my_map) {
          my_map = new Map();
          (new Promise(function(resolve) {
            resolve();
          })).then(function() {
            // once the paste mutations settle, flatten every collected element to plain text
            my_mo.disconnect();
            my_map.forEach(function(_, elem) {
              console.log(elem);
              if (elem.parentNode) {
                elem.outerHTML = elem.textContent;
              }
            });
            my_map.clear();
            my_map = null;
          });
        }
        var added = mutation.addedNodes;
        for (var i = 0, n = added.length; i < n; i++) {
          if (added[i].nodeType === 1) {
            my_map.set(added[i], true); // queue each pasted element node for cleanup
          }
        }
      });
    });
  }
  // watch the story pane for the nodes the paste is about to insert
  my_mo.observe(document.getElementById("gametext"), { childList: true, subtree: true });
});
```

– This is not working! It breaks the native undo: undo does nothing after pasting.

While googling around, I found this editor: https://prosemirror.net/ Check it out!

Looks like they can somehow preserve the native undo function (but NOT the "redo" function!), though I couldn't understand how exactly that happens. The code is very modular, and it is hard to trace the relevant features.

I tried to paste the whole long story there and… it lagged )) What do you think, is it lagging more than yours or less?

aaronshenhao commented 9 months ago

I'm new to KoboldCpp, currently using the latest Windows version (1.54). So far, my experience has been very positive. However, I've encountered an issue where the webpage hangs when loading large JSON files (4.8 MB). Additionally, I've noticed that the webpage seems to slow down as the chat gets longer, most noticeably when switching out of edit mode after editing older conversations. For example, editing something at the top of a 300 KB file and then exiting edit mode results in a significant delay.

I'm puzzled by this behavior, as it shouldn't take longer than loading the save, unless it is doing something like re-splitting the entire text history based on the stop sequences. I'm not sure if the real culprit has been discussed above, as I'm not a JavaScript developer, but I know from my Python experience that it shouldn't take that long. To create the 4.8 MB save for KoboldAI, I had to manually parse (using regex split and replace) and inject ~30,000 conversations from Character.AI into a JSON template. Surprisingly, even that process was much faster than editing the 300 KB file.

KoboldCpp appears to load the entire conversation history into the webpage, which seems unnecessary for regular use. In conversational AI platforms like Character.AI, only a limited amount of past text is loaded; older text is fetched only when the user scrolls to the top, similar to how YouTube and Reddit comments work.
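(To illustrate the idea, here is a hedged sketch of scroll-triggered lazy rendering: the story stays fully in memory, but only the newest turns are in the DOM. The function names, CHUNK_SIZE, and render_turn are hypothetical, not Lite's actual code, and a real version would also need HTML escaping and scroll-position handling.)

```js
// Hypothetical sketch: render only the most recent chunk, prepend older turns on scroll-to-top.
var CHUNK_SIZE = 50;   // turns rendered per batch (arbitrary)
var rendered_from = 0; // index of the oldest turn currently in the DOM

function render_latest(actions, container) {
    rendered_from = Math.max(0, actions.length - CHUNK_SIZE);
    container.innerHTML = actions.slice(rendered_from).map(render_turn).join("");
}

function maybe_prepend_older(actions, container) {
    if (container.scrollTop > 0 || rendered_from === 0) return; // only act at the very top
    var new_from = Math.max(0, rendered_from - CHUNK_SIZE);
    var older_html = actions.slice(new_from, rendered_from).map(render_turn).join("");
    container.insertAdjacentHTML("afterbegin", older_html);
    rendered_from = new_from;
}

function render_turn(text) { // placeholder for the real per-turn formatting
    return "<p>" + text + "</p>";
}

// usage sketch:
// container.addEventListener("scroll", function(){ maybe_prepend_older(actions, container); });
```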

LostRuins commented 9 months ago

KoboldCpp does this because it is a browser-only frontend with no fixed backend. So all text manipulation occurs within the browser itself and the entire story is always loaded and saved in memory.

Editing large stories is known to slow down the UI, this is mainly bottlenecked by the execCommand insertHTML function. What browser are you using? A different one might help.
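(For context, a simplified sketch of what committing an edit through execCommand insertHTML looks like in a contenteditable pane; this is not the actual Lite code, and commit_story_html / new_story_html are illustrative names. The point is that the browser must parse and lay out the entire rebuilt HTML string synchronously, which is where a multi-megabyte story hangs.)

```js
// Simplified sketch (not Lite's real implementation): rewriting the whole story pane in one call.
function commit_story_html(new_story_html) { // new_story_html: the full rebuilt story markup
    var pane = document.getElementById("gametext"); // Lite's story container
    pane.focus();
    document.execCommand("selectAll", false, null);      // select the current contents
    document.execCommand("insertHTML", false, new_story_html); // replace them in one synchronous step
}
```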

aleksusklim commented 9 months ago

I don't understand why people keep using Lite as a de facto story editor; it is not the best choice for this.

The actual amount of text that gets used is MUCH smaller than what you are all trying to feed in there! What's the point, why can't you manually cut away 85% of the story?

VL4DST3R commented 9 months ago

Assuming they are like me (although personally I no longer have issues with delays on long texts since Ruins updated the autosave mechanic): it's because keeping a ledger of your previous stories is important, and you'd rather have them in one file, settings and all.

WingFoxie commented 9 months ago

Here's the thing. My current setting is "Max Context Tokens" = 4096, and when I trim the story to contain only 4096 tokens, the save file is already 60KB. This app is supposed to support 64k tokens of context, right? 65536 tokens? That would be a 960KB save file. Yet the lag starts to become noticeable when the save file reaches 200KB, and it is unbearable at 400KB. A 960KB save will lag much more, not only with manual edits but also with token streaming!

This means the app doesn't really count as supporting that much context. If you actually have the hardware to set the context size all the way to 64k, it will be unusable: it will lag from the get-go, even if you only keep enough text to fill the context window! (Someone who actually has the hardware to use a 65536 context size needs to verify this!)

Of course I can't keep all the text there and maintain a 10MB save file or so... I currently keep moving the older chat log off to another file. (Tip: click "Last Request" at the bottom left to see what text is still in the context, and try to only move away the text before that.)

But eventually, the app should at least eliminate lag within the context window, which means eliminating lag whenever the save file is less than 960KB... A long way to go...

(Or maybe by that point I'll have a new CPU that's 5x faster than my current one and the issue solves itself? I don't know; I currently have a mid-range CPU from 2017.) (Even so, by the time I get the new CPU, this app will probably support even more context, so I still won't be able to comfortably use the maximum available context size setting.)

VL4DST3R commented 9 months ago

WOW! OK, color me impressed, and here I thought mine were too long! My longest text at the moment is a chat-style one (which IIRC was one of the worst offenders performance-wise before the update) that's 633KB in size, and on the latest version I can edit any part of it and it takes about 2 seconds to save the edits.

Now, I have to ask though: how are you getting texts that big? The one in my example is equivalent to ~140 pages of unformatted text! And you are writing stuff that aims to be anywhere from double to 10 times as big? What kind of stories are you writing!? Do you even remember what you wrote, say, 50 or 100 pages ago in those texts?

aleksusklim commented 9 months ago

I used models with 64k of context, but I only managed to fill about 20k, because I was already tired of the story by the time I got to that point.

@LostRuins, what about this:

Considerations:

The save file format does not need to be changed: the field will be populated on the first request, and after that the game will be fast and smooth, since no huge reparsing or pasting would need to be tracked.

Note that in essence this is purely an interface option dedicated to optimizing browser rendering; it does not change the way Lite works with text internally. So it is always "safe" to enable. If the user does not like it, he can just not use it.

But for those who LIKE to mess with gigantic texts – it would become a salvation!

LostRuins commented 9 months ago

actually that is a very interesting idea! Am I right to say that the lag you all experience only happens during editing? Does it happen when you submit next text for generation too?

aaronshenhao commented 9 months ago

KoboldCpp does this because it is a browser-only frontend with no fixed backend. So all text manipulation occurs within the browser itself and the entire story is always loaded and saved in memory.

Editing large stories is known to slow down the UI, this is mainly bottlenecked by the execCommand insertHTML function. What browser are you using? A different one might help.

I'm using Firefox on both PC and Android. Is there any documentation available for the format of the JSON save files? I noticed that when I deleted the first line ('prompt' in the JSON file), the save file seemed to combine multiple entries from the 'actions' key and place them in the prompt.

While the current edit function is powerful, it might be beneficial to only load the most recent texts for normal use cases (as @aleksusklim suggested). If I understand correctly, each 'turn' is stored as a separate entry in an array under the 'actions' key in the JSON file. If so, then it might be beneficial to just confine edits to single messages, similar to how ChatGPT and Character.AI do it. It's also much safer, as it reduces the risk of accidental deletions or edits elsewhere.
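(For anyone else poking at the save files, the structure as far as this thread establishes it looks roughly like the sketch below. Only the 'prompt' and 'actions' keys are taken from the discussion above; the example values are illustrative, and other fields are omitted because they are not documented here.)

```js
// Illustrative sketch of the save layout discussed above (not a full schema).
var save = {
    "prompt": "The opening text of the story...", // the first "line" mentioned above
    "actions": [                                  // one entry per subsequent turn
        "User: Hello!",
        "AI: Hi there, how can I help?"
        // ...thousands more entries in a long chat
    ]
    // ...other settings fields omitted
};
```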

VL4DST3R commented 9 months ago

actually that is a very interesting idea! Am I right to say that the lag you all experience only happens during editing? Does it happen when you submit next text for generation too?

+1, indeed a clever solution. Speaking only for myself and a friend, but in our case it only happened during editing; generating, re-rolling, or using back/redo was never affected.

it might be beneficial just confine edits to single messages, similar to how ChatGPT and Character.AI does it.

The only issue I can see there is if you need to delete or modify a large chunk of the story in a single pass; then you still need the current flexibility of working with one singular block of text, so IMO the suggested approach of a classic and an enhanced view would best be provided as an option to the user.

aaronshenhao commented 5 months ago

actually that is a very interesting idea! Am I right to say that the lag you all experience only happens during editing? Does it happen when you submit next text for generation too?

Now that I've used KoboldAI for quite a while, I think the lag is actually caused by the default mode. The lag occurs on long chats when:

This all indicates that updating the default mode is what's causing the problem. I hardly get any lag when generating a message in edit mode, even for large chats.

aaronshenhao commented 4 months ago

@LostRuins I think I've found the problem! Firefox's profiler (Performance tab) revealed that 99% of execution time was being spent on this line when loading a file or switching out of edit mode: https://github.com/LostRuins/lite.koboldai.net/blob/3ec45673ed2dc7ad19f5fdc967e843cf8c3b03e0/index.html#L14546

[Screenshot: profiler trace while loading a file]

[Screenshot: profiler trace when switching out of edit mode]

If I comment out that line, the formatting is messed up (the text on the start screen, and the name formatting), but a 4 MB file that used to take forever to load (it crashed the page) now loads in less than 2 seconds. Hopefully someone can find a way to speed up that line.

LostRuins commented 4 months ago

Thanks for looking into it. Could you send me a sample json file that exhibits this issue so I can debug it?

aaronshenhao commented 4 months ago

@LostRuins Here's a test file with 10K conversations, each a paragraph of Lorem ipsum: Lorem-ipsum-10K.json

The file likely will not load in Kobold Lite unless line 14,546 is commented out.

The line is only used by the "Aesthetic" UI style under the "Chat Mode" format. Other UI styles are able to load the chat just fine. "Messenger" takes about 2 seconds on my machine, and "Classic" is almost instantaneous. Here's a save file that uses the "Messenger" UI mode: Lorem-ipsum-10K-messenger-mode.json

This may not be the root cause of all the problems, but it's definitely an issue, since it affects the default chat UI. And the difference is between a 0-2 second load time and the page not loading at all. Luckily, the issue can be narrowed down to a single forEach statement, and other UI styles aren't affected.
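(For reference, the usual remedy for this class of problem is to keep DOM work out of the per-message loop and apply a single update at the end. The sketch below is hedged: it is not the change LostRuins actually made, and render_chat / render_bubble are hypothetical names.)

```js
// Hypothetical sketch: build the markup as strings inside the forEach and touch the DOM once,
// instead of doing per-message DOM/innerHTML work, which typically turns a long chat into a
// multi-second hang.
function render_chat(messages, container) {
    var parts = [];
    messages.forEach(function (msg) {
        parts.push(render_bubble(msg)); // pure string work, no DOM access per message
    });
    container.innerHTML = parts.join(""); // single DOM update at the end
}

function render_bubble(msg) { // placeholder for the real per-message formatting
    return '<div class="bubble">' + msg + "</div>";
}
```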

LostRuins commented 4 months ago

Thanks. I'll see if I can fix it.

LostRuins commented 4 months ago

Fixed it. Try loading your story in https://lite.koboldai.net and it should be good

aaronshenhao commented 4 months ago

Fixed it. Try loading your story in https://lite.koboldai.net and it should be good

I just tested it, and the default chat UI now loads even faster than the Messenger UI! Thanks for the great work.