WingFoxie opened this issue 1 year ago
Hmm the editor wasn't really designed for stories that long. It might just be the overhead required to compress and save the story. Can you try disabling the "Autosave Session" and "Embed Settings File" and see if that helps?
For reference, I have a 500kb story in chrome, which is a few hundred turns, and saving an edit takes less than 1 second.
It's not just "a problem only with insanely long chat logs". You really need to fix this somehow. It's unacceptable that the response time gradually slows down just because of all the text that is outside the context length and won't be sampled at all. The delay is not that difficult to notice either: it becomes noticeable once the save size reaches only 200KB or so, which is something I can produce in half a day. For starters, it would be great if the app could automatically keep a close eye on only enough lines to fill the context window, instead of reprocessing the rest of the whole chat log all the time. Especially, don't keep processing the whole chat log while token streaming!
About the settings you asked me: On my currently 3.0MB save file: With all settings in question OFF, editing takes 1 sec to finish. "Embed Settings File" does nothing. "Autosave Session" adds 3 sec.
Yet, "Token Streaming" adds an insane amount of delay! +200% time on each response! An 25 tokens response, which is supposed to only take 10 seconds to generate, will take 30 seconds to generate! Since this basically inserts multiple "edits". (By 30 seconds I mean, from the moment I press Enter to "Submit", to the moment the text cursor blinks again.)
(BTW, no matter how long the app hangs when editing, saving, etc., I only see a 20% CPU utilization spike. Not a single CPU core exceeds 50% usage when it's only busy editing or saving. Usage is only high when it's actually sampling and generating tokens. A utilization problem, maybe?)
(End of second draft)
What version of koboldcpp are you using?
The latest version.
Let me just put it this way: if this problem never gets fixed, I'll have to stick to a couple of workarounds:
Obviously I can't single-handedly fix this. All I can figure out by myself is that when editing a single bit of a long chat log, some functions are executed way too many times - like 10,000 times.
But I don't know what to do from here. If there's anything else I can do to help me help you, I'll try.
What do you mean, the "render_gametext" function is called 7000 times?
Are you saying that's triggered every time you edit? How are you getting that count?
Just press F12 to bring up the browser's developer tools; there should be a tab called "Performance" or something, with a pretty obvious button in it that lets you record and analyze performance. I started recording right before making a tiny edit to the chat log. Then I edit and wait 10 seconds until it finally finishes, then I remove that tiny edit and wait another 10 seconds, then stop the recording and look at the captured records, called a "performance profile". And then I try my best to pretend that I know how to read it. I'm not a developer at all, after all.
By contrast, I did another recording with only a few screens of text in the chat log, editing the text once with the UI feeling snappy and fast. Those "methods", or whatever they are, only got called about 60-80 times each. "render_gametext" was called 63 times, which still indicates a problem, just not one causing enough total lag to be noticed yet.
If you want to try it, I believe Chrome also has a similar feature under the developer UI, under a "Performance" tab, with a pretty obvious button there too. Just make sure you choose the right preset for recording: make sure you are actually recording the performance of the web page, not the browser itself or anything else!
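(For anyone who wants to double-check such a count without digging through a profile, a snippet like this pasted into the same console should work - just a sketch, and it assumes render_gametext is a global function, which is what the profile suggests:)
// Count calls to a global function during one edit (paste into the DevTools console).
// Assumes render_gametext is a global function on window, as the profile suggests.
(function () {
  var original = window.render_gametext;
  if (typeof original !== "function") { console.warn("render_gametext not found"); return; }
  var calls = 0;
  window.render_gametext = function () {
    calls++;
    return original.apply(this, arguments);
  };
  setInterval(function () {
    if (calls > 0) { console.log("render_gametext calls so far:", calls); }
  }, 2000);
})();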
How many turns (lines) do you have? Also, what game mode do you use?
I tried typing =lorem(100,100) and pressing Enter in Microsoft Word. This generates a Lorem Ipsum wall of text, about 420 KB in size.
If I put it into Kobold Lite in Instruct mode, nothing bad happens. It lags for a second or two, but then works fine.
However, if I change my Instruct sequence to "et" (literally, it is contained in this text 5800 times), the tab crashes with Out of memory as soon as I leave edit mode!
(In fact, now I'm stuck, because it crashes after a tab reload too; looks like I need to figure out how to erase the stored history.)
The only obvious solution is to not store such long texts in Kobold Lite! Any web application will crash sooner or later if it tries to render that enormous an amount of styled text.
However, if possible, Kobold Lite could stop matching the text once there are already a lot of matches, and suggest copying the history away from here instead, or ask for a one-time confirmation to continue. That would contain the consequences, but fundamentally won't help in processing long chats.
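Roughly, what I mean is something like this (just a sketch of the idea with a hypothetical helper, not how Lite actually renders the sequence):
// Cap how many instruct-sequence matches get wrapped in styled spans;
// past the limit, leave the raw text alone so the DOM stays small.
// (Hypothetical helper; assumes `text` is already HTML-escaped.)
function highlightLimited(text, sequence, maxMatches) {
  var parts = text.split(sequence);
  var out = parts[0];
  for (var i = 1; i < parts.length; i++) {
    out += (i <= maxMatches ? '<span class="instruct">' + sequence + '</span>' : sequence) + parts[i];
  }
  if (parts.length - 1 > maxMatches) {
    console.warn("Instruct sequence matched " + (parts.length - 1) + " times; styled only the first " + maxMatches);
  }
  return out;
}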
If your WebUI just won't load anymore, you have 4 options.
Pick your poison. (And why do I keep saying "cache"? Because all of the KoboldCPP WebUI's data is indeed stored in the cache; not a single cookie is used.)
Actually, I've been suspecting that the total number of cache files affects the performance as well.
Anyway, if you do believe that having too many cache files also hurts performance, here are the details.
If your cache is corrupted, clearing browser storage for that site will usually fix it. There's also a "Reset all settings" button. Regarding large files, I will try to optimize it, but please understand that compromises have to be made - this is not just a normal text editor.
@WingFoxie, I cannot reproduce your problems with the browser cache. For me, no "4kb files" are created during streaming.
Can you tell me your exact browser vendor and version, and the link to the NoScript plugin you are using?
In that case I'll try to reproduce your issue.
I'm just using the latest version of Firefox. You should be able to reproduce it with a fresh new install, or a fresh new profile, without any plugins installed. With "Token Streaming" set to "on", of course.
Oh wait, you're asking for a link to the NoScript plugin? Just search "NoScript" in the Firefox add-on store; it's the first one in the search results, author name "Giorgio Maone".
Oh, and since you are asking like this, you probably should create a separate profile in "about:profiles" and do the testing there!
I don't want you to install NoScript on the browser you usually use for everything and then wonder "Why did all the websites suddenly stop working???", since that's more or less the intended behavior: to turn off all scripts, and only turn on the ones that are absolutely necessary for you. And, of course, you should allow the WebUI's script to run in NoScript. And... I believe you will find the "editing long chat history" performance is even worse.
Sorry that I couldn't verify this conveniently. But I believe that as long as you have that plugin installed, no matter how you configure it for the WebUI, the "long lag when editing long text" problem will be magnified. If you can't reproduce this specific thing, just forget it, since I believe "NoScript" doesn't create problems on its own anyway, it only magnifies the existing problems in the WebUI.
And about the tens of thousands of cache files... The two problems I observed are that the files pile up a lot, and the WebUI never cleans them up.
Okay, with Firefox I clearly see it creates cache entries for EACH /check request (1 per second) and stores this somewhere in C:\Users\<USER>\AppData\Local\Mozilla\Firefox\Profiles\<PROFILE>.default-release\cache2\entries\[0-9A-F]+
Here is the request that my browser makes:
POST /api/extra/generate/check HTTP/1.0
Host: localhost:5001
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:96.0) Gecko/20100101 Firefox/96.0
Accept: */*
Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Referer: http://localhost:5001/
Content-Type: application/json
Origin: http://localhost:5001
Content-Length: 21
DNT: 1
Connection: keep-alive
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
And here is the response headers from koboldcpp:
HTTP/1.0 200 OK
Server: ConcedoLlamaForKoboldServer
Date: Sun, 29 Oct 2023 12:01:57 GMT
access-control-allow-origin: *
access-control-allow-methods: *
access-control-allow-headers: *, Accept, Content-Type, Content-Length, Accept-Encoding, X-CSRF-Token, Client-Agent, X-Fields, Content-Type, Authorization, X-Requested-With, X-HTTP-Method-Override, apikey, genkey
content-type: application/json
I see no Cache-Control header here, which could change the default caching behavior.
I decided to check whether it would suffice to set it to no-store.
I installed a simple programmable proxy server in NodeJS: https://github.com/http-party/node-http-proxy
This is the code that I tried:
// npm install http-proxy
var httpProxy = require('http-proxy');
var proxy = httpProxy.createProxyServer({
target:'http://localhost:5001',
selfHandleResponse : true,
}).listen(5002);
proxy.on('proxyRes', function (proxyRes, req, res) {
var body = [];
proxyRes.on('data', function (chunk) {
body.push(chunk);
});
proxyRes.on('end', function () {
body = Buffer.concat(body);
res.end(body);
});
proxyRes.headers['Cache-Control'] = 'no-store';
res.writeHead(200,proxyRes.headers);
});
proxy.on('error', function (err, req, res) {
res.writeHead(500,{
'Content-Type': 'text/plain'
});
res.end('500');
});
Basically, it opens local port 5002 and proxies all requests to port 5001, adding Cache-Control: no-store to all responses.
And I see Firefox has indeed stopped storing those pesky little files!
@WingFoxie, I suggest you try this, provided you know how to run NodeJS. In essence, you need to install any LTS version (just be careful not to check the "Chocolatey" option in the installer), then create an index.js with the code I provided above, then run npm install http-proxy followed by node index.js in a console there.
@LostRuins, I suggest you add a Cache-Control: no-store response header to all POST requests. (It is strange to me why Firefox caches these by default.)
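(A client-side alternative, if changing the server headers were undesirable, would be to opt each polling request out of the HTTP cache - just a sketch, since I don't know how Lite's polling code is actually structured; the body here is only an example payload:)
// Opt a single request out of the browser's HTTP cache from the client side.
// The endpoint matches the /check request above; the payload is only an example.
fetch("http://localhost:5001/api/extra/generate/check", {
  method: "POST",
  cache: "no-store",                      // ask the browser not to write this response to its cache
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ genkey: "KCPP0001" })
})
  .then(function (r) { return r.json(); })
  .then(function (data) { console.log(data); });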
By the way, instead of using Firefox (and noting that you don't want to use Chrome for whatever reason), a good way to play with local files is, surprisingly, Microsoft Edge (provided you aren't using it as your regular browser).
With a program like simplewall, which is a software firewall, you can disable internet access for Microsoft Edge (or disable it for all processes except your allowed list, which I highly recommend doing on a computer that you own as the sole user), and then use it as a fully local application for viewing HTML files and opening localhost servers such as koboldcpp.
This way it is guaranteed that no data will ever be sent to the internet by the browser itself. (Also, I have Microsoft Edge as my default system browser – yes, without access to the internet – so that any accidental click on a link in any other application doesn't open a random webpage leaking my IP or anything; but that's already off-topic here.)
Fixed in v1.48, cache control no store has been added. Please try.
Fixed in v1.48, cache control no store has been added. Please try.
I see Firefox indeed stopped caching results of streaming!
But I also tried the new SSE mode, and it is not working in Firefox? Only in Chrome do I see the new type of requests and character-level text appearing at runtime. In Firefox there is no difference between On and SSE for Token Streaming.
If SSE is not supposed to work in Firefox, that's fine, I just wanted to ask whether that's intended (for example because FF lacks, or has bugs in, an API needed to do it properly without dirty hacks – which is unfortunately often the case with Firefox).
UPD: "WritableStream which was only added to firefox around version 100+" – oh, I'm on 96, thank you!
Most likely your browser is too old. SSE streaming relies on WritableStream, which was only added to Firefox around version 100+. If that fails, it will fall back to chunky polled streams.
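For illustration, the kind of feature check involved looks roughly like this (a simplified sketch, not the actual Lite code; pollChunks is a hypothetical stand-in for the polled fallback):
// SSE-style streaming pipes the response body into a WritableStream;
// old Firefox (< ~100) has no WritableStream, so fall back to polling.
async function streamTokens(url, payload, onText) {
  if (typeof WritableStream === "undefined") {
    return pollChunks(url, payload, onText); // hypothetical chunky-poll fallback
  }
  const resp = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload)
  });
  const decoder = new TextDecoder();
  await resp.body.pipeTo(new WritableStream({
    write(chunk) { onText(decoder.decode(chunk, { stream: true })); }
  }));
}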
I am using koboldcpp-cuda 1.50-1 from the AUR via yay (https://aur.archlinux.org/packages/koboldcpp-cuda) and I can reproduce the performance issues on Firefox.
I imported a 1.1MB story from my 0cc4m/KoboldAI installation, and every time I edit a paragraph, the client freezes for 25 seconds on Firefox 120.0. While I am getting 16 T/s and token streaming works fine, editing is unusable. Undo, redo and saving world info take 1-2 seconds, which is tolerable.
I never had these major performance issues with 0cc4m's client; even the "New UI" works fine with this large story. On 0cc4m's client, editing a paragraph is almost instant.
On Chromium 119.0.6045.159 editing is much faster, 1-2 seconds. Saving world info takes about the same on both Firefox and Chromium.
Hi, are you able to run it with a performance profiler and see which javascript function is the culprit? If not, send me (DM or discord if you prefer) your 1.1mb story and I will try to take a look.
I am not sure what I am looking for in the profiler, but here's what I found out: set Element.innerHTML, merge_edit_field, onblur and EventHandlerNonNull ran in parallel for 30000 ms, and render_gametext ran in parallel for the last 9 seconds of that total 30 seconds.
Firefox 2023-11-26 02.35 profile.json.gz saved_story(1).json
Okay, I can confirm this issue, thanks for letting me know. It will be fixed in the next version. I have already optimized the page at https://lite.koboldai.net, so you can try it there; it should be faster than before.
Hi all, this should be fixed in 1.51
Tried the 1.51.1 version. Doesn't feel like it's fixed at all.
Opening a 3.3MB txt file (which probably weighs 6.6MB in the save file) containing all the older chat logs and pasting all the text into an empty chat log: it hangs for 83 seconds. Then I click away to finish the edit, and it hangs for another... maybe 5 minutes? I lost count.
I did more testing with a 200KB save file, and didn't see much of a difference compared to before. Each edit lags for 0.8 seconds.
Doesn't feel like it's fixed at all.
Is that in Firefox? Can you test the same setup in Chrome, to be sure that it's related to the browser and not just "general performance issues" with large stories?
A few seconds of lag is normal for huge stories. The Firefox one used to be significantly slower than Chrome before the fix - like 30s+; it's now closer to about 3-4s for a 1MB story.
Just realized that I did all of this most recent testing in Firefox with NoScript ON! I believe the NoScript plugin's magnifying effect on the edit time is now gone, so there's definitely some improvement with v1.51.
Also... tried Chrome. The result's a mixed bag.
Tried the 200KB save file again in a Chrome private browsing tab, and it lags for about 0.5 seconds per edit, and that doesn't increase exponentially as the log size increases. But the pasting lag is much worse! Pasting the 800KB chat log in Chrome makes me wait for over a minute, while pasting that much in Firefox only lags for ~4 seconds.
Looks like in Firefox the delay on editing gets exponentially worse, while in Chrome the delay when pasting the chat log in gets exponentially worse. Both share the same problem of the "onblur" function getting called too many times on each edit, leading to a trail of other functions also being called too many times.
Which box did you test pasting into in Chrome? The regular UI input box, the chat UI one, or the main edit body window?
Does it matter in which mode (Story, Instruct, Chat, Adventure) you are pasting? Also, does Chrome hang if you don't paste but instead open a JSON of a saved story? (Asking just to narrow down where and why it could hang.)
(Biggest discovery goes first.) I found out that, in Chrome, if I paste a bunch of text into a chat log (in the "gametext" box) that already has a bunch of text in it, the delay is crazy long - 6 minutes or something! But if I paste into a completely empty chat log, even with even more text, it only delays for 2 seconds or so. Try pasting 800KB of text into an empty log, then another 800KB; then try pasting all 1.6MB at once. See the difference.
For example: dump the existing half of the log out into Notepad, paste what you want to add after it in Notepad, then paste all the text back into the chat log at once! That dodges the 6-minute delay. Only in Chrome, by the way, not Firefox. Firefox doesn't suffer from this problem in the first place, and its pasting time is more like a constant 4 seconds either way.
(Other answers below.)
The problem happens due to bottlenecks when reconstructing the HTML DOM after a paste; it's simply not optimized for generating that many newlines in such huge texts. That is a bottleneck in Chrome's execCommand insertHTML, not my code.
I've added a small workaround: if you SelectAll + Paste, it will wipe out the text first, and that should result in a much faster paste. This issue is not solved entirely, though; pasting into a window with existing text will still be very slow, depending on how much text you paste. If you paste a small amount of text, it won't be so slow.
The alternative is me inserting the text manually, which will be very fast. But then you would lose the ability to undo/redo that "paste". Which is better?
Sorry, I haven't looked at your code, but since I'm a JavaScript developer myself, I can give one piece of general advice: don't build the content with repeated .innerHTML += calls; instead prepare the full string and set it only once. (You already know that.)
By the way: often when I paste something from external applications into the history box – everything is normal until I click outside (onto the message box or the send button), and after that the newly pasted text often produces double / unwanted newlines at random!
(As if \r\n became \n\n.)
Sometimes a lot, sometimes none. But this happens only after pasting external text: if I then click in and edit something – it won't jumble anymore on its own.
I'm telling this because in the moment when it happens – I can feel a lag too, as if it hangs for a moment and then my newly pasted text jumps around with extra newlines.
insertAdjacentHTML won't work because you are trying to replace part of the content in a contenteditable with new content from the clipboard. The only way that works and also allows the change to be added to the browser's undo stack is execCommand. This command itself is the source of the bottleneck.
if you'd like to give it a try, the relevant code that causes the lag is at https://github.com/LostRuins/lite.koboldai.net/blob/main/index.html#L3630
This code has one purpose - to strip out all the rich text content (e.g. images, bold text, hyperlinks) when pasting from clipboard, and render everything in a consistent way.
Setting insertText does not work, because contentEditable does not have consistent innerText behavior across browsers (some add more newlines than others due to extra <div>s), so we need to use hard breaks (<br>).
Manually rewriting/modifying the DOM does not work - we need all changes to be correctly added to the undo stack so you can paste stuff, then Ctrl+Z or right-click+undo to revert it.
Give it a try, maybe you have an alternative approach.
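For concreteness, the paste path is roughly this shape (a simplified sketch, not the actual code at that line; editor stands for the contenteditable history box):
// Strip rich text from the clipboard, escape it, keep newlines as hard <br> breaks,
// and insert via execCommand so the change lands on the browser's undo stack.
editor.addEventListener("paste", function (e) {
  e.preventDefault();
  var text = (e.clipboardData || window.clipboardData).getData("text/plain");
  var html = escapeHtml(text).replace(/\r\n|\r|\n/g, "<br>");
  document.execCommand("insertHTML", false, html); // <- this call is the bottleneck on huge pastes
});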
Just for a second, can you change
document.execCommand("insertHTML", false, text);
to
document.execCommand("insertHTML", false, '');
document.execCommand("insertHTML", false, text);
and test the performance? For me it feels like pasting becomes faster, but I'm not sure.
(If that works, you won't need the fullySelected check anymore.)
Another bottleneck is the autosave feature (when it is enabled), most importantly its compression, which is triggered on every "click-away" after the history was edited.
Actually, LZMA-JS supports asynchronous compression, and it pretty much works as-is! Here are the changes I made:
function generate_compressed_story(save_images,export_settings,export_aesthetic_settings,opt_async) {
//encode the current story into a sharable url
//a tiny json format which gets compressed by LZMA then b64url
let story = generate_savefile(save_images,export_settings,export_aesthetic_settings);
let storyjson = JSON.stringify(story);
console.log("Exporting story: ", story, opt_async?"ASYNC":"");
//var cstoryjson = LZString.compressToEncodedURIComponent(storyjson);
if(opt_async){
lz_c.compress(storyjson, 1, function(res){
opt_async(buf_to_b64(res))
});
}else{
var cstoryjson = buf_to_b64(lz_c.compress(storyjson, 1));
return cstoryjson;
}
}
…
function autosave() {
//autosave
try {
update_prev_custom_endpoint_type();
localStorage.setItem(STORAGE_PREFIX + "settings", JSON.stringify(localsettings));
if (localsettings.persist_session) {
generate_compressed_story(true, true, true, function(compressedstory){
try{
localStorage.setItem(STORAGE_PREFIX + "story", compressedstory);
console.log("autosave done");
} catch (e) {
console.log("autosave failed: " + e);
}
});
}
} catch (e) {
console.log("autosave failed: " + e);
}
}
- opt_async: a last parameter that is populated only in the autosave call of generate_compressed_story.
- cstoryjson = buf_to_b64(lz_c.compress(storyjson, 1)); is changed, conditionally, to lz_c.compress(storyjson, 1, function(res){ opt_async(buf_to_b64(res)) });
- localStorage.setItem is now inside the callback, with an additional try-catch block.
In this setup, I see "Exporting story:" in the log, but "autosave done" comes only around 10 seconds after it (for the 1-megabyte saved_story.1.json file from this thread).
Not sure whether async localstorage write is always desirable; maybe you can change the autosave option to "on/async/off" rather than a checkbox?
Also I looked at your
function escapeHtml(unsafe)
{
    return unsafe
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#039;");
}
function unescapeHtml(input)
{
    return input
        .replace(/&amp;/g, "&")
        .replace(/&lt;/g, "<")
        .replace(/&gt;/g, ">")
        .replace(/&quot;/g, "\"")
        .replace(/&#039;/g, "\'");
}
And changed that to faster cached regular expressions:
var _escapeHtml_map_ = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#039;',
};
var _unescapeHtml_map_ = {
    '&amp;': '&',
    '&lt;': '<',
    '&gt;': '>',
    '&quot;': '"',
    '&#039;': "'",
};
function _escapeHtml_replace_(unsafe) {
    return _escapeHtml_map_[unsafe];
};
function _unescapeHtml_replace_(input) {
    return _unescapeHtml_map_[input];
};
var _escapeHtml_regexp_ = /['<&>"]/g;
var _unescapeHtml_regexp_ = /&amp;|&lt;|&gt;|&quot;|&#039;/g;
function escapeHtml(unsafe) {
    return unsafe.replace(_escapeHtml_regexp_, _escapeHtml_replace_);
};
function unescapeHtml(input) {
    return input.replace(_unescapeHtml_regexp_, _unescapeHtml_replace_);
};
I couldn't see any noticeable performance boost after this change, so it's up to you whether you want to take it or not. (For me it just feels wrong to call .replace many times, especially on potentially long strings.)
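(If anyone wants to measure it themselves, a quick-and-dirty console check is enough; the numbers will vary per browser:)
// Rough comparison: run once with the original chained-replace escapeHtml,
// then swap in the single-regexp version and run again.
var sample = 'a <b>tag</b> & "quote" \'apostrophe\' '.repeat(100000);
console.time("escapeHtml");
escapeHtml(sample);
console.timeEnd("escapeHtml");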
P.S. Why do you hide the Settings button when the .html is opened locally offline? There are some crucial options that affect local editing too, and it would be desirable to have them available even when "You will still be able to load and edit stories, but not generate new text."
The alternative is me inserting the text manually, which will be very fast. But then you would lose the ability to undo/redo that "paste".
If my trick in the message above (pasting an empty string before the actual modified content) gives nothing, we can try something else. At first, we would indeed need "you pasting manually" (confirming that performance would be good), and then we could either somehow trick the browser into keeping its undo stack (by sending a fake execCommand, maybe?), or re-implement the stack itself.
I've found https://stackoverflow.com/a/67678889 but the code is really messy and would have to be incorporated very carefully. I hope there is a different approach rather than this particular one… I googled a lot, and the most promising trick was to create a new paste event (or modify an existing real one) and dispatch it manually as in https://stackoverflow.com/a/47753075, but it looks like this no longer works, at least in modern Chrome.
For me it feels like pasting becomes faster, but I'm not sure.
It does not, it's still horribly slow when pasting large text chunks into a large existing document.
cached regular expressions:
Nah, string literal replace is actually much faster than any sort of regex. Moreover, these operations are both extremely fast compared to DOM modification and would not be part of any bottleneck.
async saving
Async saving is possible, but needs more testing - not sure if it's worth the hassle. The main concerns would be:
- Race conditions, where an older save command could potentially complete after a new one
- Background execution, which is not guaranteed
Why do you hide the Settings button
Yeah I can leave it enabled, I guess.
clipboard hacks
They seem really clunky. I doubt they will be cross browser compatible.
It does not, it's still horribly slow
Sigh, welp.
Okay, how would you "paste manually"? I presume there would be many HTML elements in the history (colorization?), which could be partially selected and thus would need to be cut in the middle or somehow split before deletion and insertion (and then merged?).
Can you actually do that correctly? (In a separate branch, to test further.)
They seem really clunky. I doubt they will be cross browser compatible.
Also, there are different "online text editors" out there, but I checked them many years ago; they are probably much better now, and ideally they should support arbitrary pasting along with undo history. We should find out how they have implemented it! I will check and return here to tell you what I find.
Nah, string literal replace is actually much faster than any sort of regex.
Um-m… In this particular code, you are using regular expressions too! You're not doing .replaceAll (a rather modern function that exists solely to NOT require a regular expression with /g to replace all occurrences instead of only the first one).
these operations are both extremely fast compared to DOM modification and would not be part of any bottleneck.
Yes, this is true. (I just always think from a large-scale perspective, where you don't want tiny helper functions to be even theoretically slow, because later they can be used in tight loops.)
I've benched two parts: the paste callback and the blur callback. In the former, the slowest call is indeed execCommand; in the latter it was compress (it took more than half of the onblur execution time).
Race conditions, where an older save command could potentially complete after a new one
Oh, you're right! Okay, we can wrap it in a singleton, where each new async autosave call would invalidate any previous one (dropping the compressed result and not even doing base64 on it).
Also, there is a second "onprogress" callback of the compress function, and we can throw from there conditionally to effectively cancel the ongoing compression!
Background execution, which is not guaranteed
I've just benched it. The onprogress callback normally gets executed at 200-500 ms intervals in the foreground, but may be throttled to 1000 ms when the user switches to another tab or application.
I think this can be solved by cancelling the ongoing async compression (throwing away the partial work) and running the sync version right there instead, when continuous throttling is detected (e.g. when a delay of more than 950 ms happened 3 times in a row, allowing individual hiccups).
Here is how I would do that:
var _generate_compressed_story_async_ = 0;
var allow_async_autosave = true; // SHOULD BE IN SETTINGS
function generate_compressed_story(save_images,export_settings,export_aesthetic_settings,cb_async_autosave) {
//encode the current story into a sharable url
//a tiny json format which gets compressed by LZMA then b64url
let story = generate_savefile(save_images,export_settings,export_aesthetic_settings);
let storyjson = JSON.stringify(story);
console.log("Exporting story: ", story);
//var cstoryjson = LZString.compressToEncodedURIComponent(storyjson);
if(cb_async_autosave && allow_async_autosave){
var current_async = ++_generate_compressed_story_async_; // increment call counter
var last_timestamp = Date.now();
var throttle_counter = 0;
lz_c.compress(storyjson, 1, function(res){
if(!cb_async_autosave){ // callback was already called
return;
}
if(current_async!==_generate_compressed_story_async_){
cb_async_autosave(false); // this was not the latest autosave
}else{
cb_async_autosave(buf_to_b64(res));
}
},function(progress){
if(current_async!==_generate_compressed_story_async_){
var cb = cb_async_autosave; // abort older and return false
cb_async_autosave = null;
cb(false);
throw 'abort previous async autosave';
}
var new_timestamp = Date.now();
if(new_timestamp-last_timestamp>950){ // browser throttle of inactive tab
if(++throttle_counter>2){
console.log('browser throttle detected');
var cb = cb_async_autosave;
cb_async_autosave = null;
cb(buf_to_b64(lz_c.compress(storyjson, 1))); // resort to sync compression
throw 'cancel async autosave to sync';
}
}else{
throttle_counter = 0; // allow single hiccups
}
last_timestamp = new_timestamp;
});
}else{
var cstoryjson = buf_to_b64(lz_c.compress(storyjson, 1));
if(cb_async_autosave){ // when called as async but forbidden by settings
cb_async_autosave(cstoryjson);
}else{
return cstoryjson;
}
}
}
…
function autosave() {
//autosave
try {
update_prev_custom_endpoint_type();
localStorage.setItem(STORAGE_PREFIX + "settings", JSON.stringify(localsettings));
if (localsettings.persist_session) {
generate_compressed_story(true, true, true, function(compressedstory){
if(compressedstory===false){
console.log("older async autosave cancelled");
return;
}
try{
localStorage.setItem(STORAGE_PREFIX + "story", compressedstory);
console.log("autosave done");
} catch (e) {
console.log("autosave failed: " + e);
}
});
}
} catch (e) {
console.log("autosave failed: " + e);
}
}
I have tested it, and it works reliably now. What do you think? The UI no longer hangs when clicking outside of the large history after editing it.
Seems reasonable. I've added it for the autosaves as a new function, I'm leaving the manual export/save/load untouched as synchronous. Simplified the code a bit, let me know if this has any problems.
https://github.com/LostRuins/lite.koboldai.net/commit/21fbbb98705a55a1bef6ff43c6e1136078f7544e
I've added it for the autosaves as a new function
Good! A dedicated function is wise.
Simplified the code a bit, let me know if this has any problems.
Wait, wait, you are not throwing from the onprogress callback? This leads to redundant parallel work that gets dropped but wastes computational resources. For example, if I click into the history, mash the keyboard, return to the input box and repeat – several autosaves run simultaneously:
Autosave Start
Merged edit field
Autosave Start
Merged edit field
Autosave Start
Merged edit field
Autosave Start
Merged edit field
Autosave Rejected
Autosave Rejected
Autosave Rejected
Autosave Done
If we go for aborting the previous one – none of them might ever finish while the user keeps clicking… Okay, I have another solution, especially if you don't feel right about throwing exceptions:
var autosave_compressed_story_async_ongoing = 0;
function autosave_compressed_story_async(save_images,export_settings,export_aesthetic_settings) {
if(autosave_compressed_story_async_ongoing>0){
autosave_compressed_story_async_ongoing = 2;
console.log("Delay Autosave");
return;
}
let story = generate_savefile(save_images,export_settings,export_aesthetic_settings);
autosave_compressed_story_async_ongoing = 1;
let storyjson = JSON.stringify(story);
console.log("Autosave Start: ", story);
lz_c.compress(storyjson, 1, function(res){
console.log("Autosave Done");
var compressedstory = buf_to_b64(res);
localStorage.setItem(STORAGE_PREFIX + "story", compressedstory);
var repeat = autosave_compressed_story_async_ongoing>1;
autosave_compressed_story_async_ongoing = 0;
if(repeat){
autosave_compressed_story_async(save_images,export_settings,export_aesthetic_settings);
}
});
}
Now I have a global flag. If it is 0, then no other autosaves are running in parallel, so we start the first one and set the flag to 1. But if we see that this flag is already nonzero – we rewrite it to 2 and do nothing. This means "the previous autosave is stale".
When the compression finishes, it checks the flag. If it is 2 (and not 1) – then somebody called autosave again during compression, so the current compressed result is outdated. But!
We won't just drop it; instead, we save it into local storage as normal – and repeat the autosave again! (Resetting the flag back to 0 in either case.)
Since we don't have to repeat compression with the exact same story as was requested in the last autosave call (because we can just compose a fresh new one) – all redundant autosave calls are cheap: all they do is say "please, redo the autosave after the current one if you have any".
This solves all three problems.
The above log becomes:
Autosave Start
Merged edit field
Delay Autosave
Merged edit field
Delay Autosave
Merged edit field
Delay Autosave
Merged edit field
Autosave Done
Autosave Start
Autosave Done
Returning to the onpaste hell… Well, I've checked several online editors. It seems none of them can really restore the undo stack. Yes, all of them do have "undo/redo" functionality (often on Ctrl+Z, Ctrl+Y or Ctrl+Shift+Z), but if you right-click and choose "Undo" in the browser's native context menu, they all fail in one way or another.
There was one that could really undo: https://alex-d.github.io/Trumbowyg/
But I think they are still calling execCommand('insertText') somewhere…
Also there was this one: https://sofish.github.io/pen/ And it's interesting! Looks like they are "just pasting as it is" but then cleaning whatever they don't support:
// listen for paste and clear style
addListener(ctx, editor, 'paste', function() {
setTimeout(function() {
ctx.cleanContent();
});
});
…
// remove attrs and tags
// pen.cleanContent({cleanAttrs: ['style'], cleanTags: ['id']})
Pen.prototype.cleanContent = function(options) {
var editor = this.config.editor;
if (!options) options = this.config;
utils.forEach(options.cleanAttrs, function (attr) {
utils.forEach(editor.querySelectorAll('[' + attr + ']'), function(item) {
item.removeAttribute(attr);
}, true);
}, true);
utils.forEach(options.cleanTags, function (tag) {
utils.forEach(editor.querySelectorAll(tag), function(item) {
item.parentNode.removeChild(item);
}, true);
}, true);
checkPlaceholder(this);
this.checkContentChange();
return this;
};
– Can you do that? Traverse the internals of the history box and strip anything that shouldn't be there? Since you know the cursor position/selection, in theory you should be able to find the place "that should have been modified by the native pasting" and convert the HTML to text-only nodes.
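Something in this spirit, maybe (a hypothetical helper; note it sidesteps both the undo-stack requirement and the cross-browser innerText concerns you mentioned, so it's only an illustration of the idea):
// Re-serialize the contenteditable history box as plain text with explicit <br> breaks,
// dropping whatever rich markup a paste may have introduced.
function stripRichContent(editor) {
  var plain = editor.innerText;              // browser-normalized plain text
  editor.innerHTML = plain
    .split("\n")
    .map(function (line) { return escapeHtml(line); }) // reuse the existing escapeHtml helper
    .join("<br>");
}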
Also, I tried to apply my own approach to "cleaning the clipboard", like this (can be pasted directly into the browser console at runtime):
var paste_fake_textarea_elem = null;
function paste_fake_textarea(cb){
var textarea = paste_fake_textarea_elem;
if(!textarea){
textarea = document.createElement("textarea");
textarea.wrap = "off";
textarea.spellcheck = false;
textarea.style = "position:fixed;left:-99px;top:-99px;width:9px;height:9px";
document.body.appendChild(textarea);
paste_fake_textarea_elem = textarea;
}
textarea.style.display = "";
var selection = window.getSelection();
var saved = [selection.anchorNode, selection.anchorOffset, selection.focusNode, selection.focusOffset];
textarea.focus();
textarea.oninput = function(){
textarea.oninput = null;
var text = textarea.value;
textarea.style.display = "none";
textarea.value = "";
var selection = window.getSelection();
selection.setBaseAndExtent(saved[0], saved[1], saved[2], saved[3]);
cb(text);
};
};
document.addEventListener("paste",function(e){
paste_fake_textarea(function(text){
setTimeout(function(){
console.log("insertText...");
var time = Date.now();
document.execCommand("insertText", false, text);
console.log(Date.now()-time);
},0);
});
},true);
– This does not work as I expected. I am creating a fake off-screen textarea just to redirect the native pasting there, to grab a cleaned text-only version of the clipboard contents. It gave me nothing useful (it also interferes with onblur in your code, and is prone to race conditions, but let's put that aside for now).
Having the cleaned text, how do we paste it while preserving the stack? You said that you cannot use insertText for whatever reasons (and had to use insertHTML), but I decided to actually try to bench it…
Complete failure! Copying large parts of the history and pasting them back gives an awful delay on document.execCommand("insertText", false, text); too.
The editor that I used years ago was https://quilljs.com/, but apparently they are intercepting the Ctrl+Z key manually somewhere. In the end, maybe this is the way to go?
throwing from progress callback
Throwing exceptions to halt an existing save-in-progress doesn't feel right. In this case the lz_c.compress function seems stateless and harmless enough, but this kind of behavior can often lead to resource leaks/inconsistent state when libraries are not expecting it (e.g. allocated memory not freed when interrupted unexpectedly, opened file handles not closed), so I try to avoid it.
Okay, I have another solution, global flag
I'm not so sure about this one. Calling the autosave function recursively seems kind of dubious, especially since autosave_compressed_story_async_ongoing is a global variable that can have its state altered and reset externally, something you really want to avoid with recursive method calls.
Even assuming it works without issues, the second save basically triggers automatically at an unknown time in the future (maybe the user is halfway through modifying some settings/editing something?), rather than at a "known good" state when triggered normally (e.g. end of edit, end of generate action, closing the settings window).
I think that overall, the total number of saves has not changed compared to the synchronous approach; it's just that from the user's POV it no longer freezes when saving, which is all that's needed in the end.
intercepting Ctrl+Z key manually somewhere
Won't work on mobile, won't work with right-click + undo, won't work for people with non-default keymaps.
You said that you cannot use insertText for whatever reasons
One problem with insertText is that some browsers handle it differently: for newlines, for example, some add <div>s, some add <br>, and some add <div><br></div>, which is a real headache when parsing it back to store - you end up with extra or fewer newlines.
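To illustrate the headache, normalizing that markup back into plain newlines ends up looking something like this (a sketch, not the actual parser):
// Collapse the browser-specific contenteditable markup back into plain newlines.
// Entities would still need unescapeHtml afterwards.
function editorHtmlToText(html) {
  return html
    .replace(/<div>\s*<br\s*\/?>\s*<\/div>/gi, "\n") // Chrome-style empty line
    .replace(/<br\s*\/?>/gi, "\n")                   // explicit hard breaks
    .replace(/<div>/gi, "\n")                        // div-per-line browsers start a new line
    .replace(/<\/div>/gi, "")
    .replace(/<[^>]+>/g, "");                        // drop any other leftover tags
}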
this kind of behavior can often lead to resource leaks/inconsistent state when the libraries are not expecting it
I see your point. I can reply that if a library breaks badly because of exceptions in callbacks – then an issue should be filed with its developers. Or, in specific cases where it is not feasible to catch them gracefully there for performance reasons – the documentation of the library should state clearly that leaking exceptions is forbidden (so the user should try-catch on their own if needed). But yeah, I hadn't read their documentation thoroughly, so I had no right to throw like that ))
a global variable that can have it's state altered and reset externally
By using let (which is present in your code too) it is possible to close over a "function-local static variable" like this:
{
// new scope for let/const
let local_var;
function global_func(new_value) {
var old_value = local_var;
local_var = new_value;
return old_value;
};
}
// the function is visible outside:
console.log(global_func(1)); // undefined
console.log(global_func('test')); // 1
console.log(global_func(true)); // "test"
// but the variable is not:
console.log(local_var); // Uncaught ReferenceError: local_var is not defined
Personally, I would rather abuse the fact that a function is a dynamic object to which it is possible to add properties, like global_func._local_var = new_value, but that can still be accessed from outside on purpose (useful for debugging, for example).
In the case of let, the variable is fully private. The only caveat is that the function in the block won't be directly available before its declaration (above the block it is undefined until execution reaches it the first time), but that won't be a problem for your code.
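For comparison, the function-property variant I mentioned would look like this (same behavior, but the state stays inspectable from outside):
// State lives as a property on the function object itself.
function global_func(new_value) {
  var old_value = global_func._local_var;
  global_func._local_var = new_value;
  return old_value;
}
console.log(global_func(1));          // undefined
console.log(global_func('test'));     // 1
console.log(global_func._local_var);  // "test" - still reachable, e.g. for debugging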
the second save basically triggers automatically at an unknown time in the future
In the previous version that you merged, the save also triggers at an unknown time in the future, just with a known value. It can still rewrite localStorage in between something (but I doubt your code is sensitive to that).
rather than with a "known good" state when triggered normally
All right, we can produce the JSON and stringify it before postponing the compression. Would that suffice?
I think that overall, the total the number of saves has not changed compared to the synchronous approach
This is wrong: previously, the user physically could not save that often, because the UI hangs and cannot process onblur anymore. Now each onblur after any edit triggers a new parallel compression, making the UI slower and causing previous compressions to finish late. The more text in the history, the more serious this problem becomes!
Won't work on mobile, won't work with right-click + undo, won't work for people with non-default keymaps.
Then somebody should test how those other editors behave on mobile and with non-default keymaps. (The right-click undo is out of the question, unfortunately.)
Meanwhile, I'll try to come up with a function to "clean out" any pasted contents for your history box. For now, this is my updated version of autosave; what do you think this time?
{
let new_storyjson = null;
function autosave_compressed_story_async(save_images,export_settings,export_aesthetic_settings) {
let story = generate_savefile(save_images,export_settings,export_aesthetic_settings);
let storyjson = JSON.stringify(story);
let ongoing = new_storyjson;
new_storyjson = storyjson;
if(ongoing){
console.log("Delay Autosave: ",story);
return;
}
console.log("Autosave Start: ", story);
(function retry(json) {
lz_c.compress(json, 1, function(res) {
console.log("Autosave Done");
let compressedstory = buf_to_b64(res);
localStorage.setItem(STORAGE_PREFIX + "story", compressedstory);
let newer = new_storyjson;
new_storyjson = null;
if (newer && newer !== json) {
console.log("Updating Autosave");
retry(newer);
}
});
})(storyjson);
}
};
Your new save approach sounds correct but it is failing and losing data for some reason in practice when I test it. There must be some race condition that causes it to fail.
Try this test:
Now try your version:
Confirmed.
Look:
console.log("Autosave Done: ", res.length);
After initial loading of a large story, it prints
Autosave Done: 2525
And after some time, it prints
Autosave Done: 319640
I presume that for some reason your code returns an "empty" story when it loads for the first time!
Since my code is "saving the first one, then saving the last one", it saves the empty one.
Looking at the console even in your version, I see
memory: ""
prompt: ""
in the "Autosave Start:" log, but they are populated properly in subsequent logs.
I logged gametext_arr in generate_savefile and it prints [] the first time the tab loads.
What will we do about it? I presume the solution "just drop an empty story" is not a wise one, because then the user wouldn't have the ability to "clear" the history completely (but maybe that's a good thing?). And "just drop the first one" is also quite incorrect.
generate_savefile should not return bad data in the first place!
I would presume this happens any time the UI starts from a new story, which happens on startup but also in multiple other places. It's not "bad" data; this doesn't cause any issues in either the synchronous call or my current async approach.
Maybe the browser is smart enough to allow currently executing functions to finish completely before the process is halted (e.g. user closed the tab, user refreshed the page); however, in the new approach the browser has no way of knowing there are incomplete tasks queued when it faces unexpected termination.
I can try to avoid the autosave on story restart, but I feel like this is a band-aid that may mask other issues with the non-immediate approach. Maybe it would be safer to keep the save/load stuff using the sync save, and leave the async autosave only for runtime editing and generation.
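(The band-aid I have in mind would be roughly this shape - a sketch with hypothetical names, not committed code:)
// Skip exactly one autosave right after a story restart/load,
// so the freshly-emptied state never overwrites the stored save.
// suppress_autosave_once and restart_new_story are hypothetical names.
var suppress_autosave_once = false;
function restart_new_story() {
  // ...reset gametext_arr, memory, prompt, etc...
  suppress_autosave_once = true;
}
function autosave_guarded() {
  if (suppress_autosave_once) { suppress_autosave_once = false; return; }
  autosave(); // the normal autosave path
}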
It's not "bad" data, this doesn't cause any issues in both the synchronous call and my current async approach.
In the synchronous version, you are just saving twice in a row, rewriting the incorrect one right away. In the current asynchronous version, the second call drops the first one (and the save file is not rewritten in case of a tab reload).
You should not alter the saved state on loading!
Okay pushed a fix for this and integrated it into lite, hopefully it doesn't cause issues. Please check.
Okay pushed a fix for this and integrated it into lite
Nah-ah, you have a subtle bug there!
(function retry_autosave(json) {
lz_c.compress(json, 1, function(res) {
console.log("Autosave Done");
let compressedstory = buf_to_b64(res);
localStorage.setItem(STORAGE_PREFIX + "story", compressedstory);
let newer = pending_storyjson_autosave;
pending_storyjson_autosave = null;
if (newer && newer !== json) {
console.log("Updating Autosave");
retry_autosave(newer);
}
});
})(storyjson);
– Here, you are nulling pending_storyjson_autosave and THEN you check whether you need to retry; but you retry right away. Thus, pending_storyjson_autosave will be null even while the story is still compressing.
Any subsequent outer call won't see that, and will ultimately start a second concurrent compression, leading to a race condition! This can be confirmed in the console by constantly editing the history and making sure there was an edit after the moment when the first compression finishes:
Autosave Done
Updating Autosave
Autosave Done
Autosave Done
(The last line might be stacked by the browser into (2) Autosave Done.)
Also, you are not logging anything in the if(!async_save) branch; I suggest putting a debug dump there too, just to be sure about the cause of the noticeable lag when that happens.
Here is the final fixed code:
//runs async, complete autosave only if latest to be called
var pending_storyjson_autosave = null;
function autosave_compressed_story(async_save,save_images,export_settings,export_aesthetic_settings) {
let story = generate_savefile(save_images,export_settings,export_aesthetic_settings);
let storyjson = JSON.stringify(story);
if(!async_save)
{
console.log("Delay Sync: ", story);
var cstoryjson = buf_to_b64(lz_c.compress(storyjson, 1));
pending_storyjson_autosave = null;
localStorage.setItem(STORAGE_PREFIX + "story", cstoryjson);
return;
}
let ongoing = pending_storyjson_autosave;
pending_storyjson_autosave = storyjson;
if(ongoing){
console.log("Delay Autosave: ", story);
return;
}
console.log("Autosave Start: ", story);
(function retry_autosave(json) {
lz_c.compress(json, 1, function(res) {
console.log("Autosave Done");
let compressedstory = buf_to_b64(res);
localStorage.setItem(STORAGE_PREFIX + "story", compressedstory);
let newer = pending_storyjson_autosave;
if (newer && newer !== json) {
console.log("Updating Autosave");
retry_autosave(newer);
}else{
pending_storyjson_autosave = null;
}
});
})(storyjson);
}
Hm-m, have you considered what would happen if a SYNC autosave were triggered during an async one, for whatever reason? Do you have sensitive places in your code?
One simple solution could be: just update pending_storyjson_autosave and exit, doing nothing. Want me to wrap it up?
Every time KoboldCPP "finishes an edit to the chat log", everything hangs a bit. Every 500KB increase in save file size introduces another 1 second of additional delay, which adds up pretty quickly. Since I've been a heavy user recently, my save file grows by about 1000KB each week! The delay quickly becomes unbearable.
To reproduce: simply make a very long chat log, with a save file size of a few MBs, make the simplest edit, and click away for the edit to register. Or just click the "Settings" button and then the "Cancel" button without changing anything at all! (But why?)
Oh, and if you notice not just a 1 sec delay per 500KB but a 10 sec delay per 500KB, that's because you have the NoScript plugin installed. (But why??) I had to create a separate browser profile without that plugin, dedicated to this WebUI, to make the pain at least a bit manageable.
Also, this bug doesn't only affect edit mode, since all modes require KoboldCPP to "finish an edit to the chat log" sometimes, e.g. during token streaming or end sequence trimming. You can notice that token streaming gets updated much less frequently with huge chat logs. And it eventually creates a soft bottleneck, lowering GPU utilization as well, simply because everything else ends up waiting for the edit to finish!
(Above is the second draft.)
(First draft below, in case anything in it still is helpful.)
Title: Every time I finish editing the text, the UI hangs for more than 10 seconds, with a save file as large as 1MB
How to reproduce:
1. Have a very long "chat" history. For example, save file size exceeds 1MB, with over 3000 turns.
2. Enter edit mode and edit a bit. (Add an "a" at the end or something.)
3. Click away to register the edit. Now you should see the UI hang for a decent amount of time. Should be >1 second, or >10 seconds if you have the NoScript plugin installed. (Using the Firefox browser here.)
I use edit all the time, and the hanging will only get longer as the chat history gets longer. (Also, I notice that clicking the "OK" button in the settings UI also causes a very long delay at this point. You don't even need to change any settings: just click "Settings" -> click "OK" -> it hangs for a second or two.) (Also, if you are already reproducing the bug you may stop reading the rest. This is just additional info which gets increasingly useless as it goes on. I just wanted to make sure I provided all the might-be-related info.)
Workarounds:
1. Uninstall NoScript and browse the whole Internet without it, just for this app. (Scratch this.)
1. Create a separate Firefox browser profile without NoScript installed, just for this app. This will reduce the hang time to a sort of acceptable level.
2. Also, just click the "Submit" button whenever you want, despite it being unavailable; the click will register after the hanging is over. You can even alt-tab away before that happens, and the click will still register.
Tried these, but they don't help:
1. Choose story mode instead of chat mode. (Doesn't seem to work, and also creates its own problems.)
2. Stop using the Stylus script that helps me recolor the "User:" and "AI:" strings to the colors I want. (That script doesn't slow things down.)
3. Stop using the Dark Reader plugin to help recolor the UI. (That doesn't slow things down either.)
Things I didn't try:
1. Use Chrome instead of Firefox. (Don't want to.)
2. Use Edge, Opera, Brave... instead of Firefox. (Don't want to.)
3. Use a good offline web browser that's specifically made for local URLs and will definitely not connect to the Internet by itself. (Does such a thing even exist?)
4. Just don't edit. (No way.)
5. Cut off the first 80% of the chat history, save it elsewhere, and use the rest. (This will definitely work, but... I was jumping ship from "KoboldAI/KoboldAI-Client" to "LostRuins/koboldcpp", and with "KoboldAI/KoboldAI-Client" the editing feature worked without any delay at all. So there has to be an optimization problem somewhere.)
What version am I currently using? How do I check that? I can't see a version number anywhere. Check the WebUI: nothing. Right-click Koboldcpp.exe -> Properties: nothing. It's 1.45.2, by the way. I know this only because I installed the "latest version" 3 days ago, so it must be that version. Or I could check the console output, but I've been running it for so long, and input so much, that the very first output is already washed away by all the other output, even with the 9001-line history on the Windows console. And I don't want to launch it again, otherwise I'll have to click "Allow Editing" one more time (#190). Then I realized I can just run koboldcpp.exe without using the .bat and .kcpp files, and without having to close the existing instance first. Then I'll see the version number in the launch settings UI, without having to relaunch my already running session and click "Allow Editing" one more time.