LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

The longer the chat log, the slower the program. The problem is in the "Edit" feature, but all modes are affected. #492

Open WingFoxie opened 12 months ago

WingFoxie commented 12 months ago

Every time KoboldCPP "finishes an edit to the chat log", everything hangs a bit. Every 500KB increase in save file size introduces another second of additional delay, which adds up pretty quickly. Since I've been a heavy user recently, my save file grows by about 1000KB each week! The delay quickly becomes unbearable.

To reproduce: Simply make a very long chat log, with a save file size of a few MB, make the simplest edit, and click away for the edit to register. Or just click the "Settings" button and then the "Cancel" button without changing anything at all! (But why?)

Oh, and if you notice not just a 1-second delay per 500KB but a 10-second delay per 500KB, that's because you have the NoScript plugin installed. (But why??) I had to create a separate browser profile without that plugin, dedicated to this WebUI, to make the pain at least a bit manageable.

Also, this bug doesn't only affect edit mode, since all modes require KoboldCPP to "finish an edit to the chat log" sometimes, e.g. for token streaming or end sequence trimming. You can notice that token streaming updates much less frequently with huge chat logs. It eventually creates a soft bottleneck, lowering GPU utilization as well, simply because everything else prefers waiting for the edit to finish!

(Above is the second draft.)

(First draft below, in case anything in it is still helpful.)

Title: Every time I finish editing the text, the UI hangs for more than 10 seconds, with a save file as large as 1MB.

How to reproduce:
1. Have a very long "chat" history. For example, a save file size exceeding 1MB, and over 3000 turns.
2. Enter edit mode and edit a bit. (Add an "a" at the end or something.)
3. Click away to register the edit. Now you should see the UI hang for a decent amount of time. Should be >1 second, or >10 seconds if you have the NoScript plugin installed. (Using the Firefox browser here.)

I use edit all the time, and the hanging will only get longer as the chat history gets longer. (Also, I notice that clicking the "OK" button in the settings UI also causes a very long delay at this point. You don't even need to change any settings; just click "Settings" -> click "OK" -> it hangs for a second or two.)

(If you are already reproducing the bug you may stop reading the rest. The following is just additional info which gets increasingly useless as it progresses. Just wanted to make sure I provided all the might-be-related info.)

Workarounds:
1. Uninstall NoScript and browse the whole Internet without it, just for this app. (Scratch this.) Instead: create a separate Firefox browser profile without NoScript installed, just for this app. This reduces the hang time to a sort-of acceptable level.
2. Also, just click the "Submit" button whenever you want, despite it being unavailable; the click will register after the hanging is over. You can even alt-tab away before that happens, and the click will still register.

Tried these, but they won't help:
1. Choose story mode instead of chat mode. (Doesn't seem to work, and also creates its own problems.)
2. Stop using the Stylus script that helps me recolor the "User:" and "AI:" strings to the colors I want. (That script doesn't slow things down.)
3. Stop using the Dark Reader plugin to help recolor the UI. (That doesn't slow things down either.)

Things I didn't try:
1. Use Chrome instead of Firefox. (Don't want to.)
2. Use Edge, Opera, Brave... instead of Firefox. (Don't want to.)
3. Use a good offline web browser that's specifically made for local URLs, and will definitely not connect to the Internet by itself. (Does such a thing even exist?)
4. Just don't edit. (No way.)
5. Cut off the first 80% of the chat history, save it elsewhere, and use the rest. (Will definitely work, but... I was jumping ship from "KoboldAI/KoboldAI-Client" to "LostRuins/koboldcpp", and with "KoboldAI/KoboldAI-Client" the editing feature worked without any delay at all. So there has to be an optimization problem somewhere.)

What version am I currently using? How do I check that? I can't see a version number anywhere. Checked the WebUI: nothing. Right-click Koboldcpp.exe -> Properties: nothing. It's 1.45.2, by the way. I know this only because I installed the "latest version" 3 days ago, so it must be that version. I could also check the console output, but I've been running it for so long, and input so much, that the very first output has already been washed away by all the other output, even with the 9001-line history on Windows Console. And I don't want to launch it again, otherwise I'll have to click "Allow Editing" one more time (#190). Then I realized I can just run koboldcpp.exe without using the .bat and .kcpp files to jump straight to the WebUI, without having to close the existing instance first. That way I'll see the version number in the launch settings UI, without having to launch my already running session once more and click "Allow Editing" one more time.
LostRuins commented 12 months ago

Hmm the editor wasn't really designed for stories that long. It might just be the overhead required to compress and save the story. Can you try disabling the "Autosave Session" and "Embed Settings File" and see if that helps?

(screenshot attached)

LostRuins commented 12 months ago

For reference, I have a 500kb story in chrome, which is a few hundred turns, and saving an edit takes less than 1 second.

WingFoxie commented 12 months ago

It's not just "a problem only with insanely long chat logs". You really need to fix this somehow. It's unacceptable that the response time gradually slows down just because of all the text that is outside the context length and won't be sampled whatsoever. The delay is not that difficult to notice either; it becomes noticeable once the save size reaches only 200KB or so. Which is like... I can do that in half a day. For starters, if the app could automatically keep a close eye on only enough lines to fill the context window, and not keep processing the rest of the whole chat log all the time, that would be great. Especially don't keep processing the whole chat log when token streaming!

About the settings you asked about: on my current 3.0MB save file, with all the settings in question OFF, editing takes 1 second to finish. "Embed Settings File" makes no difference. "Autosave Session" adds 3 seconds.

Yet, "Token Streaming" adds an insane amount of delay! +200% time on each response! An 25 tokens response, which is supposed to only take 10 seconds to generate, will take 30 seconds to generate! Since this basically inserts multiple "edits". (By 30 seconds I mean, from the moment I press Enter to "Submit", to the moment the text cursor blinks again.)

(BTW, no matter how long the app hangs when editing, saving, etc., I only see a 20% CPU utilization spike. Not a single CPU core exceeds 50% usage when it's only busy editing or saving; usage is only high when it's actually sampling and generating tokens. A utilization problem, maybe?)

(End of second draft)

First draft: Just tried to load up all my chat logs again. (They are now already 3.0MB in save size, by the way.) Here's what I've found:

Forget about the "Save File Incl. Settings" setting (I mean "Embed Settings File"); it doesn't have an effect on the delay. The "Persist Autosave Session" (or "Autosave Session") setting does add delay. With it on, the delay is about 4 seconds (+3 seconds on my 3.0MB save file). Without it, the edit delay is about 1 second. If you want to ask why a 3.0MB save now only has a 4-second delay for me, instead of 6 seconds: I don't know either. Maybe because both have been relaunched since the original post? I did restart Firefox to update it, and the CMD window crashed at some point as well.

Also, seriously: please try with a larger file size. Just copy that 500KB chat log of yours in the WebUI's edit mode and paste it a few times right there, to create a larger file. Please notice how "token streaming" gets insanely slow as well. At first it's basically real time; later it soon becomes 10 words per update, and much slower overall. Please also notice the increased time of the trimming process. At first the trimming happens very fast. And it's not that hard to notice the slowness as the size of the chat log increases; it becomes noticeable after the save size reaches only 100KB or so. I can tell it's clearly not as snappy as when my chat log just fills the context length, and I can tell clearly that the text is streamed much less frequently, from about 1 word per update to 2~3 words per update.

If I chat on that 3.0MB save, a response needs 30 seconds to finish. (From when I press Enter, to when the text cursor blinks again.) If I leave only the last 500KB or so and try again, it only takes 15 seconds to generate a slightly longer response. If I remove anything that won't be sampled due to my context length settings, then it's about 12 seconds to generate a response of similar length. See how it hurts? The top thing to improve for a better response time is already this edit thing. Of course I can turn off a bunch of things, like token streaming and end sequence trimming. But... you really need to fix this somehow, because currently the response time gradually slows down just because of all the text that is outside the context length and won't be sampled whatsoever. For starters, if the app could automatically keep a close eye on only enough lines to fill the context window, and not keep processing the rest of the whole chat log all the time, that would be great. Especially don't keep processing the whole chat log when token streaming!

(BTW, what do you mean by "the overhead required to compress and save the story"? Because no matter how slow the app behaves when editing, saving, etc., I only see a 20% CPU utilization spike. A utilization problem, maybe?)
LostRuins commented 12 months ago

What version of koboldcpp are you using?

WingFoxie commented 11 months ago

The latest version.

Details: I was using 1.45.2 at first. I noticed that later versions' changelogs don't mention anything about improving edit performance anyway. After you explicitly asked, I tried the current latest version, which is 1.47.2. Despite the tons of changes between these two versions, editing performance isn't one of them. After upgrading to 1.47.2, I tried that 3.0MB file again: one response still needs 30 seconds to generate. If I remove all the earlier text that won't be sampled anyway, it takes about 10 seconds. I tried fiddling with all the settings as well, and I did notice there's a new option for "token streaming". I tried all of them; same behavior as before.

Let me just put it this way: if this problem never gets fixed, I'll have to stick to a couple of workarounds:

Details:
1. Use a separate browser profile to avoid using NoScript with the WebUI.
2. (Important) Every once in a while, copy all earlier chat logs from the WebUI to a separate txt file. And try to get used to it.
3. If that's still not enough, maybe turn off "token streaming". I still left it on.
4. (Minor) Configure KoboldCPP to not launch the browser automatically after loading, since it won't open the WebUI in the profile I created. Maybe even create a .bat file that launches the browser in that exact profile after loading KoboldCPP with the desired settings.

Obviously I can't single-handedly fix this. All I can figure out by myself is this: when editing a single bit of a long chat log... some functions are executed way too many times, like 10000 times.

Details: It was a 725KB (save file size) log, with NoScript installed on the browser to magnify the lag. In about 20 seconds: the "render_gametext" function is called 7000 times, "merge_edit_field" 12000 times, "onblur" 12000 times, "apply" 5500 times, and "autosave" 1300 times.

But I don't know what to do from here. If there's still anything else I can do to help, I'll try.

LostRuins commented 11 months ago

What do you mean, the "render_gametext" function is called 7000 times? Are you saying that's triggered every time you edit? How are you getting that count?

WingFoxie commented 11 months ago

Just press F12 to bring up the browser's developer tools; there should be a tab called "Performance" or something, with a pretty obvious button in it that lets you record performance and analyze it. I started recording right before making a tiny edit to the chat log. Then I edit, wait for 10 seconds until it finally finishes, then remove that tiny edit, wait another 10 seconds, and stop the recording to see the captured records, called a "performance profile". And then I try my best to pretend that I know how to read it. I'm not a developer at all, after all.

By contrast, I did another recording with only a few screens of text in the chat log, and recorded editing the text once, with the UI feeling snappy and fast. Those "methods", or whatever they are, only got called about 60~80 times each. "render_gametext" was called 63 times, which suggests the problem is already there, just not yet causing enough total lag to be noticed.

If you want to try it, I believe chrome also has a similar feature, which should also be under the development UI, under a tab about "performance", and also has a pretty obvious button there. Just, make sure that you choose the right preset for recording. Make sure that you are actually recording the performance of "the web page", not "the Browser itself" or anything else!

aleksusklim commented 11 months ago

How many turns (lines) do you have? Also, what game mode do you use?

aleksusklim commented 11 months ago

I tried typing =lorem(100,100) and pressing Enter in Microsoft Word. This generates a Lorem Ipsum wall of text, about 420KB in size. If I put it into Kobold Lite in Instruct mode, nothing bad happens. It lags for a second or two, but then works fine.

However, if I change my Instruct sequence to et (it is literally contained in this text 5800 times), the tab crashes with Out of memory as soon as I leave edit mode! (In fact, now I'm stuck, because it crashes after a tab reload too; looks like I need to figure out how to erase the stored history.)

The only obvious solution is to not store such long texts in Kobold Lite! Any web application will crash sooner or later if it tries to render styled text in enormous amounts.

However, if possible, Kobold Lite could stop matching the text once there are already a lot of matches, suggesting that the user copy the history elsewhere instead. Or ask for a temporary confirmation to continue. This would solve the consequences, but fundamentally won't help in processing long chats.
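Something along these lines, purely as an illustration (the element id, class name, regex and cap value are all assumptions, not Lite's actual code):

```
// Purely illustrative sketch (not Lite's actual code): cap how many instruct-sequence
// matches get wrapped in styled spans, and warn instead of rendering thousands of them.
var MAX_HIGHLIGHTS = 500;                                       // arbitrary cap for this sketch
var instructRegex = /et/g;                                      // e.g. my short "et" sequence from above
var storyText = document.getElementById("gametext").innerText;  // assumed element id
var matches = 0;
var highlighted = storyText.replace(instructRegex, function (match) {
    matches++;
    return matches <= MAX_HIGHLIGHTS ? '<span class="instruct">' + match + '</span>' : match;
});
if (matches > MAX_HIGHLIGHTS) {
    console.warn("Too many matches; consider copying older history out of Kobold Lite.");
}
```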

WingFoxie commented 11 months ago

If your WebUI just won't load anymore, you have 4 options.

Edit: Turns out I'm an idiot. It's actually very easy to clear the cache for just one website. It's right in the browser's settings, in the same place where you can clear cookies for just one website. You just have to clear both cookies and cache for that one website at the same time; there's no option to clear only one of them. That wouldn't matter in this case though.

  1. Purge your browser cache in less than one minute. But this will get rid of all cache, not just the WebUI's cache.
  2. Use a tool to list all the cache entries and only delete the ones related to the WebUI, and regret doing this, because you need to wait 20 minutes just to load the list!
  3. Use a separate browser profile like I do, only for this WebUI; that way you can just purge all cache in that profile in less than a minute. And enjoy a unique problem caused by it. (When a browser's whole cache is dedicated to just this WebUI, nothing stops it from keeping cache files for as many tokens as you have ever, ever generated in total, which can bank up to an insane amount.)
  4. Use a private browsing window for now. And make sure you saved manually.

Pick your poison. (And why do I keep saying "cache"? Because all the KoboldCPP's WebUI's data is indeed stored in cache. Not a single cookie is used.)

Actually, I've been suspecting that, the total amount of cache files, affects the performance as well.

Here's how, and why I decided not to report it back then. Once, when I kept almost no text in the WebUI at all but kept the 60000 cache files it had generated, generation was still insanely slow! Purging the cache made it fast as usual again. It's just that I can't really confirm it. Since that day, things went wrong in like 4 different ways, to the point that I couldn't fix everything even by reverting to the older version and config I used in the previous two weeks, and to the point that I was worried my GPU was dying (no, it's not). Even now the AI keeps generating 8 tokens at a time, repeatedly, until it has generated 100, instead of just generating the 100 tokens I want in one go. And I don't know how to fix it.

Anyways, if you do believe that having too many cache files also hurts the performance, here are the details.

A post I wrote but didn't post, due to me questioning everything at that point: Just found out another huge thing! There's another thing that hurts the performance over time, other than the length of the chat log: the cache files that the WebUI creates by itself. At every step of "token streaming", the generated part keeps getting saved to browser cache. It seems that the WebUI saves another file every time a few more tokens are streamed. Below is an example where the streaming text had just changed from " goes" to " goes to the bathroom to wash", for which the WebUI created a new file.
Example of one of the files' contents. (This is the exact content; I didn't intentionally corrupt any part of it.)

```
{"results": [{"text": " goes to the bathroom to wash"}]}权咂癞  e;E閑;E镕+6e;E? h O^partitionKey=%28http%2Clocalhost%2C5001%29,~1698382684,:http://localhost:5001/api/extra/generate/check necko:classified 1 strongly-framed 0 request-method POST response-head HTTP/1.0 200 OK Server: ConcedoLlamaForKoboldServer Date: Fri, 27 Oct 2023 05:08:57 GMT access-control-allow-origin: * access-control-allow-methods: * access-control-allow-headers: *, Accept, Content-Type, Content-Length, Accept-Encoding, X-CSRF-Token, Client-Agent, X-Fields, Content-Type, Authorization, X-Requested-With, X-HTTP-Method-Override, apikey, genkey content-type: application/json original-response-headers Server: ConcedoLlamaForKoboldServer Date: Fri, 27 Oct 2023 05:08:57 GMT access-control-allow-origin: * access-control-allow-methods: * access-control-allow-headers: *, Accept, Content-Type, Content-Length, Accept-Encoding, X-CSRF-Token, Client-Agent, X-Fields, Content-Type, Authorization, X-Requested-With, X-HTTP-Method-Override, apikey, genkey content-type: application/json ctid 1 net-response-time-onstart 4 net-response-time-onstop 4 8
```
Each file like this is a bit more than 1KB in size, which takes 4KB on my hard drive. And it somehow slows the performance as well, simply by keeping too many of them. I've saved up over 60000 of them, to the point that even if I don't have a long chat log at all, generation is super slow. Clearing the cache makes it snappy again.
More, more details: how slow it is exactly, why 60000 files is considered too much, and why you may never save up that many files even if you also use it all the time like I do.

At one point generation was so slow that even if I only keep a context-length's worth of chat log, a response still takes 40 seconds to generate! And if I "save" first, so I can clean all the browser cache the Kobold WebUI generated, then load the save and continue, it becomes snappy again! Right before clearing, there were over 60000 cache files already, all generated by the WebUI. (I use a separate browser profile for this WebUI, remember? It's the only URL that profile is used for.) I believe this can be achieved simply by... using it way too much.

(I did notice this problem earlier, I just didn't think it was worth reporting. Back then I saw 50000 files in the cache folder of the profile dedicated to this, all created by the WebUI. To put this in perspective: my default browser profile is used for everything else, and I never, repeat, never clear cache. And... there are only 26000 files in it. This also means that if I used the Kobold WebUI with my main browser profile, older cache files would get automatically deleted due to the cache size limit, so the WebUI's cache files would never bank up that much. But I can't, due to NoScript magnifying the delay problem on my main browser profile. And there's nothing (?) stopping the WebUI from generating too many cache files. Those 60000 cache files only weigh maybe 100MB in total (300MB on disk, since they're mostly smaller than the cluster size and waste more space on my hard drive), and 100MB is obviously way less than a default web browser cache size limit.)

Wait, I just realized maybe I can configure that profile and set its cache size limit to only 20MB or so. Simple: just go to about:config, and change the browser.cache.disk.capacity variable from the default 256000 (which means 256000KB) to 20000 (which means 20000KB). Will try that later. Edit: Need to also change "browser.cache.disk.smart_size.enabled" to "false"! Otherwise the manual cache size setting won't work at all. To verify after configuring, go to "about:cache" and check that the "Maximum storage size" has indeed changed.
LostRuins commented 11 months ago

If your cache is corrupted, clearing browser storage for that site will usually fix it. There's also a "Reset all settings" button. Regarding large files, I will try to optimize it, but please understand that compromises have to be made - this is not just a normal text editor.
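For example, running this from the browser devtools console while on the Kobold Lite page wipes everything Lite stored for that site (the autosaved session and settings are kept in localStorage):

```
// Run in the devtools console on the Kobold Lite page: clears only this site's stored
// data, i.e. the autosaved story and settings in localStorage, then reloads fresh.
localStorage.clear();
location.reload();
```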

aleksusklim commented 11 months ago

@WingFoxie, I cannot see your problems with browser cache. For me, no "4kb files" are created during streaming.

Can you tell your exact browser vendor and version, and the link to that NoScript plugin which you are using? In this case I'll try to reproduce your issue.

WingFoxie commented 11 months ago

I'm just using the latest version of Firefox. You should be able to just reproduce it with a fresh new install, or a fresh new profile, without any plugins installed. With "token streaming" set to "on", of course.

Oh wait you are asking for a link to NoScript plugin? Just search "NoScript" in Firefox addon store. The first one in the search results, author name "Giorgio Maone".

Oh, and, since you are asking like this, you probably should create a separate profile in "about:profiles", and do the testing there!

I don't want you to install NoScript on the browser you usually use for everything, and then wonder "Why have all the websites suddenly stopped working???", since that's more or less the intended behavior: it turns off all scripts, and you only turn on the ones that are absolutely necessary for you. And, of course, you should allow the WebUI's script to run in NoScript. And... I believe you'll find the "editing long chat history" performance is even worse.

Sorry that I couldn't verify this conveniently. But I believe that as long as you have that plugin installed, no matter how you configure it for the WebUI, the "long lag when editing long text" problem will be magnified. If you can't reproduce this specific thing, just forget it, since I believe "NoScript" doesn't create problems on its own; it only magnifies existing problems in the WebUI anyway.

Unnecessary details, just in case: It doesn't matter that Firefox updated a few times during this; it doesn't change the symptom. I tried a bit, and I believe that for the "spam cache files" problem, the plugins don't make it better or worse. This shouldn't matter, but I configured this profile to not be able to connect to the Internet.

And about the tens of thousands of cache files... The two problems I observed are that the files spam a lot, and that the WebUI never cleans them up.

More details about what I observed: the whole bunch of "check.json" files. During each generation, it spams a couple of files, each storing the next step of the token-streamed text. Like this:

```
{"results": [{"text": "Durin"}]}
{"results": [{"text": "During eac"}]}
{"results": [{"text": "During each gen"}]}
{"results": [{"text": "During each generati"}]}
...
{"results": [{"text": "During each generation, it should spam a couple of files."}]}
```

A bunch of "check.json" files get generated, one for each of those lines, and the last "check.json" file has the last line: `{"results": [{"text": "During each generation, it should spam a couple of files."}]}` Then there's another file called "generate.json" storing the final output once again: `{"results": [{"text": "During each generation, it should spam a couple of files."}]}`

Of course, you can only observe the above results if you use a third-party tool called "mzcacheview" to see the actual files through the obfuscation. (Tip: to open an item listed in mzcacheview, you can select it and press F7 to open the file with that tool, so you don't have to extract the file first.) If you just open the raw files inside the Firefox cache folder (the cache folder of Firefox is named "cache2", btw) with your notepad, you'll see the files are actually stored with a bunch of other text in each of them, probably generated by Firefox itself, like the one I presented in an earlier reply. To clarify, the 4KB files started as a bunch of roughly 50-byte "check.json" entries; when Firefox stores them with the additional data, they become roughly 1KB files, and each actually costs your hard drive 4KB of storage if that partition's "cluster size" is 4KB. And a 4KB cluster size should be the normal standard for SSDs.
aleksusklim commented 11 months ago

Okay, with Firefox I clearly see it creates cache entries for EACH /check request (1 per second) and stores them somewhere in C:\Users\<USER>\AppData\Local\Mozilla\Firefox\Profiles\<PROFILE>.default-release\cache2\entries\[0-9A-F]+

Here is the request that my browser makes:

```
POST /api/extra/generate/check HTTP/1.0
Host: localhost:5001
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:96.0) Gecko/20100101 Firefox/96.0
Accept: */*
Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Referer: http://localhost:5001/
Content-Type: application/json
Origin: http://localhost:5001
Content-Length: 21
DNT: 1
Connection: keep-alive
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
```

And here are the response headers from koboldcpp:

```
HTTP/1.0 200 OK
Server: ConcedoLlamaForKoboldServer
Date: Sun, 29 Oct 2023 12:01:57 GMT
access-control-allow-origin: *
access-control-allow-methods: *
access-control-allow-headers: *, Accept, Content-Type, Content-Length, Accept-Encoding, X-CSRF-Token, Client-Agent, X-Fields, Content-Type, Authorization, X-Requested-With, X-HTTP-Method-Override, apikey, genkey
content-type: application/json
```

I see no Cache-Control header here, which could change the default caching behavior. I decided to check whether it would suffice to set it to no-store.

I installed a simple programmable proxy server in NodeJS: https://github.com/http-party/node-http-proxy

This is the code that I tried:

```
// npm install http-proxy
var httpProxy = require('http-proxy');

var proxy = httpProxy.createProxyServer({
  target:'http://localhost:5001',
  selfHandleResponse : true,
}).listen(5002);

proxy.on('proxyRes', function (proxyRes, req, res) {
  var body = [];
  proxyRes.on('data', function (chunk) {
      body.push(chunk);
  });
  proxyRes.on('end', function () {
      body = Buffer.concat(body);
      res.end(body);
  });
  proxyRes.headers['Cache-Control'] = 'no-store';
  res.writeHead(200,proxyRes.headers);
});

proxy.on('error', function (err, req, res) {
  res.writeHead(500,{
    'Content-Type': 'text/plain'
  });
  res.end('500');
});
```

Basically, it opens local port 5002 and proxies all requests to port 5001, adding Cache-Control: no-store to all responses (so you point the browser at http://localhost:5002 instead of :5001). And I see Firefox has indeed stopped storing those pesky little files!

@WingFoxie, I suggest you try this, provided you know how to run NodeJS. In essence, you need to install any LTS version (just be careful not to check the "Chocolatey" option in the installer), then create index.js with the code I provided above, then run npm install http-proxy followed by node index.js in a console there.

@LostRuins, I suggest adding a Cache-Control: no-store response header on all POST requests. (It is strange to me that Firefox caches these by default.)

aleksusklim commented 11 months ago

By the way, instead of using Firefox (and noting that you don't want to use Chrome for whatever reason), a good way to play with local files is, surprisingly, Microsoft Edge (provided you aren't using it as your regular browser).

With a program like simplewall, which is a software firewall, you can disable internet access for Microsoft Edge (or disable it for all processes except your allow list, which I highly recommend doing on a computer you own as the sole user), and then use it as a fully local application for viewing HTML files and opening localhost servers such as koboldcpp.

This way it is guaranteed that no data will ever be sent to the internet by the browser itself. (I also have Microsoft Edge as my default system browser – yes, without access to the internet – so that an accidental click on a link in any other application doesn't open a random webpage leaking my IP or anything; but that's already off-topic here.)

LostRuins commented 11 months ago

Fixed in v1.48, cache control no store has been added. Please try.

aleksusklim commented 11 months ago

Fixed in v1.48, cache control no store has been added. Please try.

I see Firefox indeed stopped caching results of streaming!

But I also tried the new SSE mode, and it is not working in Firefox? Only in Chrome do I see the new type of requests and character-level text appearing at runtime. In Firefox there is no difference between On and SSE for Token Streaming.

If SSE is not supposed to work in Firefox, that's fine, I just wanted to ask whether that's intended (for example because FF lacks the API, or has a buggy one, to do it properly without dirty hacks – which is unfortunately often the case with Firefox).

UPD:

WritableStream which was only added to firefox around version 100+.

Oh, I'm on 96, thank you!

LostRuins commented 11 months ago

Most likely your browser is too old. SSE streaming relies on WritableStream which was only added to firefox around version 100+. If that fails, it will fall back to chunky polled streams.
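The fallback logic is roughly this kind of feature check (a simplified sketch; streamResponse and pollCheckEndpoint are made-up names, not the exact Lite code):

```
// Simplified sketch: stream the response body only when the browser has WritableStream,
// otherwise fall back to the chunky polled streaming via /api/extra/generate/check.
async function streamResponse(resp, onText) {
    if (typeof WritableStream === "undefined") {
        return pollCheckEndpoint(onText); // hypothetical fallback: poll /check about once per second
    }
    var decoder = new TextDecoder();
    await resp.body.pipeTo(new WritableStream({
        write(chunk) {
            onText(decoder.decode(chunk, { stream: true })); // character-level updates as data arrives
        }
    }));
}
```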

Kaelygon commented 11 months ago

I am using koboldcpp-cuda 1.50-1 from the AUR via yay (https://aur.archlinux.org/packages/koboldcpp-cuda) and I can reproduce the performance issues on Firefox.

I imported a 1.1MB story from my 0cc4m/KoboldAI installation, and every time I edit a paragraph the client freezes for 25 seconds on Firefox 120.0. While I am getting 16 T/s and token streaming works fine, editing is unusable. Undo, redo, and saving world info take 1-2 seconds, which is tolerable.

I never had these major performance issues with 0cc4m's client; even the "New UI" works fine with this large story. In 0cc4m's client, editing a paragraph is almost instant.

On Chromium 119.0.6045.159 editing is much faster, 1-2 seconds. Saving world info takes about the same on both Firefox and Chromium.

LostRuins commented 10 months ago

Hi, are you able to run it with a performance profiler and see which javascript function is the culprit? If not, send me (DM or discord if you prefer) your 1.1mb story and I will try to take a look.

Kaelygon commented 10 months ago

I am not sure what I am looking for in the profiler, but here's what I found out: the functions set Element.innerHTML, merge_edit_field, onblur and EventHandlerNonNull ran in parallel for 30000ms. render_gametext ran in parallel for the last 9 seconds of that total 30 seconds.

Attachments: Firefox 2023-11-26 02.35 profile.json.gz, saved_story(1).json

LostRuins commented 10 months ago

Okay, I can confirm this issue, thanks for letting me know. It will be fixed in the next version, I have already optimized the page in https://lite.koboldai.net you can try it there, it should be faster than before.

LostRuins commented 10 months ago

Hi all, this should be fixed in 1.51

WingFoxie commented 10 months ago

Tried the 1.51.1 version. Doesn't feel like it's fixed at all.

Opening a 3.3MB txt file (which probably weighs 6.6MB as a save file) containing all the older chat logs and pasting all the text into an empty chat log: it hangs for 83 seconds. Then clicking away to finish the edit: it hangs for another... maybe 5 minutes? I lost count.

Did more testing with a 200KB save file, and didn't see much of a difference compared to before. Each edit lags for 0.8 seconds.

More testing details here: Tried with less text, a 200KB save file, with performance capture. Edited twice and felt noticeable lag. (In the Profiler UI it measures about 0.8s of lag with each edit.) Many functions executed over 100 times with each edit, including: `"apply" "apply" "render_gametext" "merge_edit_field" "onblur"`. I also noticed there are two more sets of functions executed over 100 times. I don't know why these two are collapsed by default in the profiler results UI. After manually clicking many times and expanding them all, I see these: `"l" "_r" "rr" "J" "j" "V" "Jr" "generate_compressed_story" "autosave" "render_gametext" "merge_edit_field" "onblur"` and these: `"rr" "J" "j" "V" "Jr" "generate_compressed_story" "autosave"`.

Tried again in a private browsing window, without the NoScript plugin or anything. Instead of seeing three different routes, I saw this route alone: `"onblur"(570 times) "merge_edit_field"(568) "render_gametext"(554) "autosave"(414) "generate_compressed_story"(414) "Jr"(410) "V"(406) "j"(406) "J"(406) "rr"(307) "_r"(131) "l"(129) "H"(4) "w"(2) "x"(2) "l"(2) "b"(2)`. I don't know why, when I try the exact same chat log in a private browsing window with almost no plugins, the functions execute even more times: like 300 times per edit in the private browsing window, but only 100 times in a normal browsing window with the NoScript plugin running and everything.
aleksusklim commented 10 months ago

Doesn't feel like it's fixed at all.

Is that in Firefox? Can you test the same setup but in Chrome, to be sure that it's related to the browser and not just "general performance issues" of large stories?

LostRuins commented 10 months ago

A few seconds of lag is normal for huge stories. The Firefox one used to be significantly slower than Chrome before the fix - like 30s+; it's now closer to about 3-4s for a 1MB story.

WingFoxie commented 10 months ago

Just realized that I did all these most recent tests in Firefox with NoScript ON! I believe the NoScript plugin's magnifying effect on edit time is gone now. So there's definitely some improvement with v1.51.

Also... Tried Chrome. Result's a mixed bag.

Tried the 200KB save file again in a Chrome private browsing tab, and it lags for about 0.5 seconds per edit, and that doesn't increase exponentially as the log size increases. But the pasting lag is much worse! Pasting the 800KB chat log in Chrome makes me wait for over a minute, while pasting that much in Firefox only lags for ~4 seconds.

Looks like in Firefox there's exponentially worse delay on editing; in Chrome there's exponentially worse delay when pasting the chat log in. Both have the same problem of the "onblur" function being called too many times with each edit, leading to a trail of functions also being called too many times.

Raw test results here:

Tried these in Firefox (NoScript ON):

|Save size|Delay when finishing an edit|"onblur" function called|
|-------|-------|-------|
|200KB|0.8s|500|
|400KB|2.0s|1300|
|800KB|5.8s|4000|
|1.6MB|23.9s|16000|
|3.2MB|>200s|I forgot...|

And tried a similar thing in Chrome (doesn't have NoScript):

|Save size|Delay when finishing an edit|
|-------|-------|
|200KB|0.6s|
|400KB|1.4s|
|800KB|1.8s|
|1.6MB/3.2MB|Don't want to test anymore! Pasting that 800KB text already creates more than 1 minute of lag!|

Note: I couldn't find the counter for how many times the "onblur" function is called in Chrome's UI; it only says it runs for about 600 milliseconds. And of course I am using the current latest versions of both Firefox and Chrome. Still documenting whether I have NoScript on or not, just in case, though I believe we can stop worrying about that plugin already.
LostRuins commented 10 months ago

Which box did you test pasting into in Chrome? The regular UI input box, the chat UI one, or the main edit body window?

aleksusklim commented 10 months ago

Does it matter which mode (Story, Instruct, Chat, Adventure) you are pasting in? Also, does Chrome hang if you don't paste but instead open a JSON of a saved story? (Asking just to narrow down where and why it could hang.)

WingFoxie commented 10 months ago

(Biggest discovery goes first.) I found out that in Chrome, if I paste a bunch of text into a chat log (in the "gametext" box) that already has a bunch of text in the first place, the delay is crazy long! 6 minutes or something! But if I paste into a completely empty chat log, even with more text, it only delays for 2 seconds or so. Try pasting 800KB of text into an empty log, then another 800KB. Then try pasting all 1.6MB at once. See the difference.

For example: dumping the existing half of the log out into Notepad, pasting what you want to paste after it in Notepad, then pasting all the text back into the chat log at once. That dodges the 6-minute delay. Only in Chrome, by the way, not Firefox. Firefox doesn't suffer from this problem in the first place, and its pasting time is more like a constant 4 seconds either way.

(Other answers below.)

All about the "gametext" box. Modes don't seem to matter: First, the answer is: Edit. Edit. And edit! I just can't survive without intensive editing. In the context of using KoboldCPP, this means the only logical choice for me is to use **"Classic" UI style** (With "Chat" mode by the way). Since I have to keep "Allow Editing" on all the time! Non-negotiable. And the Editing mode only supports "Classic" UI style anyways. So I'm always **only talking about the editing box, the "gametext" box**. All problems I reported about the "gametext" box, doesn't exist with input boxes. They feel as fast as Windows Notepad. No problems there. And tested the loading time. It's only a few seconds with either browser. I believe it also doesn't get exponentially longer... **Test pasting in different modes??** Yeah. Sort of tested, doesn't seem to matter. Probably didn't try every single possible thing, but nothing ringed a bell so far. Suggest you just assume that different modes don't make a difference, since it's one of the first things I ever suspected from the very beginning of this issue, yet I still couldn't find any evidence until now.
Raw test results, and a definition of the "loading time" I'm testing: This time, I used a chat log which is 0.8MB when stored in a txt file, and 1.6MB when saved as "saves_story.json".

|Browser|Pasting in "gametext" box with the box already filled with a bunch of text|Pasting in "gametext" box with the box completely empty|Finishing an edit|"Loading" (see Note 3)|
|-------|-------|-------|-------|-------|
|Firefox|4 seconds|4 seconds|30 seconds!|4 seconds|
|Chrome|310 seconds!!|2 seconds|3 seconds|4 seconds|

Note 1: Pasting into the "input_text" box is as fast as pasting in Windows Notepad in both browsers, and in both UI styles (Classic UI and Chat UI).
Note 2: I can't try the "Aesthetic" UI style. Come on. Simply switching to that mode causes the UI to hang seemingly forever... (And it's the same input box as the Messenger UI style, right?)
Note 3: About "loading" time. I tested 2 ways of loading. One way is to load a ".json" file. The other way? Refreshing the page, which lags for the same length as the current "loading" time: from when the WebUI has loaded and I can see everything, until I can actually start interacting with the UI. (Yes, that's associated with the chat log length as well.) (And yes, although I already see the most recent text in the "gametext" box as soon as the UI loads, so it looks like it's already finished "loading", I **had to wait a "loading" time after seeing that** before I can actually do anything.)
LostRuins commented 10 months ago

The problem happens due to bottlenecks when reconstructing the HTML DOM after a paste, it's just simply not optimized for generating that many newlines in such huge texts. That is a bottleneck in chrome's execCommand insertHTML and not my code.

I've added a small workaround: if you try to SelectAll + Paste, it will wipe out the text first, and that should result in a much faster paste. This issue is not fully solved though; pasting into a window with existing text will still be very slow, depending on how much text you paste. If you paste a small amount of text, it won't be so slow.
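Roughly, the workaround boils down to something like this (a simplified sketch; pasteIntoStory is a made-up name and the selection check is only an approximation of the real one):

```
// Simplified sketch of the SelectAll + Paste workaround: if the selection covers the whole
// story, clear the editor first so execCommand pastes into an empty element instead of
// replacing thousands of existing nodes. "gametext" is the contenteditable story box.
function pasteIntoStory(sanitizedHtml) {
    var gametext = document.getElementById("gametext");
    var selection = window.getSelection();
    var fullySelected = selection.toString().trim().length >= gametext.innerText.trim().length;
    if (fullySelected) {
        gametext.innerHTML = ""; // cheap wipe before the insert below
    }
    document.execCommand("insertHTML", false, sanitizedHtml);
}
```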

LostRuins commented 10 months ago

The alternative is me inserting the text manually, which will be very fast. But then you would lose the ability to undo/redo that "paste". Which is better?

aleksusklim commented 10 months ago

Sorry, I haven't looked at your code, but since I'm a javascript developer myself, I can give a general advice:

  1. Do not .innerHTML += , instead prepare the full string and set it only once. (You already know that)
  2. Each .innerHTML is recreating everything that was in that element; if you need to update only a part of it – then find that element in tree (by id/class/tag or child/sibling/parent members) and set new HTML only for it.
  3. If you need to merely add a new node inside – then use insertAdjacentHTML instead of innerHTML, since it won't destroy the previous content. (A tiny illustration follows below.)
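Something like this, with a made-up element id and helper functions (not real Lite code):

```
// Contrast of the three approaches above, with made-up helpers.
var log = document.getElementById("gametext");

// Rebuilding everything: recreates every child node, slow on huge stories.
log.innerHTML = buildWholeStoryHtml();

// Updating only the part that changed: find the node and set new HTML just for it.
log.lastElementChild.innerHTML = renderTurnHtml(latestTurn);

// Appending a new node: insertAdjacentHTML keeps the existing content intact.
log.insertAdjacentHTML("beforeend", renderTurnHtml(newTurn));
```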

By the way: often when I paste something from external applications into the history box, everything is normal until I click outside (into the message box or onto the send button), and after that the newly pasted text often produces doubled / unwanted newlines at random! (As if \r\n became \n\n.) Sometimes a lot, sometimes none. But this happens only after pasting external text: if I then click in and edit something, it won't jumble anymore on its own.

Telling this because in the moment when it happens – I can feel a lag too, as if it hangs for a moment and then my newly pasted text jumps around with extra newlines.

LostRuins commented 10 months ago

insertAdjacentHTML won't work because you are trying to replace part of the context in a content editable with new content from the clipboard. The only way that works and also allows it to be added to the browser's undo stack is execCommand. This command itself is the source of the bottleneck.

if you'd like to give it a try, the relevant code that causes the lag is at https://github.com/LostRuins/lite.koboldai.net/blob/main/index.html#L3630

This code has one purpose - to strip out all the rich text content (e.g. images, bold text, hyperlinks) when pasting from clipboard, and render everything in a consistent way.

Setting insertText does not work, because contentEditable does not have consistent innerText behavior across browsers (some add more newlines than others due to extra <div>s; we need to use hard breaks, <br>).

Manually rewriting/modifying the DOM does not work - we need all changes to be correctly added to the undo stack so you can paste stuff, then CTRL+Z or rightclick+undo to revert it.
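For context, the overall shape of that paste handler is roughly this (simplified; the real code is at the link above, and the element lookup is just for the sketch):

```
// Simplified shape: intercept the paste, keep only plain text, normalize newlines to hard
// <br> breaks via the existing escapeHtml helper, and re-insert with execCommand so the
// change lands on the browser's own undo stack. execCommand("insertHTML") is the slow part.
var gametext = document.getElementById("gametext");
gametext.addEventListener("paste", function (event) {
    event.preventDefault();
    var text = event.clipboardData.getData("text/plain");
    text = escapeHtml(text).replace(/\r\n|\r|\n/g, "<br>");
    document.execCommand("insertHTML", false, text);
});
```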

Give it a try, maybe you have an alternative approach.

aleksusklim commented 10 months ago

For a second, can you change document.execCommand("insertHTML", false, text); to

```
document.execCommand("insertHTML", false, '');
document.execCommand("insertHTML", false, text);
```

and test performance? For me it feels like pasting becomes faster, but I'm not sure. (If that works, you won't need fullySelected check anymore)

aleksusklim commented 10 months ago

Another bottleneck is the autosave feature (when it is enabled), most importantly its compression, which is triggered on every "click-away" after the history was edited.

Actually, LZMA-JS supports asynchronous compression, and it pretty much works as-is! Here are the changes that I made:

    function generate_compressed_story(save_images,export_settings,export_aesthetic_settings,opt_async) {
        //encode the current story into a sharable url
        //a tiny json format which gets compressed by LZMA then b64url

        let story = generate_savefile(save_images,export_settings,export_aesthetic_settings);
        let storyjson = JSON.stringify(story);
        console.log("Exporting story: ", story, opt_async?"ASYNC":"");

        //var cstoryjson = LZString.compressToEncodedURIComponent(storyjson);
        if(opt_async){
            lz_c.compress(storyjson, 1, function(res){
                opt_async(buf_to_b64(res))
            });
        }else{
            var cstoryjson = buf_to_b64(lz_c.compress(storyjson, 1));
            return cstoryjson;
        }
    }

…

    function autosave() {
        //autosave
        try {
            update_prev_custom_endpoint_type();
            localStorage.setItem(STORAGE_PREFIX + "settings", JSON.stringify(localsettings));
            if (localsettings.persist_session) {
                generate_compressed_story(true, true, true, function(compressedstory){
                    try{
                        localStorage.setItem(STORAGE_PREFIX + "story", compressedstory);
                        console.log("autosave done");
                    } catch (e) {
                        console.log("autosave failed: " + e);
                    }
                });
            }
        } catch (e) {
            console.log("autosave failed: " + e);
        }

    }

In this setup, I see Exporting story: in the log, but autosave done only comes around 10 seconds after it (for the saved_story.1.json 1-megabyte file from this thread). Not sure whether an async localStorage write is always desirable; maybe you can change the autosave option to "on/async/off" rather than a checkbox?

Also I looked at your

    function escapeHtml(unsafe)
    {
        return unsafe
            .replace(/&/g, "&amp;")
            .replace(/</g, "&lt;")
            .replace(/>/g, "&gt;")
            .replace(/"/g, "&quot;")
            .replace(/'/g, "&#039;");
    }
    function unescapeHtml(input)
    {
        return input
            .replace(/&amp;/g, "&")
            .replace(/&lt;/g, "<")
            .replace(/&gt;/g, ">")
            .replace(/&quot;/g, "\"")
            .replace(/&#039;/g, "\'");
    }

And changed that to faster cached regular expressions:

    var _escapeHtml_map_ = {
        '&': '&amp;',
        '<': '&lt;',
        '>': '&gt;',
        '"': '&quot;',
        "'": '&#039;',
    };
    var _unescapeHtml_map_ = {
        '&amp;' : '&',
        '&lt;'  : '<',
        '&gt;'  : '>',
        '&quot;': '"',
        '&#039;': "'",
    };
    function _escapeHtml_replace_(unsafe) {
        return _escapeHtml_map_[unsafe];
    };
    function _unescapeHtml_replace_(input) {
        return _unescapeHtml_map_[input];
    };
    var _escapeHtml_regexp_ = /['<&>"]/g;
    var _unescapeHtml_regexp_ = /&amp;|&lt;|&gt;|&quot;|&#039;/g;
    function escapeHtml(unsafe) {
        return unsafe.replace(_escapeHtml_regexp_, _escapeHtml_replace_);
    };
    function unescapeHtml(input) {
        return input.replace(_unescapeHtml_regexp_, _unescapeHtml_replace_);
    };

I couldn't see any noticeable performance boost after this change, so it's up to you whether you want to take it or not. (For me it just feels wrong to call .replace many times, especially on potentially long strings.)

P.S. Why do you hide the Settings button when the .html is opened locally offline? There are some crucial options that affect local editing too, and it would be desirable to have them even when "You will still be able to load and edit stories, but not generate new text."

The alternative is me inserting the text manually, which will be very fast. But then you would lose the ability to undo/redo that "paste".

If my trick in the message above (pasting an empty string before the actual modified content) gives nothing, we can try something else. At first we would indeed need "you pasting manually" (confirming that performance would be good), and then we could either somehow trick the browser into keeping its undo stack (by sending a fake execCommand, maybe?), or re-implement the stack itself.

I've found https://stackoverflow.com/a/67678889 but the code is really messy and has to be incorporated very carefully. I hope there is a different approach rather than this particular one… I googled a lot, and the most promising trick was to create a new paste event (or modify an existing real one) and dispatch it manually as in https://stackoverflow.com/a/47753075, but it looks like this no longer works, at least in modern Chrome.

LostRuins commented 10 months ago

For me it feels like pasting becomes faster, but I'm not sure.

It does not, it's still horribly slow when pasting large text chunks into a large existing document.

cached regular expressions:

Nah, string literal replace is actually much faster than any sort of regex. Moreover, these operations are both extremely fast compared to DOM modification and would not be part of any bottleneck.

async saving

Async saving is possible, but needs more testing - not sure if it's worth the hassle. The main concerns would be

  1. Race conditions, where an older save command could potentially complete after a new one especially if the user tried to change stories while saving.
  2. Background execution, which is not guaranteed especially if the user minimizes the browser window, closes the window early, or switches to a different activity (e.g. on mobile), there is a risk of the save being disrupted before being completed.

Why do you hide the Settings button

Yeah I can leave it enabled, I guess.

clipboard hacks

They seem really clunky. I doubt they will be cross browser compatible.

aleksusklim commented 10 months ago

It does not, it's still horribly slow

Sigh, welp.

Okay, how would you "paste manually"? I presume there would be many HTML elements in the history (colorization?), which could be partially selected and thus would have to be cut in the middle or somehow split before deletion and insertion (and then merged?).

Can you actually do that correctly? (In a separate branch to test further).

They seem really clunky. I doubt they will be cross browser compatible.

Also, there are various "online text editors" out there, but I checked them many years ago; they're probably much better now, and ideally they should support arbitrary pasting along with undo history. We should find out how they have implemented it! I will check and come back here to tell you what I find.

Nah, string literal replace is actually much faster than any sort of regex.

Um-m… In this particular code, you are using regular expressions too! You're not doing .replaceAll (a rather modern function that exists solely so you do NOT have to build a regular expression with /g to replace all occurrences rather than only the first one).

these operations are both extremely fast compared to DOM modification and would not be part of any bottleneck.

Yes, this is true. (I just always think from a large-scale perspective, where you don't want tiny helper functions to be even theoretically slow, because later they can end up in tight loops.)

I've benched two parts: the paste callback and the blur callback. In the former, the slowest call is indeed execCommand, in the latter it was compress (took more than a half of onblur execution time).

Race conditions, where an older save command could potentially complete after a new one

Oh, you're right! Okay, we can wrap a singleton, where each new async autosave call would invalidate any previous one (dropping the compressed result and not even doing base64 on it).

Also, there is a second "onprogress" callback of compress function, and we can throw from there conditionally to effectively cancel the ongoing compression!

Background execution, which is not guaranteed

I've just benched it. The onprogress callback gets executed in 200-500 ms intervals normally in foreground, but may be throttled to 1000 ms when the user switches to another tab or application.

I think this can be solved by cancelling the ongoing async compression (throwing away partial work) and running the sync version right there instead, when continuous throttling is detected (e.g. when a delay of more than 950 ms happened 3 times in a row, allowing individual hiccups).

Here is how I would do that:


    var _generate_compressed_story_async_ = 0;
    var allow_async_autosave = true; // SHOULD BE IN SETTINGS

    function generate_compressed_story(save_images,export_settings,export_aesthetic_settings,cb_async_autosave) {
        //encode the current story into a sharable url
        //a tiny json format which gets compressed by LZMA then b64url
        let story = generate_savefile(save_images,export_settings,export_aesthetic_settings);
        let storyjson = JSON.stringify(story);
        console.log("Exporting story: ", story);
        //var cstoryjson = LZString.compressToEncodedURIComponent(storyjson);
        if(cb_async_autosave && allow_async_autosave){
            var current_async = ++_generate_compressed_story_async_; // increment call counter
            var last_timestamp = Date.now();
            var throttle_counter = 0;
            lz_c.compress(storyjson, 1, function(res){
                if(!cb_async_autosave){ // callback was already called
                    return;
                }
                if(current_async!==_generate_compressed_story_async_){
                    cb_async_autosave(false); // this was not the latest autosave
                }else{
                    cb_async_autosave(buf_to_b64(res));
                }
            },function(progress){
                if(current_async!==_generate_compressed_story_async_){
                    var cb = cb_async_autosave; // abort older and return false
                    cb_async_autosave = null;
                    cb(false);
                    throw 'abort previous async autosave';
                }
                var new_timestamp = Date.now();
                if(new_timestamp-last_timestamp>950){ // browser throttle of inactive tab
                    if(++throttle_counter>2){
                        console.log('browser throttle detected');
                        var cb = cb_async_autosave;
                        cb_async_autosave = null;
                        cb(buf_to_b64(lz_c.compress(storyjson, 1))); // resort to sync compression
                        throw 'cancel async autosave to sync';
                    }
                }else{
                    throttle_counter = 0; // allow single hiccups
                }
                last_timestamp = new_timestamp;
            });
        }else{
            var cstoryjson = buf_to_b64(lz_c.compress(storyjson, 1));
            if(cb_async_autosave){ // when called as async but forbidden by settings
                cb_async_autosave(cstoryjson);
            }else{
                return cstoryjson;
            }
        }
    }

…

    function autosave() {
        //autosave
        try {
            update_prev_custom_endpoint_type();
            localStorage.setItem(STORAGE_PREFIX + "settings", JSON.stringify(localsettings));
            if (localsettings.persist_session) {
                generate_compressed_story(true, true, true, function(compressedstory){
                    if(compressedstory===false){
                        console.log("older async autosave cancelled");
                        return;
                    }
                    try{
                        localStorage.setItem(STORAGE_PREFIX + "story", compressedstory);
                        console.log("autosave done");
                    } catch (e) {
                        console.log("autosave failed: " + e);
                    }
                });
            }
        } catch (e) {
            console.log("autosave failed: " + e);
        }
    }

I have tested it, it works reliably now. What do you think? UI is not hanging anymore when clicking outside of the large history after editing it.

LostRuins commented 10 months ago

Seems reasonable. I've added it for the autosaves as a new function; I'm leaving the manual export/save/load untouched as synchronous. Simplified the code a bit, let me know if this has any problems.

https://github.com/LostRuins/lite.koboldai.net/commit/21fbbb98705a55a1bef6ff43c6e1136078f7544e

aleksusklim commented 10 months ago

I've added it for the autosaves as a new function

Good! A dedicated function is wise.

Simplified the code a bit, let me know if this has any problems.

Wait, wait, you are not throwing from the onprogress callback? This leads to parallel redundant work that gets dropped but still wastes computational resources. For example, if I click into the history, mash the keyboard, return to the input box, and repeat – several autosaves end up running simultaneously:

Autosave Start
Merged edit field
Autosave Start
Merged edit field
Autosave Start
Merged edit field
Autosave Start
Merged edit field
Autosave Rejected
Autosave Rejected
Autosave Rejected
Autosave Done

If we go for aborting the previous one – none of them might ever finish while the user keeps clicking… Okay, I have another solution, especially if you don't feel right about throwing exceptions:

    var autosave_compressed_story_async_ongoing = 0;
    function autosave_compressed_story_async(save_images,export_settings,export_aesthetic_settings) {

        if(autosave_compressed_story_async_ongoing>0){
            autosave_compressed_story_async_ongoing = 2;
            console.log("Delay Autosave");
            return;
        }
        let story = generate_savefile(save_images,export_settings,export_aesthetic_settings);
        autosave_compressed_story_async_ongoing = 1;
        let storyjson = JSON.stringify(story);
        console.log("Autosave Start: ", story);
        lz_c.compress(storyjson, 1, function(res){
            console.log("Autosave Done");
            var compressedstory = buf_to_b64(res);
            localStorage.setItem(STORAGE_PREFIX + "story", compressedstory);
            var repeat = autosave_compressed_story_async_ongoing>1;
            autosave_compressed_story_async_ongoing = 0;
            if(repeat){
                autosave_compressed_story_async(save_images,export_settings,export_aesthetic_settings);
            } 
        });
    }

Now I have a global flag. If it is 0, then no other autosave is running in parallel, so we start the first one and set the flag to 1. But if we see that the flag is already nonzero – we rewrite it to 2 and do nothing. This means "the previous autosave is stale".

When the compression finishes, it checks the flag. If it is 2 (and not 1) – then somebody called autosave again during compression, so the current compressed result is outdated. But! We won't just drop it; instead, we save it into local storage as normal – and then repeat the autosave! (Resetting the flag back to 0 in either case.)

Since we don't have to repeat the compression with the exact same story as was requested in the last autosave call (we can just compose a fresh one) – all redundant autosave calls are cheap: all they do is say "please redo the autosave after the current one, if there is any".

This solves all three problems:

  1. only a single compression is running in the background at a time;
  2. the first autosave is guaranteed to complete no matter how many were queued;
  3. the last autosave is always up to date.

The above log becomes:

Autosave Start
Merged edit field
Delay Autosave
Merged edit field
Delay Autosave
Merged edit field
Delay Autosave
Merged edit field
Autosave Done
Autosave Start
Autosave Done

Returning to onpaste hell… Well, I've checked several online editors like

– Seems like none of them can really restore the undo stack. Don't get me wrong, all of them do have "undo/redo" functionality (often on Ctrl+Z, Ctrl+Y or Ctrl+Shift+Z), but if you right-click and choose "Undo" in the browser's native context menu, they either:

There was one that could really undo: https://alex-d.github.io/Trumbowyg/ But I think they are still calling execCommand('insertText') somewhere…

Also there was this one: https://sofish.github.io/pen/ And it's interesting! Looks like they are "just pasting as it is" but then cleaning whatever they don't support:

    // listen for paste and clear style
    addListener(ctx, editor, 'paste', function() {
      setTimeout(function() {
        ctx.cleanContent();
      });
    });

…

  // remove attrs and tags
  // pen.cleanContent({cleanAttrs: ['style'], cleanTags: ['id']})
  Pen.prototype.cleanContent = function(options) {
    var editor = this.config.editor;
    if (!options) options = this.config;
    utils.forEach(options.cleanAttrs, function (attr) {
      utils.forEach(editor.querySelectorAll('[' + attr + ']'), function(item) {
        item.removeAttribute(attr);
      }, true);
    }, true);
    utils.forEach(options.cleanTags, function (tag) {
      utils.forEach(editor.querySelectorAll(tag), function(item) {
        item.parentNode.removeChild(item);
      }, true);
    }, true);
    checkPlaceholder(this);
    this.checkContentChange();
    return this;
  };

– Can you do that? Traverse the internals of the history box and strip anything that should not be there. Since you know the cursor position/selection, in theory you should be able to find the place "that should have been modified by the native pasting" and convert its HTML to text-only nodes?
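For example, something along these lines (just a sketch of the idea, without any cursor/selection handling; the container argument stands in for your history box element):

    // Rough sketch only: walk the edited container, keep just its text, and rebuild it
    // from text nodes and <br> elements so no pasted styles/tags survive.
    // (The block-element rule below is simplified; real markup needs more care.)
    function strip_pasted_markup(container) {
        var text = "";
        (function walk(node) {
            for (var child = node.firstChild; child; child = child.nextSibling) {
                if (child.nodeType === Node.TEXT_NODE) {
                    text += child.nodeValue;
                } else if (child.nodeName === "BR") {
                    text += "\n";
                } else {
                    walk(child); // recurse into spans/divs, keeping only their text
                    if (child.nodeName === "DIV" || child.nodeName === "P") {
                        text += "\n"; // treat block elements as line breaks
                    }
                }
            }
        })(container);
        container.textContent = ""; // drop everything, including foreign attributes
        var lines = text.split("\n");
        for (var i = 0; i < lines.length; i++) {
            if (i > 0) container.appendChild(document.createElement("br"));
            container.appendChild(document.createTextNode(lines[i]));
        }
    }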

Also, I tried to apply my own approach to "cleaning the clipboard", like this (it can be pasted directly into the browser console at runtime):

var paste_fake_textarea_elem = null;
function paste_fake_textarea(cb){
    var textarea = paste_fake_textarea_elem;
    if(!textarea){
        textarea = document.createElement("textarea");
        textarea.wrap = "off";
        textarea.spellcheck = false;
        textarea.style = "position:fixed;left:-99px;top:-99px;width:9px;height:9px";
        document.body.appendChild(textarea);
        paste_fake_textarea_elem = textarea;
    }
    textarea.style.display = "";
    var selection = window.getSelection();
    var saved = [selection.anchorNode, selection.anchorOffset, selection.focusNode, selection.focusOffset];
    textarea.focus();
    textarea.oninput = function(){
        textarea.oninput = null;
        var text = textarea.value;
        textarea.style.display = "none";
        textarea.value = "";
        var selection = window.getSelection();
        selection.setBaseAndExtent(saved[0], saved[1], saved[2], saved[3]);
        cb(text);
    };
};

document.addEventListener("paste",function(e){
    paste_fake_textarea(function(text){
        setTimeout(function(){
            console.log("insertText...");
            var time = Date.now();
            document.execCommand("insertText", false, text);
            console.log(Date.now()-time);
        },0);
    });
},true);

– This does not work as I expected. I am creating a fake off-screen textarea just to redirect the native pasting there – to grab the cleaned, text-only version of the clipboard contents. This gave me nothing useful (it also interferes with onblur in your code, and is prone to race conditions, but let's put that aside for now).

Having the cleaned text, how do we paste it while preserving the undo stack? You said that you cannot use insertText for whatever reason (and had to use insertHTML), but I decided to actually benchmark it… Complete failure! Copying large parts of the history and pasting them back gives an awful delay on document.execCommand("insertText", false, text); too.

The editor that I used years ago was https://quilljs.com/ but apparently they intercept the Ctrl+Z key manually somewhere. In the end, maybe this is the way to go?

LostRuins commented 10 months ago

throwing from progress callback

Throwing exceptions to halt an existing save-in-progress doesn't feel right. In this case the lz_c.compress function seems stateless and harmless enough, but this kind of behavior can often lead to resource leaks or inconsistent state when a library is not expecting it (e.g. allocated memory not being freed when interrupted unexpectedly, opened file handles not being closed), so I try to avoid it.
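For instance, if aborting from inside the callback is really wanted, I'd rather keep any exception at the boundary and signal through a flag. A rough sketch, with made-up names:

    // Sketch only: never let an exception escape into lz_c internals; use a flag instead.
    var autosave_cancelled = false; // made-up flag for illustration
    function safe_progress_callback(progress) {
        try {
            if (autosave_cancelled) {
                return; // quietly stop doing work instead of throwing through the library
            }
            // ...normal progress/throttle handling would go here...
        } catch (e) {
            console.log("progress callback error: " + e); // swallow, don't rethrow into lz_c
        }
    }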

Okay, I have another solution, global flag

I'm not so sure about this one. Calling the autosave function recursively seems kind of dubious, especially since autosave_compressed_story_async_ongoing is a global variable that can have its state altered and reset externally, something you really want to avoid with recursive method calls.

Even assuming it works without issues, the second save basically triggers automatically at an unknown time in the future (maybe the user is halfway through modifying some settings or editing something?), rather than with a "known good" state when triggered normally (e.g. end of edit, end of generate action, closing the settings window).

I think that overall, the total number of saves has not changed compared to the synchronous approach; it's just that from the user's POV it no longer freezes when saving, which is all that's needed in the end.

intercepting Ctrl+Z key manually somewhere

Won't work on mobile, won't work with right-click + undo, won't work for people with non-default keymaps.

LostRuins commented 10 months ago

You said that you cannot use insertText for whatever reasons

One problem with insertText is that browsers handle it differently. For newlines, for example, some add <div>s, some add <br>, and some add <div><br></div>, which is a real headache when parsing it back to store – you end up with extra or missing newlines.
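Just to illustrate the kind of normalization it forces on the parsing side (the regexes below are only a sketch, not what Lite actually does):

    // Sketch: map the browser-specific newline markup back onto plain "\n".
    // Even then, fully wrapped content (<div>line</div><div>line</div>) tends to
    // produce an extra leading newline, which is exactly the headache described above.
    function html_newlines_to_text(html) {
        return html
            .replace(/<div>\s*<br\s*\/?>\s*<\/div>/gi, "\n") // <div><br></div> -> blank line
            .replace(/<br\s*\/?>/gi, "\n")                   // bare <br>       -> newline
            .replace(/<div[^>]*>/gi, "\n")                   // opening <div>   -> newline
            .replace(/<\/div>/gi, "")                        // closing </div>  -> dropped
            .replace(/&nbsp;/g, " ");
    }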

aleksusklim commented 10 months ago

this kind of behavior can often lead to resource leaks/inconsistent state when the libraries are not expecting it

I see your point. I could reply that if a library breaks badly because of exceptions in callbacks – then an issue should be filed with its developers. Or, in specific cases where it is not feasible to catch them gracefully there for performance reasons – the library's documentation should state clearly that leaking exceptions is forbidden (so the user should try-catch on their own if needed). But, yeah, I hadn't read their documentation thoroughly, so I had no right to throw like that ))

a global variable that can have it's state altered and reset externally

By using let (which is present in your code too) it is possible to close over a "function-local static variable" like this:

{
  // new scope for let/const

  let local_var;

  function global_func(new_value) {
    var old_value = local_var;
    local_var = new_value;
    return old_value;
  };

}

// the function is visible outside:
console.log(global_func(1)); // undefined
console.log(global_func('test')); // 1
console.log(global_func(true)); // "test"

// but the variable is not:
console.log(local_var); // Uncaught ReferenceError: local_var is not defined

Personally, I would rather abuse the fact that a function is a dynamic object to which it is possible to add properties, like global_func._local_var = new_value, but that can still be accessed from outside on purpose (useful for debugging, for example).
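Roughly like this, for comparison (the same example as above, just the function-property flavour):

// The variable lives as a property on the function object, so it stays inspectable from outside:
function global_func(new_value) {
  var old_value = global_func._local_var; // visible externally as global_func._local_var
  global_func._local_var = new_value;
  return old_value;
}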

In the case of let, the variable is fully private. The only caveat is that the function in the block won't be directly available before its declaration (above the block it is undefined until execution reaches it for the first time), but that won't be a problem for your code.

the second save basically triggers automatically at an unknown time in the future

In the previous version that you've merged, the save also triggers at an unknown time in the future, but with a known value. It can still rewrite localStorage in between other operations (but I doubt your code is sensitive to that).

rather than with a "known good" state when triggered normally

All right, we can produce the JSON and stringify it before postponing the compression. Would that suffice?

I think that overall, the total the number of saves has not changed compared to the synchronous approach

This is wrong: previously, the user physically could not save too often, because the UI hung and could not process onblur anymore. Now each onblur after any edit triggers a new parallel compression, making the UI slower and causing previous compressions to finish late. The more text in the history, the more serious this problem becomes!

Won't work on mobile, won't work with right-click + undo, won't work for people with non-default keymaps.

Then somebody should test how those other editors behave on mobile and with non-default keymaps. (The right-click undo is out of the question, unfortunately.)

Meanwhile, I'll try to come up with a function to "clean out" any pasted contents for your history box. For now, here is my updated version of the autosave; what do you think this time?

    {
        let new_storyjson = null;
        function autosave_compressed_story_async(save_images,export_settings,export_aesthetic_settings) {
            let story = generate_savefile(save_images,export_settings,export_aesthetic_settings);
            let storyjson = JSON.stringify(story);
            let ongoing = new_storyjson;
            new_storyjson = storyjson;
            if(ongoing){
                console.log("Delay Autosave: ",story);
                return;
            }
            console.log("Autosave Start: ", story);
            (function retry(json) {
                lz_c.compress(json, 1, function(res) {
                    console.log("Autosave Done");
                    let compressedstory = buf_to_b64(res);
                    localStorage.setItem(STORAGE_PREFIX + "story", compressedstory);
                    let newer = new_storyjson;
                    new_storyjson = null;
                    if (newer && newer !== json) {
                        console.log("Updating Autosave");
                        retry(newer);
                    }
                });
            })(storyjson);
        }
    };

LostRuins commented 10 months ago

Your new save approach sounds correct, but in practice it fails and loses data for some reason when I test it. There must be some race condition that causes it to fail.

Try this test:

Now try your version:

aleksusklim commented 10 months ago

Confirmed.

Look: console.log("Autosave Done: ", res.length);

After the initial loading of a large story, it prints Autosave Done: 2525. And after some time, it prints Autosave Done: 319640.

I presume that, for some reason, your code is returning an "empty" story the first time it loads!

Since my code is "saving the first one, then saving the last one", it saves the empty one.

Looking at the console even in your version, I see memory: "" and prompt: "" in Autosave Start:, but they are populated properly in subsequent logs.

I logged gametext_arr in generate_savefile and it prints [] the first time the tab loads.

What will we do about it? I presume the solution "just drop an empty story" is not a wise one, because that way the user wouldn't be able to "clear" the history completely (but maybe that is a good thing?). "Just drop the first one" is also incorrect.

generate_savefile should not return bad data in the first place!

LostRuins commented 10 months ago

I would presume this happens any time the UI starts from a new story, which happens on startup but also in multiple other places. It's not "bad" data; it doesn't cause any issues in either the synchronous call or my current async approach.

Maybe the browser is smart enough to allow currently executing functions to finish completely before the process is halted (e.g. the user closed the tab or refreshed the page); however, in the new approach the browser has no way of knowing that there are incomplete tasks queued when it faces unexpected termination.
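Purely as an illustration of the concern (not something that is implemented), a last-ditch synchronous flush in beforeunload could look roughly like this, borrowing names from the snippets in this thread:

    // Illustration only: if a compression is still pending when the page goes away,
    // fall back to one synchronous save (the browser may still cut this short).
    window.addEventListener("beforeunload", function () {
        if (pending_storyjson_autosave) {
            var cstoryjson = buf_to_b64(lz_c.compress(pending_storyjson_autosave, 1));
            localStorage.setItem(STORAGE_PREFIX + "story", cstoryjson);
        }
    });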

LostRuins commented 10 months ago

I can try to avoid the autosave on restarting the story, but I feel like this is a band-aid that may mask other issues with the non-immediate approach. Maybe it would be safer to keep the save/load paths on the sync save as well, and leave the async autosave only for runtime editing and generation.
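Very roughly, the band-aid would look something like this (suppress_autosave and restart_new_story are made-up names here, standing in for whatever actually resets the story):

    // Band-aid sketch only, not real code from Lite.
    var suppress_autosave = false;
    function restart_story_without_autosave() {
        suppress_autosave = true;
        try {
            restart_new_story(); // hypothetical: resets gametext_arr, memory, prompt, etc.
        } finally {
            suppress_autosave = false;
        }
    }
    function autosave_if_allowed() {
        if (suppress_autosave) {
            console.log("autosave skipped while resetting story");
            return;
        }
        autosave(); // the existing autosave entry point
    }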

aleksusklim commented 10 months ago

It's not "bad" data, this doesn't cause any issues in both the synchronous call and my current async approach.

In the synchronous version, you are just saving twice in a row, rewriting the incorrect one right away. In the current asynchronous version, the second call drops the first one (and the save file is not rewritten in case of a tab reload).

You should not alter the saved state on loading!

LostRuins commented 10 months ago

Okay, pushed a fix for this and integrated it into lite; hopefully it doesn't cause issues. Please check.

aleksusklim commented 10 months ago

Okay pushed a fix for this and integrated it into lite

Nah-ah, you have a subtle bug there!

        (function retry_autosave(json) {
            lz_c.compress(json, 1, function(res) {
                console.log("Autosave Done");
                let compressedstory = buf_to_b64(res);
                localStorage.setItem(STORAGE_PREFIX + "story", compressedstory);
                let newer = pending_storyjson_autosave;
                pending_storyjson_autosave = null;
                if (newer && newer !== json) {
                    console.log("Updating Autosave");
                    retry_autosave(newer);
                }
            });
        })(storyjson);

– Here, you are nulling pending_storyjson_autosave and THEN checking whether you need to retry; but you retry right away. Thus, pending_storyjson_autosave will be null even while a story is still being compressed.

Any subsequent outer call won't see that, and will ultimately start a second concurrent compression, leading to a race condition! This can be confirmed in the console by constantly editing the history and making sure there was an edit after the moment the first compression finishes:

Autosave Done
Updating Autosave
Autosave Done
Autosave Done

(The last line might be collapsed by the browser into (2) Autosave Done)

Also, you are not logging anything in the if(!async_save) branch; I suggest putting a debug dump there too, just to be sure about the cause of any noticeable lag when that happens.

Here is the final fixed code:

    //runs async, complete autosave only if latest to be called
    var pending_storyjson_autosave = null;
    function autosave_compressed_story(async_save,save_images,export_settings,export_aesthetic_settings) {
        let story = generate_savefile(save_images,export_settings,export_aesthetic_settings);
        let storyjson = JSON.stringify(story);

        if(!async_save)
        {
            console.log("Delay Sync: ", story);
            var cstoryjson = buf_to_b64(lz_c.compress(storyjson, 1));
            pending_storyjson_autosave = null;
            localStorage.setItem(STORAGE_PREFIX + "story", cstoryjson);
            return;
        }

        let ongoing = pending_storyjson_autosave;
        pending_storyjson_autosave = storyjson;
        if(ongoing){
            console.log("Delay Autosave: ", story);
            return;
        }
        console.log("Autosave Start: ", story);
        (function retry_autosave(json) {
            lz_c.compress(json, 1, function(res) {
                console.log("Autosave Done");
                let compressedstory = buf_to_b64(res);
                localStorage.setItem(STORAGE_PREFIX + "story", compressedstory);
                let newer = pending_storyjson_autosave;
                if (newer && newer !== json) {
                    console.log("Updating Autosave");
                    retry_autosave(newer);
                }else{
                    pending_storyjson_autosave = null;
                }
            });
        })(storyjson);
    }

Hm-m, have you considered what would happen if a SYNC autosave were triggered during an async one, for whatever reason? Do you have sensitive places in your code?

One simple solution could be:

Want me to wrap it up?