Artikash / Textractor

Extracts text from video games and visual novels. Highly extensible.
GNU General Public License v3.0

Performance with many threads #129

Closed. Kiru-Ufo closed this issue 5 years ago.

Kiru-Ufo commented 5 years ago

Judging from the changelogs, performance seems to be something that's being worked on in this program. However, there's one big issue the program still handles badly: spam.

As an example, take the free game Mahou Shoujo (the remake, 新説魔法少女). This game SPAMS a lot of threads during normal gameplay, which leads to high CPU usage. Just being on a map, without even attacking to trigger special animations, Textractor 3.0.1 permanently uses 10% of my CPU, while using more and more RAM as well, because it stores everything the game spams in every thread. After a few minutes, it easily reaches 250 MB. This can't end well. Interestingly, version 1.3 performs much better, with only 1% to 2% CPU usage and a slower RAM increase, most likely because it finds fewer useless threads.

The solution is either to figure out how ITHVNR handles this (it uses 0% CPU, with no RAM increase, in the same situation), or simply to ignore threads that aren't selected or linked, as soon as at least one thread is selected. After hooking for the first time, before any thread is selected, you obviously need to fill them all. The former would probably be better, because sometimes it's interesting to see what's in other threads, as specific parts may not be hooked by the thread that hooks most things (a common example: the KiriKiri engine, where different text sizes lead to different threads).

ITHVNR isn't bug-free: it performs badly when you select a thread that's being spammed, and it easily crashes if you do this after too much spam (probably related to memory allocation and there simply being too much text). But as long as you never switch away from the thread you need, the other threads can be spammed as much as they want; performance-wise, ITHVNR doesn't care. Something like this would be the ideal goal. An option to ignore all threads other than the selected and linked ones would be a start, and I'd hope it wouldn't be too hard to do.

Nostaljaded commented 5 years ago

As per the above post, tracking all threads is probably causing performance slowdowns.

Copy-pasted from a forum thread about Astronauts' ギルドマスター (Guild Master) game:

Time in auto mode from the first line of a new game until Neil's first voiced line 'うむ'.

Game only: ~41s

ITH2.3 [with 'Auto suppress repetition', 'Auto copy to clipboard'] by CheatEngine method: ~41s

Textractor 3.0.0 [with only the 'Remove Repetition' and 'Copy to Clipboard' extensions] by CheatEngine method, on 'Console' output (game attached but hook not yet added): ~47s

On 'Game Hook' output: ~47s

All tests on fresh boot with game version 1.05.
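Taking the two timings at face value (single runs, so only a rough figure), the slowdown of Textractor 3.0.0 relative to the game alone works out to:

```latex
\frac{t_{\text{Textractor}} - t_{\text{game}}}{t_{\text{game}}}
  \approx \frac{47\,\text{s} - 41\,\text{s}}{41\,\text{s}}
  \approx 0.146 \approx 15\%
```

By the same measure, ITH 2.3 at ~41 s shows no measurable slowdown.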

Copy-pasted hooking method:

Temporary solution via Cheat Engine to get a single-use H-code for Textractor and old ITH: /HW-4@address, where address is that of the System:Char:IsHighSurrogate function as seen in Cheat Engine with Mono features activated.

Start the game, open its process in CE, then in the newly appeared Mono menu enable the 'Activate mono features' setting. Click the Memory View button, press Ctrl+G to bring up the Goto Address window, paste "System:Char:IsHighSurrogate" (without quotes) there, and press OK or Enter. The address you jump to is our target address for the current run of the game; press Ctrl+G again and it will show up already highlighted in the Goto Address window...

Type /HW-4@ as the first part of the H-code and paste the address from CE after it... and expires when you exit the game.

Note that only Textractor and old versions of ITH (like ITH 2.3) support this H-code...

If the slowdowns are due to too many tracked threads, one option would be a checkbox beside each thread, where double-clicking a checkbox unticks all the other threads. This is the same request as issue #127.

Could reference Chiitrans Lite's UI for ignoring threads.

Kiru-Ufo commented 5 years ago

Same request as issue #127.

Issue #127 is also about too many threads appearing, though, or at least I thought that was the issue. There needs to be some kind of hard limit on the number of threads, as some games really do create more and more of them.

Your idea would be okay for "fixing" both, but the ITH and ITHVNR approach is better: you don't need to do that, yet you can still check other threads if you want to. I think it somehow recreates a thread if you ever look at it, but I have no idea how. Mahou Shoujo spams, yet ITHVNR's RAM usage doesn't increase and its CPU usage stays at 0%, and you can still choose a spammed thread and see all the spam that has already happened. The only reason this works is that the spam is more or less ignored; a heavily spammed thread can crash ITHVNR if you ever select it (partly because it tries to recreate everything).

Artikash commented 5 years ago

I didn't have any issues with CPU/RAM usage when playing 新説魔法少女 (fantastic game btw). What OS are you running, what extensions are you using, and which threads are being spammed the hardest? Please type the entire thread; it should look like 4:4468:76045F3A:730023AC:00000000: SysAllocString (HQ4@0:oleaut32.dll:SysAllocString). And does ITHVNR have the same thread, and is it spammed there too?

Kiru-Ufo commented 5 years ago

Windows 7 Professional 64-bit, up to date. Textractor is version 3.0.1, though all versions need more resources than they really should, thanks to the spam I think. Extensions: Copy to Clipboard, Remove Repetition. Repetition Filter under settings is turned off.

When I load the game on the map, then hook it with Textractor and refocus it (it pauses when unfocused), Textractor quickly maxes out the CPU until the CPU raises its clock frequency, at which point it settles at around 30-40% usage, I think. It's an i7-2600, so a little older, but it shouldn't suffer this much.

Just from idling on the map, Textractor gets 10 threads. Most of them are filled with too much stuff, but the heaviest one is: 5:1FB8:778416D9:7796EB00:0: WideCharToMultiByte (HQC@0:kernel32.dll:WideCharToMultiByte). (It would be nice if you could copy that from the program.) I let it run for a few seconds (maybe 20) and it spammed 25 MILLION characters. The rest isn't worth mentioning compared to this.
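For a sense of scale, a back-of-the-envelope estimate from these numbers, assuming the run really lasted about 20 s and that each character is stored as a single 2-byte UTF-16 code unit (an assumption; the thread doesn't say how Textractor stores thread text):

```latex
\frac{25 \times 10^{6}\ \text{chars}}{20\ \text{s}} = 1.25 \times 10^{6}\ \text{chars/s},
\qquad
25 \times 10^{6}\ \text{chars} \times 2\ \tfrac{\text{B}}{\text{char}} = 5 \times 10^{7}\ \text{B} \approx 47.7\ \text{MiB}
```

Under those assumptions, that single thread would account for roughly 48 MiB of raw text in about 20 seconds.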

Artikash commented 5 years ago

So the RAM usage should be fixed in 4.0.0. CPU usage should also be better, but you might have to experiment with the repetition filters*. If it's still performing poorly, the 'Remove hook(s)' feature in the next release should alleviate that.

*Basically, if the repetition filter can detect repetition in the text, having it on will usually boost performance. If it can't, it just wastes CPU and should be turned off. This applies both to 'Filter Repetition' in settings and to the Remove Repetition extension.