Artikash / Textractor

Extracts text from video games and visual novels. Highly extensible.
GNU General Public License v3.0

[Request] Add furigana / romaji support over original text #238

Open Z-Dante opened 5 years ago

Z-Dante commented 5 years ago

Would it be possible to add furigana / romaji support for the original Japanese text in the Textractor window? This support would be a huge help to anyone new to learning Japanese (like me) who is trying not to rely on machine translation too much.
Translation Aggregator already does this using MeCab / JParser, but I prefer the Textractor window because of how customizable it is.

Artikash commented 5 years ago

You can use ChiiTrans Lite using the clipboard to send text from Textractor.

infidel- commented 4 years ago

Maybe you can pull that info from Google Translate? It shows the pronunciation below the Japanese text.

TamaBaka commented 4 years ago

> Maybe you can pull that info from Google Translate? It shows the pronunciation below the Japanese text.

That's not how it works. In order to add this feature, you'd need a separate data store to pull from. You'd either have to fill in a database yourself or write the algorithm to connect it to a translation API and then regex the results. Then you need to edit the current extension to pull from that data store when new text comes in. That's the easy part. The hard part is then spending several hours making the furigana appear properly next to the text using Windows' esoteric graphics library before you realize that the extension now looks weird in every other language.

It's not within the scope of this project. If someone else wanted to spend time making the extension though, it's highly encouraged.

...or you can skip reinventing the wheel and hook ChiiTrans Lite to the clipboard.
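
The "easy part" described above (a data store plus a lookup when new text comes in) can be sketched roughly as follows. This is only an illustration, not Textractor code: the `READINGS` table and the `annotate` function are hypothetical names, and a real store would be filled from a dictionary or a morphological analyzer rather than typed in by hand.

```python
# Hypothetical data store: surface form -> romaji reading.
# A real store would come from a dictionary file or an analyzer such as MeCab.
READINGS = {
    "東子": "Tōko",
    "本": "hon",
    "自信": "jishin",
}


def annotate(sentence, readings=READINGS):
    """Split a sentence into (text, reading) pairs using greedy longest match.

    Characters with no entry in the store get a reading of None.
    """
    longest = max((len(key) for key in readings), default=0)
    pairs = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first so multi-character words win.
        for size in range(min(longest, len(sentence) - i), 0, -1):
            chunk = sentence[i:i + size]
            if chunk in readings:
                pairs.append((chunk, readings[chunk]))
                i += size
                break
        else:
            # Nothing in the store starts here: pass the character through.
            pairs.append((sentence[i], None))
            i += 1
    return pairs
```

Greedy longest match is a simplification; it ignores context-dependent readings, which is one reason tools like MeCab exist.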

infidel- commented 4 years ago

I'm sorry I wasn't clear. I'm only interested in the romaji part. Google Translate shows the pronunciation below the strings, like "東子" shows "Tōko" below it. I was thinking that the HTTP request returns this and you can print it above the translation. But I've managed to get TA+MeCab working so it's not that important for me.

TamaBaka commented 4 years ago

> I'm sorry I wasn't clear. I'm only interested in the romaji part. Google Translate shows the pronunciation below the strings, like "東子" shows "Tōko" below it. I was thinking that the HTTP request returns this and you can print it above the translation. But I've managed to get TA+MeCab working so it's not that important for me.

That's what I was alluding to. Your suggestion is the "easy" part (and it is still a pain to get working properly because Google Translate likes to block you if you use it too often). Once you have the data feed, you then have to get Windows to cooperate with you. Romaji don't just "appear" above the main character. You have to manually draw them there. Scaling issues aside, you have to make sure the main characters are offset downwards enough so that there is space to draw the miniaturized text. Not only that, you have to make sure that offset is included for lines further down or the lines of text will bleed into each other. And romaji running 5+ characters long can be wider than the character it sits on top of. What do you do in that case?

It's not impossible, but the feature that you're asking for does not come out of the box. Adapting the text rendering engine to support this feature is definitely going to need far more than 5 lines of code. This is basically a side project to be pursued by a hobbyist or someone that absolutely must have the feature because of how few people actually need it and how much time and headaches are involved to get a stable product that people will accept.
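
To make the layout problem above concrete, here is a rough sketch of the bookkeeping involved. It assumes fixed-width glyphs and made-up pixel sizes (`base_w`, `ruby_w`, `base_h`, `ruby_h` are all hypothetical), and it only computes positions; actually drawing the text through a Windows graphics API is the part that takes the real effort.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    base: str     # main text, e.g. a kanji
    ruby: str     # small annotation drawn above it ("" if none)
    x: float      # left edge of the cell
    width: float  # horizontal space reserved for the cell


def layout_line(pairs, base_w=20.0, ruby_w=7.0):
    """Place (base, ruby) pairs left to right.

    Each cell is as wide as the wider of its two rows, so a long romaji
    string pushes the next character to the right instead of overlapping it.
    """
    cells, x = [], 0.0
    for base, ruby in pairs:
        width = max(len(base) * base_w, len(ruby) * ruby_w)
        cells.append(Cell(base, ruby, x, width))
        x += width
    return cells


def line_tops(n_lines, base_h=20.0, ruby_h=8.0):
    """Return (ruby_y, base_y) per line; every line reserves room for ruby."""
    pitch = ruby_h + base_h
    return [(i * pitch, i * pitch + ruby_h) for i in range(n_lines)]
```

Widening the cell is only one possible answer to the "romaji wider than the character" question; shrinking or wrapping the annotation are alternatives with their own trade-offs.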

infidel- commented 4 years ago

I think you're talking about aligning romaji to kanji, which I also do not care about. But never mind that, I'm curious about the blocking part: do all hooks that Textractor finds send requests automatically? That would probably be a big overhead in terms of the number of requests. When hooking to anex86, for example, it finds at least a couple dozen of them.

Artikash commented 4 years ago

Oh, if that's all you want then it's just a one line regex change like this: https://github.com/Artikash/Textractor/commit/ceb0dd68dbf7d0d82a516d06b125afc2beccb162 That said, I don't feel like supporting and keeping updated a niche feature like this, so you're on your own from there. Google Translate Romaji.zip

Artikash commented 4 years ago

Textractor does send requests from all text threads, but there is a rate limiter. The rate limiter is only bypassed by the currently selected thread.
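
As an illustration of that behavior (not Textractor's actual implementation, which is not shown in this thread), a sliding-window limiter with a bypass for the selected thread might look like the sketch below. The class name, the parameters, and the choice not to count bypassed requests against the limit are all assumptions.

```python
import time
from collections import deque


class RateLimiter:
    """Sliding-window limiter: at most max_requests per `window` seconds."""

    def __init__(self, max_requests, window, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of requests counted against the limit

    def allow(self, thread_id, selected_thread_id):
        # The currently selected thread skips the limiter entirely.
        # Assumption: its requests are not counted against the limit.
        if thread_id == selected_thread_id:
            return True
        now = self.clock()
        # Forget requests that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            self.sent.append(now)
            return True
        return False
```

With a couple dozen text threads, only the first few background requests in each window would go out, while the thread being read is always translated.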

infidel- commented 4 years ago

Doesn't look like it works; still only two lines appear in the text window, the Japanese one and the English one.

Correction: It worked on a single line somewhat correctly (needs a new line): 【Kosaku】今なら、それがどんなにつまらない本でも俺は最後まで読み続けられる自信があるぞ. [Kosaku] Now, I'm confident that I can continue reading even the most boring books to the end. [Kosaku] imanara, sore ga don'nani tsumaranai hon demo ore wa saigomade yomi tsudzuke rareru jishin ga aru zo.

But all the rest do not show any romaji.

Also, can you tell me how saved hooks correlate with found hooks?

For example I have this in saved: D:\1\Games\anex86e1\anex86.exe , |4338198:0:HBX0@40FDF:anex86.exe

But I can't input any of that into the Add hook window. How do I load a saved hook?

Artikash commented 4 years ago

Yeah, not sure; it works fine for me. Unless you're willing to build and debug the source, I guess you should stick with TA.

The saved hook in your file is HBX0@40FDF:anex86.exe. A hook with an X in it is a custom hook intrinsic to Textractor; there is no AGTH hook code equivalent (except inline machine code, 🤢 ), so you can't reinsert it. Textractor should automatically insert any of those by itself when you attach to the game though, so it shouldn't be an issue.
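
For anyone who wants to pull the hook code out of a saved line themselves, here is a small sketch. The layout it assumes (`<path> , |<number>:<number>:<hook code>`) is inferred only from the single example in this thread, the function names are hypothetical, and the meaning of the two numeric fields is not stated here, so they are simply skipped.

```python
def parse_saved_hooks(line):
    """Split a saved-hooks line into the game path and its hook codes.

    Assumed layout, inferred from one example only:
        <path> , |<number>:<number>:<hook code>|...
    """
    path, _, rest = line.partition(" , ")
    codes = []
    for entry in rest.split("|"):
        entry = entry.strip()
        if not entry:
            continue
        # The hook code itself contains a colon (":anex86.exe"),
        # so only split off the two leading numeric fields.
        codes.append(entry.split(":", 2)[-1])
    return path.strip(), codes


def is_textractor_only(hook_code):
    """True if the part before '@' contains an X, per the comment above."""
    return "X" in hook_code.split("@", 1)[0]
```

Checking only the part before `@` avoids false positives from module names that happen to contain an X.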

infidel- commented 4 years ago

Okay, thanks for your help. Do you know if Neko Project 2 is hookable in this way? I can only seem to get the search working correctly with Anex86.

mspykerez commented 3 years ago

It would be the bomb if Textractor (& the extra window) would break the text down word by word using different colors and show furigana/romaji like the old Translation Aggregator did with JParser & MeCab.

infidel- commented 3 years ago

You can just run TA and do that already; Textractor copies the source text to the clipboard.

mspykerez commented 2 years ago

I suppose so. Textractor itself already has a Dictionary feature, but it's not exactly on par with what TA is capable of doing.

aetrna300bpm commented 6 months ago

> Correction: It worked on a single line somewhat correctly (needs a new line): 【Kosaku】今なら、それがどんなにつまらない本でも俺は最後まで読み続けられる自信があるぞ. [Kosaku] Now, I'm confident that I can continue reading even the most boring books to the end. [Kosaku] imanara, sore ga don'nani tsumaranai hon demo ore wa saigomade yomi tsudzuke rareru jishin ga aru zo.

How can I achieve this?

> Oh, if that's all you want then it's just a one line regex change like this: ceb0dd6 That said, I don't feel like supporting and keeping updated a niche feature like this, so you're on your own from there. Google Translate Romaji.zip

What do I do with this instruction?