bruvzg / gdsdecomp

Godot reverse engineering tools
MIT License
1.36k stars 137 forks

Translation editor #90

Open nikitalita opened 1 year ago

nikitalita commented 1 year ago

I attempted to implement recovery of the translation.csv from translation files in #89, but I have found that it's not really possible to recover the keys. The keys are not stored in the .translation files, so we have to guess what they are. The keys don't necessarily bear any resemblance to the fallback message, so recovery depends heavily on which translation plugin the creator originally used (or what strategy they used if they did it ad hoc). In most cases we fail to recover at least some of the keys, and often we recover none of them.

Therefore, anyone who wants to use that CSV to add a translation to a game with more than a trivial amount of text needs to either get the original CSV from the creator or manually go through and find all the missing keys.

The first option is often not possible, and the second option can be extremely labor intensive. So, my thought is that we could create a Translation editor that will allow people to add new translations and save them as a new OptimizedTranslation resource with the same hash values as the existing OptimizedTranslation tables.

I'm thinking of implementing this as an editor plugin that can be added to projects after recovery; this would ease the effort of debugging translations and determining what text goes where.

I'm making this issue primarily to gauge demand. Let me know if you think my implementation idea is a good one and if you would make use of it.

maxyip commented 1 year ago

Would love to have this translation editor

Currently there is no way to get the key included in .translation?

nikitalita commented 1 year ago

Not easily, anyway. The keys are not stored in the Optimized Translation files, only their hashes, and the key can be literally any string and stored in any resource. So we would have to parse all the resources in the entire project and test every String Variant we parse. This might be doable, but I'm going to have to make a lot of changes to resource loading in order to make that happen, and it would be insanely slow.

DearFox commented 3 months ago

Not easily, anyway. The keys are not stored in the Optimized Translation files, only their hashes, and the key can be literally any string and stored in any resource. So we would have to parse all the resources in the entire project and test every String Variant we parse. This might be doable, but I'm going to have to make a lot of changes to resource loading in order to make that happen, and it would be insanely slow.

How can I get the necessary hashes behind the translation strings? And how can I match the hashes to the translation strings themselves?

Here is how far I got: I used "Convert binary resources to text" to get the following: [image] Then I used Godot to turn strings = PackedByteArray into hex. [image] I put the resulting hex into a new file using HxD and got text strings mixed with some other data. [image]

If anyone has thoughts and ideas, even on the difficult creation of fan translations, please share them. I'll experiment some more, maybe I can achieve something.

bruvzg commented 3 months ago

How can I get the necessary hashes behind the translation strings? And how can I match the hashes to the translation strings themselves?

Check OptimizedTranslation implementation in the engine code.

Hash function:

https://github.com/godotengine/godot/blob/a7b860250f305f6cbaf61c30f232ff3bbdfdda0b/core/string/optimized_translation.h#L63-L73

Getting translated string:

https://github.com/godotengine/godot/blob/a7b860250f305f6cbaf61c30f232ff3bbdfdda0b/core/string/optimized_translation.cpp#L213-L268
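For readers who would rather experiment outside the engine, here is a minimal Python sketch of the two linked pieces as I read them: the seeded 32-bit hash and the two-level lookup through hash_table and bucket_table. The names `ot_hash` and `find_entry` are hypothetical, and the bucket layout (size, seed, then four values per entry) and the sign-extension of high bytes are assumptions that should be checked against the linked lines before relying on them. It only locates an entry and does not decompress the stored string.

```python
FNV_PRIME = 0x1000193    # used both as the default seed and as the multiplier
EMPTY_SLOT = 0xFFFFFFFF  # hash-table value meaning "no bucket here"


def ot_hash(d: int, data: bytes, signed_char: bool = True) -> int:
    """Seeded 32-bit hash over the UTF-8 bytes of a key."""
    if d == 0:
        d = FNV_PRIME
    for byte in data:
        if signed_char and byte >= 0x80:
            # Assumption: C `char` is signed on the build platform, so bytes
            # above 0x7F are sign-extended before the XOR.
            byte |= 0xFFFFFF00
        d = ((d * FNV_PRIME) & 0xFFFFFFFF) ^ byte
    return d


def find_entry(key: str, hash_table: list[int], bucket_table: list[int]):
    """Return (str_offset, comp_size, uncomp_size) for key, or None.

    Assumed bucket layout: [size, seed, then per entry:
    key_hash, str_offset, comp_size, uncomp_size].
    """
    if not hash_table:
        return None
    data = key.encode("utf-8")
    # The arrays are signed 32-bit in a resource dump, so mask every read.
    p = hash_table[ot_hash(0, data) % len(hash_table)] & 0xFFFFFFFF
    if p == EMPTY_SLOT:
        return None
    size = bucket_table[p]
    seed = bucket_table[p + 1] & 0xFFFFFFFF
    h = ot_hash(seed, data)  # second hash, using the bucket's own seed
    for i in range(size):
        base = p + 2 + 4 * i
        if bucket_table[base] & 0xFFFFFFFF == h:
            return tuple(bucket_table[base + 1:base + 4])
    return None
```

Because only hashes are stored, a hit from `find_entry` means "this candidate hashes like a stored key", which is also why the keys themselves cannot be read back out of the file.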

nikitalita commented 3 months ago

How can I get the necessary hashes behind the translation strings? And how can I match the hashes to the translation strings themselves?

We have a class that does this already, https://github.com/bruvzg/gdsdecomp/blob/master/compat/optimized_translation_extractor.cpp

Remember, literally ANY string (or, in certain schemes, any PART of a string) in any resource or script can be used as a translation key; so you'd have to come up with a way to gather all of the strings from all of the resources and scripts and check to see if they match any of the hashes.
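The gathering step described above could be sketched roughly as follows in Python. This is not what gdsdecomp does internally; it assumes the project has already been recovered to text form, the suffix list is a hypothetical selection, and `is_key` stands in for whatever check you have available (for example a hash lookup, or a call to get_message in Godot). It tests whole string literals only, so schemes that use part of a string as the key would need extra candidate generation.

```python
import re
from pathlib import Path
from typing import Callable, Iterable, Iterator

# Double-quoted literal on one line; escape sequences are kept as written,
# not unescaped.
STRING_LITERAL = re.compile(r'"((?:[^"\\\n]|\\.)*)"')

# Hypothetical selection of text formats worth scanning after recovery.
TEXT_SUFFIXES = {".gd", ".tscn", ".tres", ".cfg", ".json"}


def string_literals(text: str) -> Iterator[str]:
    """Yield every non-empty double-quoted literal found in text."""
    for match in STRING_LITERAL.finditer(text):
        if match.group(1):
            yield match.group(1)


def collect_candidates(root: Path) -> set[str]:
    """Walk root and collect the literals from every scannable text file."""
    found: set[str] = set()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in TEXT_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="replace")
            found.update(string_literals(text))
    return found


def matching_keys(candidates: Iterable[str],
                  is_key: Callable[[str], bool]) -> list[str]:
    """Keep only the candidates that the supplied check accepts."""
    return sorted(c for c in candidates if is_key(c))
```

A typical call would be `matching_keys(collect_candidates(Path("recovered_project")), is_key)`, where `recovered_project` is a placeholder for your own output folder.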

DearFox commented 3 months ago

How can I get the necessary hashes behind the translation strings? And how can I match the hashes to the translation strings themselves?

We have a class that does this already, https://github.com/bruvzg/gdsdecomp/blob/master/compat/optimized_translation_extractor.cpp

Remember, literally ANY string (or, in certain schemes, any PART of a string) in any resource or script can be used as a translation key; so you'd have to come up with a way to gather all of the strings from all of the resources and scripts and check to see if they match any of the hashes.

How can I provide him with a list of these strings? I see from the commit that something has been added to export the translation, but I don’t really understand how to use it. It seems even in the command line arguments there is nothing about translations

nikitalita commented 3 months ago

if you want a list of the translated messages (NOT the keys), you can just run it through project recovery and give him the resulting CSV; if you need the keys, you need to do the process I described above.

DearFox commented 3 months ago

if you want a list of the translated messages (NOT the keys), you can just run it through project recovery and give him the resulting CSV; if you need the keys, you need to do the process I described above.

Sorry, I meant how can I run this code and pass it a list of translation keys? (so that he (as I understand) would compare them with the hashes existing in the translation files) Or should I run the provided script myself?

And as for getting translation strings without keys (with key hashes), I didn’t really understand either. I don't have CSV files, I only get .translation translation files.

nikitalita commented 3 months ago

if you want a list of the translated messages (NOT the keys), you can just run it through project recovery and give him the resulting CSV; if you need the keys, you need to do the process I described above.

Sorry, I meant how can I run this code and pass it a list of translation keys? (so that he (as I understand) would compare them with the hashes existing in the translation files) Or should I run the provided script myself?

Right now, it's not exported to gdscript, so you'd need to modify and compile gdsdecomp to do this. Take a look at the export translation function here: https://github.com/bruvzg/gdsdecomp/blob/d180d30b97571274f4c1770cddf867e02cfe7516/utility/import_exporter.cpp#L742C23-L742C41

And as for getting translation strings without keys (with key hashes), I didn’t really understand either. I don't have CSV files, I only get .translation translation files.

look in the .assets folder in the exported project folder. We don't save it to the original path so that the .translation files don't get overwritten when opening the project in the editor.

DearFox commented 3 months ago

And as for getting translation strings without keys (with key hashes), I didn’t really understand either. I don't have CSV files, I only get .translation translation files.

look in the .assets folder in the exported project folder. We don't save it to the original path so that the .translation files don't get overwritten when opening the project in the editor.

I don't seem to have a .assets folder. And I couldn't find a single file in the folder with the exported project when searching for "CSV"

nikitalita commented 3 months ago

can you post your recovery log so that I can see if one was exported?

DearFox commented 3 months ago

can you post your recovery log so that I can see if one was exported?

gdre_export.log

I'm not entirely sure, but I think this is the only log file I see?

nikitalita commented 3 months ago

that's the log, yes.

I don't see a csv being exported; which game is this from? I'll decompile it myself to figure out why it's not doing it.

DearFox commented 3 months ago

that's the log, yes.

I don't see a csv being exported; which game is this from? I'll decompile it myself to figure out why it's not doing it.

I tried this on a demo of this game: https://store.steampowered.com/app/2376750/Ancient_Mind/

DearFox commented 3 months ago

It seems I was able to get a list of all the text that is in the translation file via Godot: [image] But yes, get_message_list and get_message_count will not work. But I think I understand how I can implement a key search using gdscript.

DearFox commented 3 months ago

The good news is that it is possible to translate the text, although it is not very easy. [image] The bad news is that you will most likely have to open and run the entire game in the Godot editor and fix any errors that come up. So far, I don't know how to distribute and apply such a translation to games "legally" without distributing the entire PCK file. If you have any ideas on how to create some sort of patch for the PCK or exe file, let me know. My current thought is that this can be done using https://github.com/DmitriySalnikov/GodotPCKExplorer, but I have not checked yet.

DearFox commented 3 months ago

A little off topic, but I think this will be useful to all translators: I used FontForge to edit fonts. I downloaded a font with a free license, but I had problems with its height and never worked out how to simply change the height of the entire font. So I duplicated the font from the game, replaced all of its characters with characters from the downloaded font, and used the result as a fallback font.

LibreOffice - for editing csv tables.

I also wrote a small script to help find translation keys. Perhaps I will get around to writing a small ready-made solution in gdscript. At the moment, here's how you can look up the keys and their translations:

Use Notepad++ to search your entire src game folder with a regular expression. To write the expression, you need to work out what pattern the translation keys follow in your game. In my case the keys look like: menu_play For this I used: (?<!(?:signal|method|porter)) ?= ?"[\w\d]+_[\w\d_]+" I don't fully understand how regular expressions work myself, but this site may help a little: https://regex101.com/

Find all potential translation keys in your game files, put them into an array, and check each one using TRANSLATIONS_EN.get_message(i), where TRANSLATIONS_EN is declared as const TRANSLATIONS_EN = preload("res://***.translation") and i is a potential translation key.
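As an alternative to the Notepad++ search, the same regular expression can be run from a short Python script. This is only a sketch: the pattern is the one quoted above with a capture group added around the key, and `candidate_keys` is a hypothetical helper name. The resulting list still has to be checked against the translation, for example with get_message in Godot as described.

```python
import re

# The pattern from the post, with the key itself wrapped in a capture group.
KEY_ASSIGNMENT = re.compile(
    r'(?<!(?:signal|method|porter)) ?= ?"([\w\d]+_[\w\d_]+)"'
)


def candidate_keys(source: str) -> list[str]:
    """Return the distinct key-like literals in source, in first-seen order."""
    seen: dict[str, None] = {}
    for match in KEY_ASSIGNMENT.finditer(source):
        seen.setdefault(match.group(1), None)
    return list(seen)


if __name__ == "__main__":
    sample = 'text = "menu_play"\nlabel.text="menu_quit"\nname = "Player"\n'
    print(candidate_keys(sample))
```

One caveat with this pattern: the lookbehind only inspects the six characters directly before the point where the match starts. With a space before the equals sign, as in `signal = "a_b"`, the match can begin at the `=` itself and the exclusion no longer applies, so expect some false positives and filter them out afterwards.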