Closed raylu closed 1 year ago
Awww, hell yeah. I’m very happy to see this.
You can’t instance objects at all here unless you’re certain they’re v4 compatible, because the object definitions for most of the resources changed between Godot versions.
This is the reason I wrote ResourceLoaderCompat; I needed to be able to load binary resources without actually instancing any of the objects and then convert them to text. I ended up using it to extract properties from resources for exporting (like textures) without having to instance them.
Since ResourceLoaderCompat doesn't instance any of the objects, you wouldn't have access to a "real" resource with all its functions, but you'd have access to the properties. So you'd be able to get the messages that way.
So there are two options for the solution:
1) You manually extract the properties from the translations: load the resources with ResourceLoaderCompat, pull out the messages property and whatever else you need, then output them as CSVs. For an example of how to do this, take a look at what I'm doing in texture_loader_compat.cpp for loading v2 textures and bitmaps. You'll have to add whatever class your code lives in (I suggest making a new one, like TranslationLoaderCompat) as a friend of ResourceLoaderCompat, because I don't currently make the internal resource properties openly available, though I should when it comes time to refactor.
OR, you can modify the "real load" in ResourceLoaderCompat so that it loads a compatibility class that is just backported from v3 and v2, respectively. This part is probably harder than the above, and I don't actually use "real loading" in ResourceLoaderCompat for anything yet, so I don't recommend it unless you feel like taking on a challenge.
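For the first option, the final "output them as CSVs" step could look roughly like the sketch below. It is plain standard C++ rather than Godot code: csv_escape and write_translation_csv are hypothetical helper names, and the layout (a "keys" column followed by one column per locale) is an assumption based on the usual translation CSV convention, not something taken from gdsdecomp.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Quote a CSV field if it contains a comma, a double quote, or a line break.
std::string csv_escape(const std::string &field) {
    if (field.find_first_of(",\"\n\r") == std::string::npos) {
        return field;
    }
    std::string out = "\"";
    for (char c : field) {
        if (c == '"') {
            out += '"'; // embedded quotes are doubled
        }
        out += c;
    }
    out += '"';
    return out;
}

// locales: column order, e.g. {"en", "fr"}.
// rows: key -> (locale -> message); a missing locale yields an empty cell.
std::string write_translation_csv(
        const std::vector<std::string> &locales,
        const std::map<std::string, std::map<std::string, std::string>> &rows) {
    std::ostringstream csv;
    csv << "keys"; // assumed header name for the key column
    for (const std::string &locale : locales) {
        csv << ',' << csv_escape(locale);
    }
    csv << '\n';
    for (const auto &row : rows) {
        csv << csv_escape(row.first);
        for (const std::string &locale : locales) {
            auto it = row.second.find(locale);
            csv << ',' << (it != row.second.end() ? csv_escape(it->second) : std::string());
        }
        csv << '\n';
    }
    return csv.str();
}
```

In the real tool the rows map would be filled from the properties read through ResourceLoaderCompat; here it is just whatever the caller passes in.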
If you'd like, you can PR your current changes so I can see what you're doing and give you tips.
Also, if you want to take a look at how the translation resources are structured when stored, you can use the bin to text option in the GDRE tools menu; it's good to have a reference to look back at. I'd recommend doing that for each major version (v2, v3, and v4) so you can see what the differences are.
edit: this is unnecessary, the structure didn't change, see below.
Here is an example of a bin to text .translation file from v3:
[gd_resource type="PHashTranslation" format=2]
[resource]
hash_table = PoolIntArray( -1, -1, -1, -1, -1, -1, -1, 0, 6, 16, 2, <...>
bucket_table = PoolIntArray( 1, 1, -558281573, 507, 50, 76, 2, 1, <...>
strings = PoolByteArray( 254, 80, 33, 3, 3, 71, 117, 6, 22, 36, 18, <...>
Taking a look at the history of PHashTranslation, it doesn't actually have the messages property; it's just an optimized hash table. However, we got lucky here in that there aren't any actual changes to the underlying structure from v2 to v4; it was just pointlessly renamed to OptimizedTranslation. So all you would have to do is create an object pointer that is instantiated with the type OptimizedTranslation, set it with the properties extracted from ResourceLoaderCompat, then reference it as an actual OptimizedTranslation.
Example:
Object *obj = ClassDB::instantiate(type);
if (!obj) {
    return ERR_PARSE_ERROR;
}
// Set the properties extracted via ResourceLoaderCompat.
// Properties in OptimizedTranslation:
//   Vector<int> hash_table;
//   Vector<int> bucket_table;
//   Vector<uint8_t> strings;
obj->set("hash_table", hash_table);
<etc..>
Ref<OptimizedTranslation> ref = Ref<OptimizedTranslation>(Object::cast_to<OptimizedTranslation>(obj));
Then get the messages that way.
However, looking at the function implementations here, there doesn't seem to be a way to dump all the messages at once, and it's not a real HashMap, so you can't dump the keys and values that way. You may have to create a child class of OptimizedTranslation, cast the OptimizedTranslation object to it, and write custom functions to get the individual elements.
But in either case, I'd try get_message_list and see what happens; it may be empty since it's not actually implemented in OptimizedTranslation and the parent function Translation::get_message_list() references the translation_map, which doesn't seem to be set in OptimizedTranslation.
calling ClassDB::add_compatibility_class("PHashTranslation", "OptimizedTranslation"); ahead of time causes a segfault when I try to load it, so they don't seem to be compatible
btw, I tried to reproduce this using your examples, but I couldn't. I think you may have put this call inside the inner loop, so it was added multiple times, causing an overflow and the segfault. Try calling it once, outside the loop.
If that works, then this becomes a lot easier. You can do a real load using ResourceFormatLoaderCompat (which is recommended because ResourceFormatLoader can pollute the path cache):
Error ImportExporter::export_translation(const String &output_dir, Ref<ImportInfo> &iinfo) {
    Error err = OK;
    ResourceFormatLoaderCompat rlc;
    // translation files are usually imported from one CSV and converted to multiple "<LOCALE>.translation" files
    for (String path : iinfo->dest_files) {
        Ref<Translation> tr = rlc.load(path, "", &err);
        ERR_FAIL_COND_V_MSG(err != OK, err, "Could not load translation file " + iinfo->get_path());
        // err is OK at this point, so return an explicit error code for an invalid resource
        ERR_FAIL_COND_V_MSG(!tr.is_valid(), ERR_PARSE_ERROR, "Translation file " + iinfo->get_path() + " was not valid");
        List<StringName> messages;
        tr->get_message_list(&messages);
        for (const StringName &s : messages) {
            print_line(s, tr->get_message(s));
        }
    }
    return OK;
}
BTW, I did test get_message_list and it does not work, unfortunately. The unit test even checks to make sure it doesn't work. So, you will have to create a child class of OptimizedTranslation and figure out how to get the individual elements out of the hash map; take a look at struct Bucket in optimized_translation.h.
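To illustrate why get_message_list comes back empty while get_message still works, here is a toy model in standard C++. It is not Godot's implementation: the real OptimizedTranslation uses a bucketed hash table and a packed strings array, whereas HashedTranslation, toy_hash, and get_all_messages below are hypothetical names, and FNV-1a is only a stand-in hash. What it shares with the real class is that only the hash of each key is kept.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in hash (FNV-1a). This is NOT the hash Godot uses.
uint32_t toy_hash(const std::string &s) {
    uint32_t h = 2166136261u;
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Hypothetical toy model of a translation that keeps only hashed keys.
class HashedTranslation {
    std::unordered_map<uint32_t, std::string> table; // hash(key) -> message

public:
    void add(const std::string &key, const std::string &message) {
        table[toy_hash(key)] = message; // the key string itself is discarded
    }

    // Lookup works as long as the caller already knows the key.
    std::string get_message(const std::string &key) const {
        auto it = table.find(toy_hash(key));
        return it == table.end() ? std::string() : it->second;
    }

    // No key strings were stored, so there is nothing to list.
    std::vector<std::string> get_message_list() const {
        return {};
    }

    // The message values can still be dumped by walking the table directly,
    // which is roughly what a custom child class would have to do.
    std::vector<std::string> get_all_messages() const {
        std::vector<std::string> out;
        for (const auto &entry : table) {
            out.push_back(entry.second);
        }
        return out;
    }
};
```

The point of the sketch is the asymmetry: values are recoverable by iterating the storage, keys are not.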
thanks for looking into this and explaining everything
I wasn't using ResourceFormatLoaderCompat, just regular ol' ResourceLoader::load
I have bad news though: the developer gave me the imported translation CSV, so this went from the top of my priority list to the bottom...
😭
I decided to implement it anyway. Give the standalone build artifacts from the CI run a try once they're finished building. https://github.com/bruvzg/gdsdecomp/actions/runs/3317312034
wow, nice!
when I click to download "GDRE_tools-standalone-linux" on that page, the little blue progress bar at the top just slowly crawls but it never loads. when I curl it, it says HTTP request sent, awaiting response... 404 Not Found
shame that we're not always able to recover the keys :(
You have to be logged in to download it; try opening it up in a new tab.
Yeah, and there’s no real way to do it programmatically either. You can’t recover them from the hash values, and because the key can be literally anything and stored as any member value, there’s no way to search the project for it.
The best we could do is a Translation editor, where people could edit in new translations and we then store them as a new OptimizedTranslation with the hash values from other translations. That’s a lot of work though, which is why I just tell people in the warning message to ask the creator.
just tried the build and the .assets/translations.csv output is correct! it says they're missing keys but the game uses one of the languages as the keys and it either found that or that's the default translation or something (I didn't entirely understand the default_messages guessing code). if there's ever a discrepancy between the sheet I have and the game assets, this will help
How that works is: We search for the locale/fallback setting in the pck's project.godot to determine what the default language is. If it's not set, then it defaults to English. Then we retrieve the message values for each translation, and if one of them is the default fallback language, we store the message values for that language as default_messages. This is because it is likely that the message value for the default language will be the key or part of the key.
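As a rough illustration of the fallback lookup, the sketch below scans a text project.godot for the setting and falls back to "en" when it is missing. It assumes the setting is stored as a fallback="..." line under a [locale] section, and find_fallback_locale is a hypothetical name; the actual tool reads the project settings through its own loader rather than parsing text like this.

```cpp
#include <sstream>
#include <string>

// Returns the value of `fallback` in the [locale] section, or "en" if absent.
// Assumes an INI-like layout: "[section]" headers and key="value" lines.
std::string find_fallback_locale(const std::string &project_text) {
    std::istringstream in(project_text);
    std::string line;
    std::string section;
    const std::string key = "fallback=";
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back(); // tolerate CRLF line endings
        }
        if (line.size() >= 2 && line.front() == '[' && line.back() == ']') {
            section = line.substr(1, line.size() - 2);
            continue;
        }
        if (section == "locale" && line.compare(0, key.size(), key) == 0) {
            std::string value = line.substr(key.size());
            // Strip the surrounding double quotes, if present.
            if (value.size() >= 2 && value.front() == '"' && value.back() == '"') {
                value = value.substr(1, value.size() - 2);
            }
            if (!value.empty()) {
                return value;
            }
        }
    }
    return "en"; // not set: default to English
}
```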
We then cycle through all the message values in the default translation, and try get_message(key) to determine the key by matching the message value with the message retrieved from get_message. The keys that we try are the message value itself, and several permutations thereof (prepending $$ or TL_, stripping punctuation, etc.). For example, the key for the message displayed in a "Password" box may be "$$Password". If one of them results in us getting a message value that matches what we have, we use that. If we can't find it, we store it as <MISSING KEY [message]>.
It sounds like the locale/fallback language may be set to something other than what they actually intended to be the default language. What language is the game in by default when you open it? I might want to look at the project to see if I can improve that.
the game I'm datamining has frequent updates and happens to ship the translation keys as a random language (yi_US) to help translators see where the keys are rendered in-game. so it's actually very helpful to extract just the strings (which happen to have the keys for this game)!
regarding your comment on the PR,
When you import a translation CSV, it gets stored as OptimizedTranslation files that only store the hashes of the keys, rather than the keys themselves. It's not possible to recover the keys from the hashes, and we can't programmatically get them from project resources since they can be any string value and stored in anything.
are you saying that the original strings are in the project resources and we just don't know which one it is? what if we just hash every string and look for matches?
I had thought about that, but for any project with a non-trivial number of scripts and resources, that would be a huge number of strings and would be insanely slow. That might be justified if the objective is to recover the translation.csv, so it could be an optional thing, but there are a lot of modifications I would have to make to script/resource loading and parsing to make that happen. I'd have to load and parse every single resource and script and capture every string.
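For reference, the brute-force idea itself is simple once the strings have been collected; the expensive part is gathering every string from every script and resource. A minimal sketch in standard C++, where match_keys_by_hash is a hypothetical name and hash_fn is a placeholder for whatever hash the translation format actually uses:

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hash every candidate string and keep those whose hash appears in known_hashes.
// Returns hash -> first candidate string that produced it.
std::unordered_map<uint32_t, std::string> match_keys_by_hash(
        const std::unordered_set<uint32_t> &known_hashes,
        const std::vector<std::string> &candidates,
        const std::function<uint32_t(const std::string &)> &hash_fn) {
    std::unordered_map<uint32_t, std::string> recovered;
    for (const std::string &s : candidates) {
        const uint32_t h = hash_fn(s);
        if (known_hashes.count(h) != 0) {
            // emplace keeps the first match; a later string with the same
            // hash would be a collision and is ignored here.
            recovered.emplace(h, s);
        }
    }
    return recovered;
}
```

The matching is linear in the number of candidate strings, so the cost is dominated by loading and parsing the project to produce the candidates list, which is the concern raised above. A match is also only a guess, since unrelated strings can collide with a key's hash.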
I started adding support for converting .translation files to something plaintext, but I ran into a v3-v4 issue
this gives me
based on https://github.com/godotengine/godot/blob/72b845b28773dd40adf6f55b226fb732910cbf14/editor/project_converter_3_to_4.cpp#L1493, PHashTranslation seems to be the name of OptimizedTranslation in v3
calling ClassDB::add_compatibility_class("PHashTranslation", "OptimizedTranslation"); ahead of time causes a segfault when I try to load it, so they don't seem to be compatible
I'm not really sure where to go from here. gdsdecomp doesn't build against v3 and I'm not sure how it's able to load other v3 assets