bruvzg / gdsdecomp

Godot reverse engineering tools
MIT License

Document a high level explanation of how "decompilation" works. #72

Open Koalamana9 opened 1 year ago

Koalamana9 commented 1 year ago

Since the Godot documentation says absolutely nothing about how the pck file is created/packed and how exactly scripts are "compiled", a huge number of developers would be very grateful if the whole process were clearly explained somewhere. Because this project did a great job of reverse engineering everything, I can't think of a better place to ask for this.

nikitalita commented 1 year ago

I do want to write an explanation of how the PCK and resources are created at some point. The script “compilation” is pretty simple: all the tokens (excluding comments, but including whitespace) are converted into binary numbers and concatenated together, which is why it’s fairly trivial to recover the original scripts. The hard part is knowing which version of the bytecode it’s using.

There’s a description of the contents of the PCK here as well: https://github.com/Bioruebe/godotdec
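
For a rough idea of what that description covers, here is a minimal sketch of reading a PCK file index. The layout is an assumption based on my understanding of pack format version 1 (Godot 3.x): a `GDPC` magic, four little-endian 32-bit version fields, 16 reserved 32-bit words, a file count, then one entry per file (path length, path, 64-bit offset, 64-bit size, 16-byte MD5). Newer pack format versions add fields, and a PCK embedded in an executable needs extra handling, so neither is covered here. The function name `read_pck_index` is made up for this example.

```python
import struct

PCK_MAGIC = b"GDPC"  # assumed magic for standalone .pck files


def read_pck_index(stream):
    """Read the header and file table of a format-1 PCK (assumed layout)."""
    if stream.read(4) != PCK_MAGIC:
        raise ValueError("not a PCK file (missing GDPC magic)")

    # Pack format version, then engine major/minor/patch.
    fmt_version, major, minor, patch = struct.unpack("<4I", stream.read(16))
    stream.read(16 * 4)  # skip 16 reserved 32-bit words

    (file_count,) = struct.unpack("<I", stream.read(4))
    entries = []
    for _ in range(file_count):
        (path_len,) = struct.unpack("<I", stream.read(4))
        # Paths are padded with NUL bytes, so strip them before decoding.
        path = stream.read(path_len).rstrip(b"\0").decode("utf-8")
        offset, size = struct.unpack("<QQ", stream.read(16))
        md5 = stream.read(16)
        entries.append({"path": path, "offset": offset, "size": size, "md5": md5})

    return fmt_version, (major, minor, patch), entries
```

Each entry's `offset` and `size` then tell you which byte range of the PCK holds that file's contents.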

Koalamana9 commented 1 year ago

@nikitalita Looking forward to that explanation! How exactly are GDScript tokens converted into binary numbers? I've looked at /modules/gdscript/gdscript_tokenizer.cpp and saw the array of token_names; are they just saved by their index number in the array? And what about numbers and characters?

I also wonder why tscn scene files are packed as plain text in the pck. I know there is an option, convert_text_resources_to_binary_on_export, but it's off by default in 3.x; is there any problem with it? Is tscn tokenized the same way as GDScript? I know that Unity uses YAML for scene files and then somehow converts that to binary as well, but I'm wondering how Godot does it.

There is also no info about any of the internal Godot formats from the .import folder, such as stex and scn. And why are all fbx/obj/glb meshes converted to scn in .import? I thought scn was a binary scene format converted from tscn only. In 4.x it's .ctex and .mesh; if there is any info about any of that, please share!

nikitalita commented 1 year ago

> How exactly are GDScript tokens converted into binary numbers? I've looked at /modules/gdscript/gdscript_tokenizer.cpp and saw the array of token_names; are they just saved by their index number in the array? And what about numbers and characters?

Yup, they're stored as just index numbers. Values and strings are stored inline: the Variant type number, then the value/string. For details, check out https://github.com/bruvzg/gdsdecomp/blob/master/compat/variant_decoder_compat.cpp .

GDScript is entirely interpreted; there's no real "compilation" going on, just a conversion to bytecode that is interpreted the same way as text.
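
To make the idea concrete, here is a toy round trip in Python. It is only a sketch of the principle (token index, with constants stored inline as a type number followed by the value), not the real on-disk format: the token table, the type tags, and the choice to store identifiers inline are all simplifications made up for this example, and the real token table changes between engine versions.

```python
import struct

# Hypothetical token table; the real one lives in gdscript_tokenizer.cpp.
TOKEN_NAMES = ["VAR", "IDENTIFIER", "OP_ASSIGN", "CONSTANT", "NEWLINE"]
TYPE_INT, TYPE_STRING = 2, 4  # illustrative type tags for inline constants


def _pack_str(text):
    raw = text.encode("utf-8")
    return struct.pack("<I", len(raw)) + raw  # length prefix, then bytes


def encode(tokens):
    """Turn (name, value) pairs into bytes: token index, then any payload."""
    out = bytearray()
    for name, value in tokens:
        out += struct.pack("<I", TOKEN_NAMES.index(name))
        if name == "IDENTIFIER":
            out += _pack_str(value)
        elif name == "CONSTANT":
            if isinstance(value, int):
                out += struct.pack("<Ii", TYPE_INT, value)
            else:
                out += struct.pack("<I", TYPE_STRING) + _pack_str(value)
    return bytes(out)


def decode(data):
    """Reverse of encode(): walk the bytes and rebuild the token list."""
    tokens, pos = [], 0

    def read_u32():
        nonlocal pos
        (number,) = struct.unpack_from("<I", data, pos)
        pos += 4
        return number

    def read_str():
        nonlocal pos
        length = read_u32()
        text = data[pos:pos + length].decode("utf-8")
        pos += length
        return text

    while pos < len(data):
        name = TOKEN_NAMES[read_u32()]
        value = None
        if name == "IDENTIFIER":
            value = read_str()
        elif name == "CONSTANT":
            tag = read_u32()
            if tag == TYPE_INT:
                (value,) = struct.unpack_from("<i", data, pos)
                pos += 4
            elif tag == TYPE_STRING:
                value = read_str()
            else:
                raise ValueError(f"unknown type tag {tag}")
        tokens.append((name, value))
    return tokens


# Tokens for something like `var hp = 100`.
tokens = [("VAR", None), ("IDENTIFIER", "hp"), ("OP_ASSIGN", None),
          ("CONSTANT", 100), ("NEWLINE", None)]
blob = encode(tokens)
```

Calling `decode(blob)` gives back the same token list, which is why recovery is easy once you know the token table. Decoding with the wrong table would map the indices to the wrong tokens, which is the "which version of the bytecode" problem.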

> I also wonder why tscn scene files are packed as plain text in the pck. I know there is an option, convert_text_resources_to_binary_on_export, but it's off by default in 3.x; is there any problem with it? Is tscn tokenized the same way as GDScript? I know that Unity uses YAML for scene files and then somehow converts that to binary as well, but I'm wondering how Godot does it.

Conversion of text scenes to binary was broken a while back and they couldn't be bothered to fix it. I submitted a PR to fix it a few months ago.

> There is also no info about any of the internal Godot formats from the .import folder, such as stex and scn. And why are all fbx/obj/glb meshes converted to scn in .import? I thought scn was a binary scene format converted from tscn only.

All imported formats are converted to Godot resources upon import; FBX/OBJ/GLB formats describe 3D scenes, so they are converted into Godot scenes. It doesn't really matter whether they are stored as binary or text, but binary takes up far less space, and my guess is that imported models/scenes are likely to be rather hefty.

> In 4.x it's .ctex and .mesh; if there is any info about any of that, please share!

They just changed the way the format is stored. I didn't pay attention much to the changes for textures, I just backported the texture loading from v3, but you can compare the way that we load images for v3 textures vs. v4 here: https://github.com/bruvzg/gdsdecomp/blob/master/compat/texture_loader_compat.cpp#L83

overremind commented 1 month ago

I want to translate a game, and after extracting the .pck file, I found .import files inside that are in code format. How can I decompile these .import files into text?