Zxilly / go-size-analyzer

A tool for analyzing the size of compiled Go binaries, offering cross-platform support, detailed breakdowns, and multiple output formats.
https://gsa.zxilly.dev/
GNU Affero General Public License v3.0

Analyse .rodata/go:embed content #44

Status: Open · gudvinr opened this issue 5 months ago

gudvinr commented 5 months ago

go:embed stores its data in the .rodata section of the binary file.

I am not sure whether it's possible to extract all of the content of .rodata, but it would be useful to at least have some idea about the embedded content.

For example, lingua-go embeds a tremendous amount of data, so .rodata takes up ~100 MB of the file.

Information on the exact data structure is rather sparse, but it should be fairly simple, because we know what does the embedding (the embed package: https://pkg.go.dev/embed).

See also:

Zxilly commented 5 months ago

This is certainly possible, and in fact the existing decompilation-based method already recognises some of this content. However, when processing the decompilation results I discarded anything that could not be recognised as a string, because false positives can be very misleading. There is an additional difficulty: gsa currently supports three binary formats (PE, Mach-O and ELF), and writing a parser for each format, or even for each Go version, might be too much work. I would expect a DWARF-based parser to handle this; after all, at runtime the embedded content is just a string of bytes. Go emits DWARF on all platforms, including PE, so the workload would be relatively acceptable.
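Supporting three container formats means the first step is always deciding which one a file is. A minimal sketch of that dispatch, assuming only the well-known magic numbers (detectFormat is a hypothetical helper, not gsa's code):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// detectFormat guesses the container format of an executable from its first
// bytes. It only checks magic numbers; a real analyzer would then hand the
// file to debug/elf, debug/pe or debug/macho for full parsing.
func detectFormat(header []byte) string {
	switch {
	case bytes.HasPrefix(header, []byte("\x7fELF")):
		return "ELF"
	case bytes.HasPrefix(header, []byte("MZ")):
		// DOS stub that every PE file starts with.
		return "PE"
	case len(header) >= 4:
		// Mach-O magic is stored in the target's byte order, so both
		// byte orders appear when the first word is read as big-endian.
		switch binary.BigEndian.Uint32(header[:4]) {
		case 0xfeedface, 0xfeedfacf, 0xcefaedfe, 0xcffaedfe:
			return "Mach-O"
		case 0xcafebabe:
			// Universal (fat) binary; note Java class files share this magic.
			return "Mach-O (fat)"
		}
	}
	return "unknown"
}

func main() {
	samples := [][]byte{
		{0x7f, 'E', 'L', 'F', 2, 1, 1},
		{'M', 'Z', 0x90, 0x00},
		{0xcf, 0xfa, 0xed, 0xfe},
		[]byte("hello"),
	}
	for _, hdr := range samples {
		fmt.Printf("% x -> %s\n", hdr, detectFormat(hdr))
	}
}
```

The point about DWARF holds on the standard-library side too: elf.File, pe.File and macho.File each expose a DWARF() method returning the same *dwarf.Data type, so once the format is known, DWARF-based logic can be shared.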

Zxilly commented 4 months ago

The DWARF data doesn't contain information about embeds, so we may still need some reverse-engineering work. This feature may therefore take a long time to implement.

Zxilly commented 4 months ago

Please try v1.3.0. It has initial support for parsing embedded content. You must compile with debug symbols to enable this feature.
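A plain `go build` keeps DWARF by default, while linking with `-ldflags="-s -w"` strips it. To check whether a given binary still carries the debug information this feature needs, a minimal sketch using only the standard library could look like this (loadDWARF is a hypothetical helper; gsa's own check may differ):

```go
package main

import (
	"debug/dwarf"
	"debug/elf"
	"debug/macho"
	"debug/pe"
	"fmt"
	"os"
)

// loadDWARF tries each executable format supported by the standard library
// and returns the file's DWARF data. It returns an error if the file is not
// ELF, PE or Mach-O, or if the debug sections have been stripped.
func loadDWARF(path string) (*dwarf.Data, error) {
	if f, err := elf.Open(path); err == nil {
		defer f.Close()
		return f.DWARF()
	}
	if f, err := pe.Open(path); err == nil {
		defer f.Close()
		return f.DWARF()
	}
	if f, err := macho.Open(path); err == nil {
		defer f.Close()
		return f.DWARF()
	}
	return nil, fmt.Errorf("%s: not an ELF, PE or Mach-O file", path)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: dwarfcheck <binary>")
		return
	}
	d, err := loadDWARF(os.Args[1])
	if err != nil {
		fmt.Println("no usable DWARF:", err)
		return
	}
	// Count compile units as a quick sanity check that DWARF is present.
	r := d.Reader()
	units := 0
	for {
		e, err := r.Next()
		if err != nil || e == nil {
			break
		}
		if e.Tag == dwarf.TagCompileUnit {
			units++
		}
		r.SkipChildren() // skip everything nested inside this entry
	}
	fmt.Println("compile units:", units)
}
```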

Zxilly commented 4 months ago

Keeping this issue open for now, as the new implementation is based on reverse engineering and some assumptions, and it's not certain that the code will handle all real-world situations correctly. I look forward to feedback so it can be improved further.
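To illustrate why an approach built on assumptions can misfire, here is a hypothetical heuristic (not gsa's actual algorithm): scan a data blob for word pairs that look like a Go string header pointing into .rodata. It assumes a 64-bit little-endian target, and scanStringHeaders, candidate and maxLen are illustrative names only.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// candidate is a possible string header found in a scanned blob.
type candidate struct {
	offset uint64 // where the header was found in the blob
	ptr    uint64 // address the header points to
	length uint64 // number of bytes it claims
}

// scanStringHeaders walks blob in 8-byte steps and reports every (ptr, len)
// pair whose target range lies entirely inside [roStart, roEnd) and whose
// length is at most maxLen. Unrelated data can pass these checks, which is
// exactly the false-positive risk such heuristics carry.
func scanStringHeaders(blob []byte, roStart, roEnd, maxLen uint64) []candidate {
	var out []candidate
	for off := 0; off+16 <= len(blob); off += 8 {
		ptr := binary.LittleEndian.Uint64(blob[off:])
		n := binary.LittleEndian.Uint64(blob[off+8:])
		// n > roEnd-ptr rejects ranges running past the section end
		// without risking overflow in ptr+n.
		if n == 0 || n > maxLen || ptr < roStart || ptr >= roEnd || n > roEnd-ptr {
			continue
		}
		out = append(out, candidate{uint64(off), ptr, n})
	}
	return out
}

func main() {
	blob := make([]byte, 32)
	binary.LittleEndian.PutUint64(blob[0:], 0x1000) // pointer into .rodata
	binary.LittleEndian.PutUint64(blob[8:], 0x20)   // 32-byte length
	for _, c := range scanStringHeaders(blob, 0x1000, 0x2000, 1<<20) {
		fmt.Printf("offset %d: %d bytes at %#x\n", c.offset, c.length, c.ptr)
	}
}
```

A real tool would need extra evidence (for example, a surrounding record that matches the embed file layout) before trusting a hit.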

gudvinr commented 4 months ago

With my test binary, the unknown .rodata size has dropped from ~9 MB to ~3 MB. Good work, thanks.
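The "unknown" figure is the part of .rodata that nothing has been attributed to. Assuming an analyzer records each attributed item as an address range, a minimal sketch of that computation is a union of intervals clipped to the section (span and unknownBytes are hypothetical names, not gsa's implementation):

```go
package main

import (
	"fmt"
	"sort"
)

// span is a half-open address range [start, end) attributed to something
// known: a symbol, a string literal, an embedded file, and so on.
type span struct{ start, end uint64 }

func maxU(a, b uint64) uint64 {
	if a > b {
		return a
	}
	return b
}

func minU(a, b uint64) uint64 {
	if a < b {
		return a
	}
	return b
}

// unknownBytes returns how many bytes of the section [secStart, secEnd) are
// not covered by any span. Spans are clipped to the section and overlaps
// are merged, so no byte is counted twice.
func unknownBytes(secStart, secEnd uint64, known []span) uint64 {
	s := append([]span(nil), known...)
	sort.Slice(s, func(i, j int) bool { return s[i].start < s[j].start })
	var covered uint64
	cur := secStart // every byte below cur has already been accounted for
	for _, sp := range s {
		start, end := maxU(sp.start, cur), minU(sp.end, secEnd)
		if start < end {
			covered += end - start
			cur = end
		}
	}
	return (secEnd - secStart) - covered
}

func main() {
	known := []span{{10, 20}, {15, 30}, {50, 60}}
	fmt.Println("unknown bytes:", unknownBytes(0, 100, known)) // 70
}
```

Attributing embedded files simply adds more spans, which shrinks the remainder in the same way the numbers above did.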