Squiblydoo / debloat

A GUI and CLI tool for removing bloat from executables
BSD 3-Clause "New" or "Revised" License

Reduce memory usage #18

Closed gdesmar closed 10 months ago

gdesmar commented 11 months ago

Hi, I tried to reduce the memory footprint of the library without sacrificing speed. I had other improvements that were complete but reverted them because they caused the library to take longer to execute, mostly related to the trim_junk function. I may revisit them later, but I wanted to get these in first. My first goal was to remove all instances of pe.write() and the pe_data that is duplicated at the start of the process_pe function. In place of the pe_data, I keep a list of offset tuples (from, to) that we wish to delete and then, assuming we wish to create the resulting file, take only the wanted bytes from the original data.
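
As an illustration of the idea only (the helper name write_trimmed and its exact shape are not debloat's API, just a sketch of the bookkeeping described above):

```python
# Illustrative sketch: track regions to delete as (start, end) offsets instead
# of mutating a duplicated copy of the PE data, then emit the kept bytes in a
# single pass when (and only when) an output file is requested.

def write_trimmed(original_data: bytes,
                  removal_ranges: list[tuple[int, int]],
                  out_path: str) -> None:
    """Write original_data minus the half-open [start, end) ranges to out_path."""
    keep_from = 0
    with open(out_path, "wb") as out:
        for start, end in sorted(removal_ranges):
            out.write(original_data[keep_from:start])  # bytes before the deleted region
            keep_from = end                            # skip the deleted region
        out.write(original_data[keep_from:])           # remaining tail
```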

One thing that may cause a problem is the addition of another parameter to the process_pe function. If whoever calls process_pe can provide the length of the file, that saves a whole pe.write() (and therefore another full load in memory). I made it optional and placed it at the end of the arguments to stay backward-compatible.
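
Roughly along these lines (the surrounding parameters are placeholders, not the real process_pe signature; only the trailing optional argument reflects the described change):

```python
def process_pe(pe, out_path, log_message=print, beginning_file_size=None):
    # Placeholder signature: only the last parameter illustrates the change.
    if beginning_file_size is None:
        # Fallback keeps the old behavior, at the cost of one extra full copy in memory.
        beginning_file_size = len(pe.write())
    log_message(f"Input size: {beginning_file_size} bytes")
    # ... rest of the debloating logic ...
```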

I do not have samples to test all code paths. I don't mind running all samples with 1.5.0 and this new branch to compare the results if you can share some, or you can do it yourself. You can download the raw test.py (in a zip file for GitHub) that I used. It is not well documented, but it should give insight into how I got my results and how anyone could try to reproduce them.
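
For anyone without the zip handy, a rough stdlib-only sketch of the kind of measurement test.py performs (the actual script uses memray; the debloat import path and the process_pe call below are assumptions):

```python
import time
import tracemalloc
from pathlib import Path

import pefile
from debloat.processor import process_pe  # import path is an assumption

def benchmark(sample_path: str, out_path: str) -> None:
    """Time one debloat run and report the peak Python heap usage."""
    data = Path(sample_path).read_bytes()
    tracemalloc.start()
    start = time.perf_counter()
    pe = pefile.PE(data=data, fast_load=True)
    process_pe(pe, out_path)  # argument list is illustrative, not the real signature
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"{sample_path}: {elapsed:.2f}s, peak ~{peak / 2**20:.1f} MiB")
```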

Here are my results (passing in the beginning_file_size):

| File hash | File size | Peak memory (1.5.0) | Execution time, 3 runs (1.5.0) | Peak memory (new) | Execution time (new) |
| --- | --- | --- | --- | --- | --- |
| 67b54f709895fa88b5153b568e62df5fb866237a1b3050502e7bee95a5a41738 | 482.577MB | 1.884GB | [5.21, 5.26, 5.19] | 485.421MB | [3.26, 3.25, 3.32] |
| 49c95279a836da84ad244a05817ab6fa1d8f6cb40a6c7fee4634e345c0e4a5b4 | 100.000MB | 595.951MB | [0.73, 0.72, 0.72] | 495.951MB | [0.52, 0.53, 0.51] |
| f9bf1e19763fd30242be3f80495518da5aa4604dd5085ce260dbb12d5dd67488 | 43.152MB | 239.493MB | [0.31, 0.33, 0.32] | 196.341MB | [0.24, 0.23, 0.24] |
| 65669e873a3732f1617c9c80667a1c3efda5f72538b5abd475e80a25efc0e5e2 | 313.823MB | 641.474MB | [0.45, 0.44, 0.42] | 341.446MB | [0.05, 0.04, 0.04] |
| 76f7f979a7af7f69eea4ab32e232d2c89dfbf7d0468736582b46a87c855a2422 | 80.198MB | 468.657MB | [0.57, 0.6, 0.57] | 388.460MB | [0.43, 0.42, 0.42] |
| 6a6f3488fa5927539aa37ad12a668f77ce8725534f3e30168fa2d92dde9add89 | 31.964MB | 187.968MB | [0.19, 0.18, 0.19] | 156.004MB | [0.18, 0.17, 0.16] |
| 90ffb9eade13d75f95e25c0b0aaa9a1f9171849cb81f1e2e9494c1fa801deee1 | 353.281MB | 2.066GB | [2.73, 2.65, 4.76] | 1.721GB | [1.73, 1.71, 1.74] |
| 36c32162148bf6fe8785020d68300d10f223ad59b47e6f4fedf7bf78f992f014 | 400.000MB | 2.343GB | [3.17, 3.12, 3.13] | 1.952GB | [2.06, 1.91, 1.94] |
| c3bbcf49833323978f3df6a3ae4d27cd278930ca78c5b178d6c7558c0b6210a2 | 500.000MB | 2.926GB | [4.17, 4.03, 4.03] | 2.438GB | [2.57, 2.48, 2.49] |
| 9892c1e9c834cf5f2c580baa34ba27d9f9d024cbac89bd2f226b1f723582bbb8 | 302.261MB | 1.179GB | [5.14, 5.12, 5.24] | 306.829MB | [4.31, 4.26, 4.27] |
| c6fda8a049ebd7872358acfa2505f226e931e0f71090c19412e7b6d0a1c6e129 | 302.368MB | 1.179GB | [5.15, 5.28, 5.26] | 307.127MB | [4.2, 4.27, 4.21] |
| 9900f584d89ef25cdae93a64eb5243df98fc787b006f846f11582a8b150353fc | 84.705MB | 491.547MB | [1.95, 1.9, 1.89] | 406.842MB | [1.61, 1.58, 1.59] |
| 347248cacef4596adbcddb5dbba62e050ddf223548834e8646bf43f96552a328 | 300.000MB | 1.756GB | [2.22, 2.21, 2.2] | 1.463GB | [1.46, 1.47, 1.42] |
| 9adeeeb9e86d4fa02ac88515131b89b2e912a79e9c0481e0e1254a6a70fd3512 | 118.460MB | 709.343MB | [3.21, 3.02, 3.18] | 590.883MB | [2.78, 2.69, 2.63] |
| bc1f3d36f8bb9afa4a1a2dfee41fd592b2896865329ce75d78b7fdada774ba8e | 35.340MB | 206.358MB | [0.31, 0.28, 0.28] | 171.018MB | [0.22, 0.22, 0.24] |
| b4bd0f04813e92852bb4344b2ce9d15259e628c850bfe3b5e0536977fe6a523d | 300.000MB | 1.755GB | [2.3, 2.19, 2.19] | 1.462GB | [1.58, 1.66, 1.69] |
| 158d07ab617c101fe9bda772225e07451b06399e1bc240d657c5b5f2f3fc03be | 39.323MB | 182.218MB | [1.75, 1.68, 1.7] | 39.683MB | [1.63, 1.65, 1.66] |

I am looking forward to any feedback!

Squiblydoo commented 10 months ago

Awesome. Thanks for all the information and the Pull Request. I am in the process of reviewing it and will likely complete review within a few days.

Squiblydoo commented 10 months ago

I'm going to go ahead and merge the changes. Everything makes sense from my review. Do you mind if I include the test.py within the Debloat repository? It addresses some needs I had not previously solved.

Regarding further improvements:

I'm experimenting with using the refinery_strip function in place of the dynamic trim. Practically, if you replace line 390 with delta_last_non_junk = refinery_strip(pe, biggest_section_data) and line 513 with end_of_real_data = pe.get_overlay_data_start_offset() + refinery_strip(pe, overlay), it will be fully implemented (spelled out in the snippet below).
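
For readability, the two replacement lines quoted above as a snippet (biggest_section_data and overlay are variables from the surrounding debloat code, not anything new):

```python
# Line 390: trim the largest section with refinery_strip instead of the dynamic trim
delta_last_non_junk = refinery_strip(pe, biggest_section_data)

# Line 513: trim the overlay the same way
end_of_real_data = pe.get_overlay_data_start_offset() + refinery_strip(pe, overlay)
```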

You may want to review the original method in Binary Refinery, pestrip.py. That will provide more context on its original use.

A factor regarding refinery_strip is line 237, the threshold. Debloat defaults it to 1, which is maximum aggressiveness. With this setting, I found that memory usage was reduced by two-thirds. I believe the default for refinery is actually 0.05; 0 will only remove repeated bytes, and 1 is most aggressive. (The code for modifying the threshold is already in place within debloat.) With thresholds lower than 1, I observed that the trimming usually failed and processing time often increased to 30 seconds.* There could be a problem with my implementation, though.

Update: Apparently a threshold of 1 will remove the whole thing. So for sections, or in cases where the malware needs some bytes in the overlay, it is not an acceptable setting.
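
This is not refinery's actual pestrip logic, but a minimal sketch of how a compression-ratio threshold with the semantics described above could behave; it also shows why a threshold of 1 can strip everything, since real code usually compresses at least a little and so falls below the cutoff:

```python
import zlib

def trim_tail(data: bytes, threshold: float = 0.05, chunk_size: int = 0x10000) -> bytes:
    """Illustrative sketch: drop trailing chunks whose zlib compression ratio is
    below `threshold`. Repeated-byte padding compresses to ~0, random data to ~1,
    so a threshold near 0 removes only padding while 1 removes anything compressible."""
    end = len(data)
    while end > 0:
        start = max(0, end - chunk_size)
        chunk = data[start:end]
        ratio = len(zlib.compress(chunk)) / len(chunk)
        if ratio >= threshold:
            break        # chunk looks meaningful enough; stop trimming
        end = start      # chunk looks like junk; keep walking backwards
    return data[:end]
```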

*NOTE: With the current test.py script, failure to debloat can be observed when the results output the same hash as the previous analysis.
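
A trivial helper for that check (the name is made up), comparing a fresh output against the file produced by the previous analysis:

```python
import hashlib
from pathlib import Path

def same_output(new_output: str, previous_output: str) -> bool:
    """True when the two debloated files hash identically, i.e. this run
    removed nothing compared to the previous analysis."""
    new_hash = hashlib.sha256(Path(new_output).read_bytes()).hexdigest()
    old_hash = hashlib.sha256(Path(previous_output).read_bytes()).hexdigest()
    return new_hash == old_hash
```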

The hashes in the following tables aren't very important, since they differ due to the different methods of removing data. However, in some cases the two methods do remove the same amount of data.

What I haven't tested is confirming, manually or automatically, that no critical data has been removed.

Failed cases below are instances that debloat is unable to handle.

Memory Improvements + Debloat's dynamic Trim:

| Filename | Size | Output SHA256 hash | Output size (bytes) | Memory peak | Execution time |
| --- | --- | --- | --- | --- | --- |
| Overlay-NullBytes3.malz | 762.939MB | 46aeb0... | 14699520 | 3.672GB | [2.72, 2.79, 2.85] |
| NON-Null-Overlay1.malz | 815.940MB | 8442e6... | 11912192 | 3.940GB | [13.65, 14.67, 15.6] |
| NON-Null-Overlay-Random.malz | 674.557MB | 97fed9... | 566784 | 3.292GB | [12.93, 12.02, 13.05] |
| Themida-Overlay-UnknownFamily.malz | 439.726MB | 441ab8... | 2695168 | 2.144GB | [7.95, 8.04, 7.86] |
| Resource4.malz | 300.388MB | 150f5e... | 2480030 | 305.179MB | [3.39, 3.59, 3.46] |
| Resource.malz | 300.348MB | abf3d2... | 1403904 | 303.209MB | [3.26, 3.24, 3.27] |
| Section-UnknownPacker.malz | 705.287MB | abf3d2... | 1403904 | 3.423GB | [7.37, 7.35, 8.03] |
| OverlayHighCompression.malz | 734.451MB | abf3d2... | 1403904 | 2.148GB | [1.14, 1.16, 1.12] |
| Overlay-AfterSignature2.malz | 738.980MB | 04d8ac... | 868352 | 740.673MB | [0.0, 0.0, 0.0] |
| Packed.malz | 726.000MB | 125a8a... | 1499136 | 2.127GB | [4.9, 4.61, 4.78] |
| DotNetResource.malz | 307.925MB | ---Processing Failed--- | --- | --- | [2.37, 2.33, 2.33] |
| Section3.malz | 325.880MB | 7d54aa... | 7214592 | 1.568GB | [3.42, 3.41, 3.46] |
| Overlay-Random-Chunks.malz | 604.650MB | f8973d... | 328192 | 2.951GB | [12.39, 12.87, 13.03] |
| Overlay-NullBytes2.malz | 762.939MB | 160ea0... | 16570880 | 3.664GB | [2.86, 2.7, 2.71] |
| Section2.malz | 709.919MB | 0c5c73... | 952320 | 3.463GB | [7.41, 7.32, 7.18] |
| Section1.malz | 1.264GB | f9a08f... | 82432 | 3.793GB | [16.13, 16.09, 15.85] |
| Bloat_After_SignatureExample.malz | 636.277MB | e1816c... | 1575424 | 639.319MB | [0.0, 0.0, 0.0] |

Memory Improvements + Refinery Strip

| Filename | Size | Output SHA256 hash | Output size (bytes) | Memory peak | Execution time |
| --- | --- | --- | --- | --- | --- |
| Overlay-NullBytes3.malz | 762.939MB | 46aeb0... | 14699520 | 1.505GB | [0.48, 0.48, 0.46] |
| NON-Null-Overlay1.malz | 815.940MB | 2cabe1... | 11896610 | 1.605GB | [0.47, 0.49, 0.48] |
| NON-Null-Overlay-Random.malz | 674.557MB | 82c42d... | 539498 | 1.318GB | [0.42, 0.41, 0.42] |
| Themida-Overlay-UnknownFamily.malz | 439.726MB | c23ffc... | 2679856 | 883.933MB | [0.32, 0.32, 0.32] |
| Resource4.malz | 300.388MB | 150f5e... | 2480030 | 305.179MB | [3.9, 3.76, 3.42] |
| Resource.malz | 300.348MB | abf3d2... | 1403904 | 303.208MB | [3.3, 3.3, 3.71] |
| Section-UnknownPacker.malz | 705.287MB | 85fd15... | 5539328 | 1.372GB | [5.94, 5.91, 5.81] |
| OverlayHighCompression.malz | 734.451MB | a72adb... | 2127360 | 1.436GB | [0.47, 0.47, 0.49] |
| Overlay-AfterSignature2.malz | 738.980MB | 04d8ac... | 868352 | 740.673MB | [0.0, 0.0, 0.0] |
| Packed.malz | 726.000MB | 7ff8cf... | 185346 | 1.418GB | [0.47, 0.5, 0.5] |
| DotNetResource.malz | 307.925MB | ---Processing Failed--- | --- | --- | [2.4, 2.41, 2.34] |
| Section3.malz | 325.880MB | 28e62e... | 6166016 | 645.925MB | [2.48, 2.46, 2.47] |
| Overlay-Random-Chunks.malz | 604.650MB | 0a40ea... | 300724 | 1.181GB | [0.4, 0.4, 0.4] |
| Overlay-NullBytes2.malz | 762.939MB | 160ea0... | 16570880 | 1.506GB | [0.51, 0.49, 0.49] |
| Section2.malz | 709.919MB | 0c5c73... | 952320 | 1.386GB | [4.61, 4.64, 4.62] |
| Section1.malz | 1.264GB | 89cd0e... | 72192 | 2.529GB | [8.59, 8.46, 8.51] |
| Bloat_After_SignatureExample.malz | 636.277MB | e1816c... | 1575424 | 639.319MB | [0.01, 0.0, 0.0] |

Squiblydoo commented 10 months ago

I have somewhat more confidence in the Refinery_Strip method than in my own Dynamic_trim due to the author's skill; I just haven't confirmed that it works consistently as expected when set to the most aggressive setting.

I plan to review this today; I can write all the patched binaries to a directory and manually inspect them against their originals to determine what information was removed. There may be faster or smarter methods, but this is one that I believe I can complete easily enough.
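
One rough way to drive that comparison (the helper is only a sketch; the .malz naming follows the tables above, and the directory layout is assumed):

```python
import hashlib
from pathlib import Path

def summarize(original_dir: str, patched_dir: str) -> None:
    """For every patched binary, report how much was removed relative to its
    original so the trimmed regions can then be inspected by hand."""
    for patched in Path(patched_dir).glob("*.malz"):
        original = Path(original_dir) / patched.name
        removed = original.stat().st_size - patched.stat().st_size
        digest = hashlib.sha256(patched.read_bytes()).hexdigest()
        print(f"{patched.name}: removed {removed} bytes, output sha256 {digest[:12]}...")
```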

Some samples, like the Packed examples (or this Emotet sample), contain important bytes in the overlay. I'm fairly sure that Refinery_Strip won't remove them, but it is something I'd like to be certain of. (In my dynamic trim, I often erred on the side of caution and left extra bytes.)

Update: Talking with Jesko, we identified that pestrip as implemented in Binary Refinery was unable to handle the important bytes in the overlay. This commit to Binary Refinery introduced a new capability to handle them.

gdesmar commented 10 months ago

More things make sense with the refinery link. I did see your comments at the top of the file, but I had not investigated them before. Regarding the threshold in refinery_strip, I tried to keep my changes to a minimum, but the whole if 0 < threshold < 1: code block is unreachable. I see now that it's an artefact from refinery, and that you may re-enable it at some point.

Regarding test.py, I would be glad if it were added directly to the repository (albeit with a better name). If it is to be used more often, it would probably be best to make it a bit nicer, for example using tempfile for the memray.bin and out files, or at least cleaning those up after the script runs, and exiting with a warning before overwriting/deleting them.
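
A sketch of what that cleanup could look like, assuming the script keeps its intermediate memray.bin capture and the debloated out file in a scratch directory (the function name and paths are only illustrative):

```python
import shutil
import tempfile
from pathlib import Path

def run_one(sample: Path) -> None:
    """Keep memray.bin and the debloated output in a throwaway directory so
    nothing in the working directory is overwritten or left behind."""
    with tempfile.TemporaryDirectory(prefix="debloat-bench-") as scratch:
        capture = Path(scratch) / "memray.bin"   # memray capture file
        out_file = Path(scratch) / "out"         # debloated binary
        # ... run the benchmark here, writing to `capture` and `out_file` ...
        if out_file.exists():
            # copy anything worth keeping before the scratch directory is removed
            shutil.copy(out_file, Path.cwd() / f"{sample.name}.debloated")
```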