d0c-s4vage / pfp

pfp - Python Format Parser - a python-based 010 Editor template interpreter
MIT License

Unexpected output size #59

Open Manouchehri opened 7 years ago

Manouchehri commented 7 years ago

Not sure what I'm doing wrong, but the output of mutated is far larger than I expected: 1454740 bytes, roughly 21 times the 68464-byte input.

Sample: https://github.com/Manouchehri/loremelf

ubuntu@964c0cbfef64:/pfp/ $ git clone https://github.com/Manouchehri/loremelf
ubuntu@964c0cbfef64:/pfp/ $ cd loremelf/
ubuntu@964c0cbfef64:/pfp/loremelf$ python2 coder.py
ubuntu@964c0cbfef64:/pfp/loremelf$ ls -l *put*/
input/:
total 68
-rwxr-xr-x 1 dave dave 68464 May  3 00:29 date

output/:
total 1424
-rw-rw-r-- 1 dave dave 1454740 May  3 13:06 bin.0.elf
...
for mutation in pfp.fuzz.mutate(dom, IntegersOnly, num=1, at_once=1):
    mutated = mutation._pfp__build()
    filename_out = "output/bin." + str(counter) + ".elf"
    with open(filename_out, 'wb') as out_file:
        # The with block closes the file, so no explicit close() is needed.
        out_file.write(bytes(mutated))
...
d0c-s4vage commented 7 years ago

Thanks for submitting a new issue! I'll take a look when I get a chance. Any chance you could share the 010 template you're using?

Manouchehri commented 7 years ago

https://github.com/Manouchehri/loremelf/blob/master/ELF.bt

d0c-s4vage commented 7 years ago

Awesome, thanks man!

Manouchehri commented 7 years ago

Not a solution, but for anyone else who runs into this thread, I ended up getting https://github.com/lunixbochs/patchkit to emit ELFs.

d0c-s4vage commented 5 years ago

Bump - revisiting this

d0c-s4vage commented 5 years ago

Simplifying the bug:

dom = pfp.parse(
    data_file = file,
    template_file = "tests/templates/elf.bt"
)

print dom._pfp__show()

with open("/tmp/test.elf", "wb") as f:
    dom._pfp__build(f)

The rebuilt /tmp/test.elf is still 1454740 bytes instead of the 68464 bytes of the original date binary, even though nothing was mutated. I suspect something is happening with the unused/skipped data:

                symtab[31] = struct {
                        sym_name   = struct {
                            sym_name_off = UInt(325 [00000145])
                            _skipped   = Char[1569] ('\x12\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
                            sym_name_str = String('fputc')
                        }
                        sym_info   = struct {
                            sym_info_type = UChar(1 [01]):4
                            sym_info_bind = UChar(2 [02]):4
                        }
                        sym_other  = UChar(0 [00])
                        sym_shndx  = UShort(0 [0000])
                        sym_value  = UInt64(0 [0000000000000000])
                        sym_size   = UInt64(0 [0000000000000000])
                    }
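
To make the suspicion about skipped data concrete, here is a toy model in Python 3. It is not pfp's code: `parse_symbols`, `naive_build`, and the three-entry layout are hypothetical, and the only thing borrowed from the dump above is the field order (`sym_name_off`, `_skipped`, `sym_name_str`). The assumption is that bytes jumped over by a seek are recorded as a field and then written out inline when the tree is serialised.

```python
import struct

def parse_symbols(data, count, strtab_off):
    """Toy parser: each record is a 4-byte offset into a string table."""
    fields = []  # (name, raw bytes), in the order the toy template declares them
    pos = 0
    for _ in range(count):
        raw_off = data[pos:pos + 4]
        (name_off,) = struct.unpack("<I", raw_off)
        target = strtab_off + name_off            # seek forward to the name
        end = data.index(b"\x00", target) + 1     # include the NUL terminator
        fields.append(("sym_name_off", raw_off))
        # Assumption: everything jumped over by the seek becomes a field too.
        fields.append(("_skipped", data[pos + 4:target]))
        fields.append(("sym_name_str", data[target:end]))
        pos += 4                                  # seek back to the next record
    return fields

def naive_build(fields):
    """Serialise every recorded field back to back, in declaration order."""
    return b"".join(raw for _, raw in fields)

strtab = b"fputc\x00printf\x00exit\x00"
offsets = [0, 6, 13]                              # start of each name in strtab
table = b"".join(struct.pack("<I", off) for off in offsets)
original = table + strtab

fields = parse_symbols(original, len(offsets), strtab_off=len(table))
rebuilt = naive_build(fields)
print(len(original), len(rebuilt))
```

In this model every record re-emits the bytes between itself and its name, so the rebuilt buffer grows with the number of records even though no value changed. Whether pfp inflates the ELF in exactly this way is still unconfirmed at this point in the thread.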
d0c-s4vage commented 5 years ago

Hrmm, here are the differences between the original parsed output and the output after parsing, building (unmodified), and parsing again:

42a43,53
>                         p_type     = Enum<UInt>(6 [00000006])(PT_PHDR)
>                         p_flags    = Enum<UInt>(5 [00000005])(PF_Read_Exec)
>                         p_offset_FROM_FILE_BEGIN = UInt64(64 [0000000000000040])
>                         p_vaddr_VIRTUAL_ADDRESS = UInt64(4194368 [0000000000400040])
>                         p_paddr_PHYSICAL_ADDRESS = UInt64(4194368 [0000000000400040])
>                         p_filesz_SEGMENT_FILE_LENGTH = UInt64(504 [00000000000001f8])
>                         p_memsz_SEGMENT_RAM_LENGTH = UInt64(504 [00000000000001f8])
>                         p_align    = UInt64(8 [0000000000000008])
>                         p_data     = Char[504] ('\x06\x00\x00\x00\x05\x00\x00\x00@\x00\x00\x00\x00\x00\x00\x00@\x00@\x00')
>                     }
>                 program_table_element[2] = struct {
51,52c62,63
<                         _skipped   = Char[392] ('\x01\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00@\x00')
<                         p_data     = Char[28] ('/lib64/ld-linux-x86-')
---
>                         _skipped   = Char[336] ('\x01\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00@\x00')
>                         p_data     = Char[28] ('R\xe5td\x04\x00\x00\x00\x10\xfe\x00\x00\x00\x00\x00\x00\x10\xfe`\x00')
54c65
<                 program_table_element[2] = struct {
---
>                 program_table_element[3] = struct {
65c76
<                 program_table_element[3] = struct {
---
>                 program_table_element[4] = struct {
74,75c85,86
<                         _skipped   = Char[64752] ('\x02\x00\x00\x00\x06\x00\x00\x00(\xfe\x00\x00\x00\x00\x00\x00(\xfe`\x00')
<                         p_data     = Char[1252] ('\xb0#@\x00\x00\x00\x00\x00\x90#@\x00\x00\x00\x00\x00\x00\x00\x00\x00')
---
>                         _skipped   = Char[64696] ('\x02\x00\x00\x00\x06\x00\x00\x00(\xfe\x00\x00\x00\x00\x00\x00(\xfe`\x00')
>                         p_data     = Char[1252] ('\x0e\x10B\x0e\x08H\x0be\x0e(D\x0e A\x0e\x18B\x0e\x10B')
77c88
<                 program_table_element[4] = struct {
---
>                 program_table_element[5] = struct {
86,99c97,98
<                         _skipped   = Char[64720] ('\x04\x00\x00\x00\x04\x00\x00\x00T\x02\x00\x00\x00\x00\x00\x00T\x02@\x00')
<                         p_data     = Char[464] ('\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x00')
<                     }
<                 program_table_element[5] = struct {
<                         p_type     = Enum<UInt>(4 [00000004])(PT_NOTE)
<                         p_flags    = Enum<UInt>(4 [00000004])(PF_Read)
<                         p_offset_FROM_FILE_BEGIN = UInt64(596 [0000000000000254])
<                         p_vaddr_VIRTUAL_ADDRESS = UInt64(4194900 [0000000000400254])
<                         p_paddr_PHYSICAL_ADDRESS = UInt64(4194900 [0000000000400254])
<                         p_filesz_SEGMENT_FILE_LENGTH = UInt64(68 [0000000000000044])
<                         p_memsz_SEGMENT_RAM_LENGTH = UInt64(68 [0000000000000044])
<                         p_align    = UInt64(4 [0000000000000004])
<                         _skipped   = Char[196] ('P\xe5td\x04\x00\x00\x00T\xe5\x00\x00\x00\x00\x00\x00T\xe5@\x00')
<                         p_data     = Char[68] ('\x04\x00\x00\x00\x10\x00\x00\x00\x01\x00\x00\x00GNU\x00\x00\x00\x00\x00')
---
>                         _skipped   = Char[64664] ('\x04\x00\x00\x00\x04\x00\x00\x00T\x02\x00\x00\x00\x00\x00\x00T\x02@\x00')
>                         p_data     = Char[464] ('\x00\x00\x00\x00L\x00\x00\x00\xbc\x11\x00\x00\xc8\xa4\xff\xff5\x01\x00\x00')
100a100
>                 program_table_element[6] = struct {

viewed from vim:

[image: screenshot of the diff in vim]

Close to having this tracked down
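
For anyone who wants to repeat the comparison without an external diff tool, a minimal sketch using Python's standard `difflib` follows. The helper name `diff_dumps` and the two short sample strings are made up for illustration; in practice the inputs would be the text returned by `dom._pfp__show()` before and after the round trip. Note that `difflib` emits unified-diff output, not the classic `51,52c62,63` format shown above.

```python
import difflib

def diff_dumps(before, after):
    """Return unified-diff lines between two textual field dumps."""
    return list(difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="parsed",
        tofile="parsed-built-parsed",
        lineterm="",
    ))

# Hypothetical two-line excerpts; only the Char[...] sizes come from the diff above.
before = "_skipped   = Char[392]\np_data     = Char[28]\n"
after = "_skipped   = Char[336]\np_data     = Char[28]\n"

for line in diff_dumps(before, after):
    print(line)
```

An empty result means the round trip preserved the parsed structure, which is the property this issue is about.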

d0c-s4vage commented 5 years ago

In the modified (parsed, built, and parsed again) output on the left, the field values are taken directly from the previously parsed p_data in the preceding struct.
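
The effect can be reproduced with a toy model in Python 3. This is a sketch under stated assumptions, not pfp's implementation: the two-field header `HDR`, `parse`, and `naive_build` are hypothetical, and the model assumes the builder writes each `p_data` inline right after its header instead of at `p_offset`. As with PT_PHDR, the first segment's data is the header table itself.

```python
import struct

HDR = struct.Struct("<II")  # toy program header: (p_offset, p_filesz)

def parse(data, count):
    """Read `count` headers back to back; fetch each p_data by seeking to p_offset."""
    segments = []
    for i in range(count):
        p_offset, p_filesz = HDR.unpack_from(data, i * HDR.size)
        segments.append((p_offset, p_filesz, data[p_offset:p_offset + p_filesz]))
    return segments

def naive_build(segments):
    """Emit each header followed by its p_data, in declaration order (no seeking)."""
    out = b""
    for p_offset, p_filesz, p_data in segments:
        out += HDR.pack(p_offset, p_filesz) + p_data
    return out

payload = b"/lib64/ld-linux"
table_size = 2 * HDR.size
original = (HDR.pack(0, table_size)                  # segment 0: the header table
            + HDR.pack(table_size, len(payload))     # segment 1: the payload
            + payload)

first_pass = parse(original, 2)
rebuilt = naive_build(first_pass)
second_pass = parse(rebuilt, 2)

print(first_pass[1][:2], second_pass[1][:2])
```

On the second pass, the bytes where header 1 is expected are really the start of segment 0's `p_data`, so header 1 comes back as a copy of header 0. That is the same pattern as the extra PT_PHDR entry in the diff.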

d0c-s4vage commented 5 years ago

This is on hold until I figure out whether proposed changes 1 (Parsed fields have a parse order or index) and 2 (Unified Skipped/Unparsed Data Tracking), as described in https://github.com/d0c-s4vage/pfp/issues/82, are the best approaches.

For now, I am liking those approaches, but they won't be small changes.
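
The design in #82 is not spelled out in this thread, so the following is not that design. It is only a rough Python 3 illustration of why remembering where each field was parsed from would help: the hypothetical `build_by_offset` writes each field back at its recorded offset, so out-of-order or overlapping fields no longer inflate the output.

```python
def build_by_offset(fields, size):
    """Write each field's bytes at the offset it was parsed from.

    `fields` is a list of (offset, raw_bytes) pairs. Regions that no field
    covers stay zero here, which is why unparsed data would also need tracking.
    """
    out = bytearray(size)
    for offset, raw in fields:
        out[offset:offset + len(raw)] = raw
    return bytes(out)

original = b"\x04\x00\x00\x00fputc\x00"
# Recorded out of file order and with an overlapping field, on purpose.
fields = [(4, original[4:]), (0, original[:4]), (4, original[4:9])]
rebuilt = build_by_offset(fields, len(original))
print(rebuilt == original)
```

A real fix would also have to cope with mutated fields that change size, which this sketch ignores entirely.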

d0c-s4vage commented 4 years ago

I've let this stew for a while - I'm liking the approaches I outlined in #82. This definitely needs to be addressed.