GrammaTech / ddisasm

A fast and accurate disassembler
https://grammatech.github.io/ddisasm/
GNU Affero General Public License v3.0

Global struct array split into two parts #46

Closed: 5c4lar closed this issue 2 years ago.

5c4lar commented 2 years ago
#include <stdio.h>
#include <stdlib.h>
struct foo {
    int x;
    char y;
} ga[10];

void bar() {
    for (int i = 0; i < 10; i++) {
        ga[i].x = 1;
        ga[i].y = 2;
    }
    for (size_t i = 0; i < 10; i++)
    {
        printf("%d %d\n", ga[i].x, ga[i].y);
    }
}

int main() {
    bar();
}

For the C code above, ddisasm splits the global struct array ga into two data blocks: one of 4 bytes (.L_404050) and one of 76 bytes (.L_404054). I understand why this happens, but can it be improved?

#-----------------------------------
.globl FUN_4011e0
.type FUN_4011e0, @function
#-----------------------------------
FUN_4011e0:

.cfi_startproc 
.cfi_lsda 255
.cfi_personality 255
.cfi_def_cfa 7, 8
.cfi_offset 16, -8
            push RBP
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
            mov RBP,RSP
.cfi_def_cfa_register 6
            sub RSP,16
            mov DWORD PTR [RBP-4],0
.L_4011ef:

            cmp DWORD PTR [RBP-4],10
            jge .L_401222

            movsxd RAX,DWORD PTR [RBP-4]
            mov DWORD PTR [RAX*8+.L_404050],1
            movsxd RAX,DWORD PTR [RBP-4]
            mov BYTE PTR [RAX*8+.L_404054],2
            mov EAX,DWORD PTR [RBP-4]
            add EAX,1
            mov DWORD PTR [RBP-4],EAX
            jmp .L_4011ef
.L_401222:

            mov QWORD PTR [RBP-16],0
.L_40122a:

            cmp QWORD PTR [RBP-16],10
            jae .L_40126e

            mov RAX,QWORD PTR [RBP-16]
            mov ESI,DWORD PTR [RAX*8+.L_404050]
            mov RAX,QWORD PTR [RBP-16]
            movsx EDX,BYTE PTR [RAX*8+.L_404054]
            movabs RDI,OFFSET .L_402004
            mov AL,0
            call printf@PLT

            mov RAX,QWORD PTR [RBP-16]
            add RAX,1
            mov QWORD PTR [RBP-16],RAX
            jmp .L_40122a
.L_40126e:

            add RSP,16
            pop RBP
.cfi_def_cfa 7, 8
            ret 
.cfi_endproc 

.align 16
.L_404040:
          .zero 16
.L_404050:
          .zero 4
.L_404054:
          .zero 76
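The two block sizes follow from the struct layout. Each element of ga is 8 bytes, so the array spans 80 bytes, and the label placed for the ga[i].y accesses (.L_404054, four bytes past .L_404050) cuts it into 4 + 76. A minimal sketch of that arithmetic, assuming the usual x86-64 System V layout (4-byte int with 4-byte alignment, struct padded to 8 bytes); split_at_field and struct block_split are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the struct in the issue. Assuming x86-64 System V,
 * y sits at offset 4 and the struct is padded to 8 bytes. */
struct foo {
    int x;
    char y;
};

/* Hypothetical result type: sizes of the two data blocks produced when a
 * label lands inside the first element of an array. */
struct block_split {
    size_t first;   /* from the array start up to the field label */
    size_t second;  /* from the field label to the end of the array */
};

/* Hypothetical helper: split an array of `count` elements of `elem_size`
 * bytes at byte offset `field_off` inside element 0. */
static struct block_split split_at_field(size_t elem_size, size_t count,
                                         size_t field_off) {
    assert(elem_size > 0 && field_off < elem_size);
    struct block_split s = { field_off, elem_size * count - field_off };
    return s;
}
```

For the struct in the issue, split_at_field(sizeof(struct foo), 10, offsetof(struct foo, y)) gives {4, 76}, matching the .zero 4 and .zero 76 directives in the listing.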
adamjseitz commented 2 years ago

It seems like the best we could ask for in the assembly output would be to recognize this pattern and generate:

...

    mov DWORD PTR [RAX*8+.L_404050],1
    movsxd RAX,DWORD PTR [RBP-4]
    mov BYTE PTR [RAX*8+.L_404050+4],2
    mov EAX,DWORD PTR [RBP-4]
    add EAX,1

...

.L_404050:
          .zero 80

The existing data access pattern propagation code already recognizes some of these "synchronized" data accesses, so that it knows it is safe to keep propagating the data access instead of treating the accesses as a collision:

https://github.com/GrammaTech/ddisasm/blob/master/src/datalog/data_access_analysis.dl#L331
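To make "synchronized" concrete, here is a deliberately simplified C model of the idea. It is not a transcription of the Datalog rule linked above. Two scaled-index accesses are treated as fields of the same array element when they use the same index register and stride and the higher access still ends within one stride of the lower one. The reg enum, struct mem_access and synchronized_access are hypothetical, and the sketch ignores whether the index register holds the same value at both instructions, which a real analysis would have to establish:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative register ids; not ddisasm's representation. */
enum reg { REG_RAX, REG_RCX };

/* Simplified scaled-index access such as DWORD PTR [RAX*8+0x404050]. */
struct mem_access {
    enum reg index_reg; /* register used as the index */
    unsigned scale;     /* multiplier on the index, i.e. the element stride */
    uint64_t disp;      /* constant displacement (where a label would go) */
    unsigned size;      /* access width in bytes */
};

/* True when a and b look like two fields of one array element: same index
 * register, same non-zero stride, and the access at the higher
 * displacement ends within one stride of the lower displacement. */
static bool synchronized_access(const struct mem_access *a,
                                const struct mem_access *b) {
    assert(a != NULL && b != NULL);
    if (a->index_reg != b->index_reg || a->scale != b->scale || a->scale == 0)
        return false;
    const struct mem_access *lo = (a->disp <= b->disp) ? a : b;
    const struct mem_access *hi = (lo == a) ? b : a;
    uint64_t gap = hi->disp - lo->disp;
    /* e.g. x at +0 (4 bytes) and y at +4 (1 byte) fit in an 8-byte element */
    return gap < lo->scale && hi->size <= lo->scale - gap;
}
```

With this model, the x access at 0x404050 (4 bytes) and the y access at 0x404054 (1 byte), both indexed by RAX with stride 8, are synchronized, while an access at 0x404058 would already belong to the next element.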

To alter the output, I think you'd need to break that recognition out into an intermediate relation and use it to generate a moved_data_label, so that the symbolic operand refers to .L_404050 instead of .L_404054.
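The suggested change can be pictured as two steps: first compute an intermediate table of operands that should be rebased onto the start of the array, then have the printer consult that table when emitting a symbolic operand. A rough C sketch under simplifying assumptions (same index register and stride, displacement less than one stride above the base, no check that the index values match or that the field fits in the element). The names struct mem_access, struct moved_entry, collect_moves and find_move are hypothetical and do not mirror the fields of ddisasm's actual moved_label or moved_data_label relations:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified scaled-index access; register ids are plain ints here. */
struct mem_access {
    uint64_t ea;      /* address of the instruction */
    int index_reg;    /* register used as the index */
    unsigned scale;   /* element stride */
    uint64_t disp;    /* displacement the operand currently points at */
};

/* Hypothetical intermediate fact: print the operand of instruction ea as
 * new_target + addend instead of old_target. */
struct moved_entry {
    uint64_t ea;
    uint64_t old_target;
    uint64_t new_target;
    uint64_t addend;  /* old_target - new_target, always < stride */
};

/* Step 1: for each access, find the lowest displacement among accesses with
 * the same index register and stride lying less than one stride below it;
 * if one exists, record a move onto that base. Returns the number of
 * entries written to out (at most cap). */
static size_t collect_moves(const struct mem_access *acc, size_t n,
                            struct moved_entry *out, size_t cap) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t base = acc[i].disp;
        for (size_t j = 0; j < n; j++) {
            if (acc[j].index_reg == acc[i].index_reg &&
                acc[j].scale == acc[i].scale &&
                acc[j].disp < base &&
                acc[i].disp - acc[j].disp < acc[i].scale)
                base = acc[j].disp;
        }
        if (base != acc[i].disp && count < cap) {
            out[count].ea = acc[i].ea;
            out[count].old_target = acc[i].disp;
            out[count].new_target = base;
            out[count].addend = acc[i].disp - base;
            count++;
        }
    }
    return count;
}

/* Step 2: when printing, look up whether the operand at ea was moved. */
static const struct moved_entry *find_move(const struct moved_entry *m,
                                           size_t n, uint64_t ea) {
    for (size_t i = 0; i < n; i++)
        if (m[i].ea == ea)
            return &m[i];
    return NULL;
}
```

Applied to the four ga accesses in FUN_4011e0, only the two y accesses would get entries, each with new_target 0x404050 and addend 4, which a printer could render as .L_404050+4 while the x accesses keep .L_404050 unchanged.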

Maybe this is a contribution you would be able to make? Note that we have a CLA that we ask every contributor to sign and email to CLA@GrammaTech.com (see our CONTRIBUTING.md).

5c4lar commented 2 years ago


Thanks for explaining how to do this! I tried to implement it following your suggestions, but I changed moved_label instead of moved_data_label, since those addresses appear in the instruction operands.


I sent the CLA, but the email was rejected:

550 5.4.1 Recipient address rejected: Access denied. AS(201806281) [CY1USG02FT011.eop-usg02.itar.protection.office365.us]
adamjseitz commented 2 years ago


Can you try to send directly to eschulte@grammatech.com? Thanks.

5c4lar commented 2 years ago


Sure. It has been sent successfully.