Terraspace / UASM

UASM - Macro Assembler
http://www.terraspace.co.uk/uasm.html

RIP-relative addressing and unnecessary COFF relocations #115

Closed mazegen closed 1 year ago

mazegen commented 5 years ago

Hi,

the following code generates RIP-relative addressing with REL32_1 COFF relocation:

.data
 data_start DD 11223344h
.code
 cmp dword ptr [data_start], 1
 end

The relocation must be generated because data_start lies in another section.
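A short sketch of why the relocation is REL32_1 rather than plain REL32, assuming the assembler encodes the comparison with a one-byte immediate after the 4-byte displacement. The PE/COFF format defines REL32_1 through REL32_5 for a displacement field that is followed by 1 to 5 more bytes of the same instruction. The helper name is hypothetical; the numeric values are the ones from the PE/COFF specification.

```c
#include <assert.h>
#include <stdint.h>

/* AMD64 relocation type values from the PE/COFF specification. */
enum {
    IMAGE_REL_AMD64_REL32   = 0x0004,  /* disp32 is the last field        */
    IMAGE_REL_AMD64_REL32_1 = 0x0005,  /* 1 byte follows the disp32 field */
    IMAGE_REL_AMD64_REL32_5 = 0x0009   /* 5 bytes follow the disp32 field */
};

/* Hypothetical helper: pick the REL32 variant from the number of bytes
   (0..5) that follow the displacement inside the instruction. */
uint16_t rel32_type_for_trailing(unsigned trailing_bytes)
{
    assert(trailing_bytes <= 5);
    return (uint16_t)(IMAGE_REL_AMD64_REL32 + trailing_bytes);
}
```

With one trailing immediate byte, as assumed for the cmp above, the helper returns IMAGE_REL_AMD64_REL32_1.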

The following code generates RIP-relative addressing and also REL32_1 relocation:

.code
 cmp dword ptr [data_start], 1
 nop
 data_start DD 11223344h
end

However, in this case the COFF relocation is unnecessary because data_start is in the same section. This code is effectively the same as:

.code
 cmp dword ptr [rip+(data_start-next)], 1
next:
 nop
 data_start DD 11223344h
end

This code works correctly and generates no relocations.
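The arithmetic behind (data_start-next) can be sketched in C. This is only an illustration: the function name is made up, and the worked numbers assume the cmp sits at section offset 0 and is 7 bytes long (opcode, ModRM, disp32, imm8), which puts next at offset 7 and data_start at offset 8.

```c
#include <assert.h>
#include <stdint.h>

/* disp32 of a RIP-relative operand. RIP points at the next instruction
   when the operand is evaluated, so the distance is measured from the
   end of the current instruction. All offsets are section-relative. */
int32_t rip_disp32(uint32_t insn_offset, uint32_t insn_len,
                   uint32_t target_offset)
{
    int64_t disp = (int64_t)target_offset
                 - ((int64_t)insn_offset + (int64_t)insn_len);
    assert(disp >= INT32_MIN && disp <= INT32_MAX);
    return (int32_t)disp;
}
```

Under those assumptions rip_disp32(0, 7, 8) gives 1. Both inputs are known at assembly time when the label and the instruction share a section, which is why no linker relocation is needed.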

Can you get rid of the COFF relocation in this case?

mazegen commented 4 years ago

It would be great if we could generate pure RIP relocation using macro:

RIPREL MACRO lbl:REQ
 EXITM <(type lbl) ptr [rip + (lbl-end_of_current_instruction)]>
ENDM

mov bl, [RIPREL(a_label)]

The problem is the end_of_current_instruction symbol. As far as I know, no available symbol represents it. We only have $, the current location. It would be very useful to have some kind of RIP-relative addressing available in the preprocessor through another symbol.

mazegen commented 4 years ago

See also http://masm32.com/board/index.php?topic=8263.0

mazegen commented 2 years ago

Hi johnsa, is there any chance this can be implemented? It should be easy for labels that are in the same section as the instruction, right? ;)

john-terraspace commented 2 years ago

Hey, yep this was #1 on my list for 2.56 :)

john-terraspace commented 2 years ago

Done. Changes are in the 2.56 branch. In the parser we no longer generate a fixup entry for a symbol that has RELOC32 in a 64-bit section where the current and target sections are the same.
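The condition described above can be written as a small predicate. This is a simplified sketch, not UASM's parser code: the struct, the enum and the function name are all hypothetical, and real code would also have to consider cases such as external symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified fixup descriptor (not UASM's real struct). */
enum fixup_kind { FIX_RELOC32, FIX_OTHER };

struct fixup_info {
    enum fixup_kind kind;
    bool in_64bit_section;
    int  current_section;  /* index of the section holding the instruction */
    int  target_section;   /* index of the section holding the symbol      */
};

/* True when the displacement cannot be resolved by the assembler alone
   and a COFF relocation has to be emitted for the linker. */
bool needs_coff_reloc(const struct fixup_info *f)
{
    assert(f != NULL);
    bool same_section_rip = f->kind == FIX_RELOC32
                         && f->in_64bit_section
                         && f->current_section == f->target_section;
    return !same_section_rip;
}
```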

john-terraspace commented 2 years ago

This turns out to be not so trivial. Preventing the fixup is easy enough; the two problems lie in the way the assembler works: by default the fixup and an address of 0 are written out, and the fixups are used for back-patching across passes. At the point the fixup is generated we don't have enough information to calculate the proper displacement, as the codegen hasn't fully run. That, however, isn't the main issue. The main issue is that COFF seems to depend on the fixup data for other things, including identifying symbols in disassembly and supporting symbolic debugging information. If you generate the COFF without the fixup, you can't debug properly.

john-terraspace commented 2 years ago

Ok, I think I have it. Updated 2.56 branch. I've removed the COFF relocations, amended the debug data where it's needed and added a custom back-patching to update the RIP before writing the data out. So-far so good on my tests.
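A rough idea of what such a back-patch step has to do, written as a standalone C function. This is an assumption-laden sketch and not the code in the 2.56 branch: the function name and parameters are invented, and it only covers a target in the same section as the instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the final RIP-relative disp32 into an already emitted code buffer.
   disp_off   : section offset of the 4-byte displacement field
   trailing   : bytes of the instruction after that field (e.g. an imm8)
   target_off : section offset of the target label (same section) */
void patch_rip_disp32(uint8_t *code, size_t code_len,
                      size_t disp_off, size_t trailing, size_t target_off)
{
    assert(code != NULL && disp_off + 4 + trailing <= code_len);
    int64_t next_insn = (int64_t)(disp_off + 4 + trailing);
    int64_t wide = (int64_t)target_off - next_insn;
    assert(wide >= INT32_MIN && wide <= INT32_MAX);
    uint32_t disp = (uint32_t)(int32_t)wide;
    for (int i = 0; i < 4; i++)            /* x86 stores little-endian */
        code[disp_off + i] = (uint8_t)(disp >> (8 * i));
}
```

For a 7-byte lea at offset 2 that refers to a label at offset 1, the field at offset 5 is patched to F8 FF FF FF.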

vid512 commented 2 years ago

Thanks a lot! There's still something to fix. I've tried the current 2.56 branch with our project, and there is a crash at backptch.c:202: the "fixup2->sym" being dereferenced is 0. This is the line:

DebugMsg(("for sym=%s fixup loc %" I32_SPEC "X changed to %" I32_SPEC "X\n", fixup2->sym->name, fixup2->locofs - size, fixup2->locofs ));

john-terraspace commented 2 years ago

Will double check, thanks! - If you just remove the DebugMsg line does it work?

vid512 commented 2 years ago

Yes, so far it seems to work. I'll let you know if anything comes up.

john-terraspace commented 2 years ago

It was a silly debug message. I routinely go through and remove them, to be honest. They're of little or no value, and this one was dereferencing a null pointer. Silly.
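If a message like that were kept, the usual fix is to guard the dereference. The sketch below is illustrative only: the two structs are minimal stand-ins that carry just the fields named in the quoted line, the helper names are invented, and PRIX32 replaces the project's I32_SPEC macro.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical minimal stand-ins; UASM's real structs have more members. */
struct asym  { const char *name; };
struct fixup { struct asym *sym; uint32_t locofs; };

/* Return a printable name even when the fixup has no symbol attached. */
const char *fixup_sym_name(const struct fixup *f)
{
    if (f == NULL || f->sym == NULL || f->sym->name == NULL)
        return "(no symbol)";
    return f->sym->name;
}

/* Format the message from the quoted line without touching a NULL sym. */
int format_fixup_msg(char *buf, size_t n, const struct fixup *f, uint32_t size)
{
    assert(buf != NULL && f != NULL);
    return snprintf(buf, n,
                    "for sym=%s fixup loc %" PRIX32 " changed to %" PRIX32 "\n",
                    fixup_sym_name(f), (uint32_t)(f->locofs - size), f->locofs);
}
```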

mazegen commented 2 years ago

Hi john, many thanks for implementing it. However, the following code still generates REL32 COFF relocation. Do you think this can be solved too?

.code
 lea rax, [lbl]   ; compiles to 488D0500000000 as expected
 nop
lbl:
 nop
end

dumpbin.exe says:

dumpbin.exe /relocations rip_rel.obj
Microsoft (R) COFF/PE Dumper Version 14.00.24210.0

Copyright (C) Microsoft Corporation.  All rights reserved.

Dump of file rip_rel.obj                                                                                             

File Type: COFF OBJECT                                                                                               

RELOCATIONS #1                                                            
                                               Symbol    Symbol
Offset    Type              Applied To         Index     Name  
--------  ----------------  -----------------  --------  ------
00000003  REL32                      00000000         6  lbl

mazegen commented 2 years ago

Sorry, I was too fast, the following code still generates REL32_1, I need to check my uasm build.

.code
 cmp dword ptr [data_start], 1
 nop
 data_start DD 11223344h
end

vid512 commented 2 years ago

John: Turns out I was mistakenly testing with master, instead of v2.56 branch. I have also supplied mazegen with this incorrect version, so please disregard his last messages as well. Sorry, I'll test with proper 2.56 and let you know soon.

vid512 commented 2 years ago

Results of testing with simple example.

rip_rel.asm contains:

.code
  nop
lbl1:
  nop
  lea rax, [lbl1]
  lea rax, [lbl2]
  nop
lbl2:
  nop
end

Building it with 2.56:

c:\dev\_tools\uasm-2.56\UASM\bin>uasm64 -win64 /Fl rip_rel.asm
UASM v2.56, Oct 11 2022, Masm-compatible assembler.
Portions Copyright (c) 1992-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.

size shrank from 13 to 12 in pass 2
rip_rel.asm: 10 lines, 3 passes, 5 ms, 0 warnings, 0 errors
126 items in symbol table, expected 126
max items in a line=1, lines with 0/1/<=5/<=10 items=8066/126/0/0,
2174 items in resw table, max items/line=6 [0=619 1=672 397 156 44 8 4 0]
invokation CATSTR=0 SUBSTR=0 SIZESTR=0 INSTR=0 EQU(text)=0
memory used: 402 kB

The resulting .obj seems to have correct displacement and no relocations:

c:\dev\_tools\uasm-2.56\UASM\bin>dumpbin /nologo /disasm /relocations rip_rel.obj

Dump of file rip_rel.obj

File Type: COFF OBJECT

  0000000000000000: 90                 nop
  0000000000000001: 90                 nop
  0000000000000002: 48 8D 05 F8 FF FF  lea         rax,[0000000000000001h]
                    FF
  0000000000000009: 48 8D 05 01 00 00  lea         rax,[0000000000000011h]
                    00
  0000000000000010: 90                 nop
  0000000000000011: 90                 nop

  Summary

           0 .data
          12 .text

The only problem seems to be the listing file, which reports displacement 0:

UASM v2.56, Oct 11 2022, Masm-compatible assembler.

rip_rel.asm
                                .code
00000000  90                      nop
00000001                        lbl1:
00000001  90                      nop
00000002  488D0500000000          lea rax, [lbl1]
00000009  488D0500000000          lea rax, [lbl2]
00000010  90                      nop
00000011                        lbl2:
00000011  90                      nop
                                end

The listing still needs some fixing.
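The displacements in the dumpbin output above can be checked by hand: the target is the offset of the next instruction plus the signed disp32. A small C sketch of that check (the function name is made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Recover the section offset a RIP-relative operand refers to.
   disp points at the 4 little-endian displacement bytes. */
uint32_t rip_target(uint32_t insn_offset, uint32_t insn_len,
                    const uint8_t disp[4])
{
    assert(disp != NULL);
    uint32_t raw = (uint32_t)disp[0]
                 | (uint32_t)disp[1] << 8
                 | (uint32_t)disp[2] << 16
                 | (uint32_t)disp[3] << 24;
    /* Unsigned wraparound yields the same low 32 bits as signed addition. */
    return insn_offset + insn_len + raw;
}
```

For the first lea (offset 2, 7 bytes, F8 FF FF FF) this gives 1, the offset of lbl1; for the second (offset 9, 7 bytes, 01 00 00 00) it gives 11h, the offset of lbl2. Both agree with the disassembly.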

vid512 commented 2 years ago

Another weird problem. The current 2.56 version reports "symbol redefinition" errors, when there is no symbol redefinition. This seems to be somehow triggered by using PROC.

win.asm:

option casemap:none    ;needed for windows.inc
include windows.inc

.code

xx PROC
RET
xx ENDP

end

Trying to build it with 2.56:

c:\dev\_tools\uasm-2.56\UASM\bin>uasm64 -win64 /Fl /Ic:\dev\_tools\uasm\WinInc\Include win.asm
UASM v2.56, Oct 11 2022, Masm-compatible assembler.
Portions Copyright (c) 1992-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.

THREAD_PRIORITY_BELOW_NORMAL    EQU     ( THREAD_PRIORITY_LOWEST + 1 )
c:\dev\_tools\uasm\WinInc\Include\winbase.inc(508) : Error A2143: Symbol redefinition: THREAD_PRIORITY_BELOW_NORMAL
 c:\dev\_tools\uasm\WinInc\Include\winbase.inc(508): Included by
  c:\dev\_tools\uasm\WinInc\Include\windows.inc(112): Included by
   win.asm(2): Main line code
THREAD_PRIORITY_ABOVE_NORMAL    EQU     ( THREAD_PRIORITY_HIGHEST - 1 )
c:\dev\_tools\uasm\WinInc\Include\winbase.inc(511) : Error A2143: Symbol redefinition: THREAD_PRIORITY_ABOVE_NORMAL
 c:\dev\_tools\uasm\WinInc\Include\winbase.inc(511): Included by
  c:\dev\_tools\uasm\WinInc\Include\windows.inc(112): Included by
   win.asm(2): Main line code
win.asm: 10 lines, 2 passes, 670 ms, 0 warnings, 2 errors
36416 items in symbol table, expected 36416
max items in a line=17, lines with 0/1/<=5/<=10 items=94/429/5304/2316,
2174 items in resw table, max items/line=6 [0=619 1=672 397 156 44 8 4 0]
invokation CATSTR=0 SUBSTR=0 SIZESTR=0 INSTR=0 EQU(text)=13994
memory used: 591016 kB

If you comment out the RET, the error disappears. The same example builds fine with UASM 2.52.

I get many more of those false "symbol redefinition" errors with our full codebase. This is the simplest case I've been able to isolate the problem to, without diving into WinInc internals.

john-terraspace commented 2 years ago

The listing has no knowledge of the RIP fix. Fixups are still generated; it is only at the final COFF output stage that they are excluded and the RIP fix is applied.

It might be possible to change this behaviour, but it would be a very big ask.

john-terraspace commented 2 years ago

Thanks. I had noticed this one as well, I didn’t have a very simple reproducible case for it though.

john-terraspace commented 2 years ago

I believe the "symbol redefinition" issue is now resolved; it was a side effect of a change made to improve the listing output for another issue. Please try again and let me know.

vid512 commented 2 years ago

No more "duplicate symbol" errors with our codebase. This problem seems resolved.

Tomorrow I'll try to switch to the RIP-relative addressing project-wide, and we'll see if any new errors pop out. Fingers crossed.

vid512 commented 2 years ago

Could this be some recent regression?

win.asm:

end

Causes:

c:\dev\_tools\uasm-2.56\UASM\bin>uasm64 -win64 win.asm -Fl win.lst
UASM v2.56, Oct 11 2022, Masm-compatible assembler.
Portions Copyright (c) 1992-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.

win.asm: 1 lines, 2 passes, 1 ms, 0 warnings, 0 errors
124 items in symbol table, expected 124
max items in a line=1, lines with 0/1/<=5/<=10 items=8068/124/0/0,
2174 items in resw table, max items/line=6 [0=619 1=672 397 156 44 8 4 0]
invokation CATSTR=0 SUBSTR=0 SIZESTR=0 INSTR=0 EQU(text)=0
memory used: 401 kB
STDFUNC MACRO method:REQ, retType:REQ, protoDef:VARARG
win.lst(0) : Error A2099: END directive required at end of file                                        <----------------
win.lst: 0 lines, 1 passes, 3 ms, 0 warnings, 1 errors
124 items in symbol table, expected 124
max items in a line=1, lines with 0/1/<=5/<=10 items=8068/124/0/0,
2174 items in resw table, max items/line=6 [0=619 1=672 397 156 44 8 4 0]
invokation CATSTR=0 SUBSTR=0 SIZESTR=0 INSTR=0 EQU(text)=0
memory used: 401 kB

Same thing happens with listing for basically any file I've tried. Latest commit was "update sysv abi invoke".

vid512 commented 2 years ago

Also, it would make sense to disallow addressing like [rip + label + 2*rax]. At the moment, this is treated the same as [label + 2*rax], i.e. standard base+scale*index+displacement addressing with a COFF relocation on the displacement, and nothing RIP-relative about it.

mazegen commented 2 years ago

I think it's clearly a bug because lea rax, [rip+rax] assembles to lea rax, [rax+00000000] with SIB byte (48 8D 04 05 00000000)
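The difference between the two encodings is visible in the ModRM byte. The following C sketch classifies a ModRM byte the way a 64-bit decoder would; the enum and function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum ea_kind { EA_REGISTER, EA_RIP_DISP32, EA_SIB, EA_OTHER_MEM };

/* Classify a ModRM byte as decoded in 64-bit mode. */
enum ea_kind classify_modrm(uint8_t modrm)
{
    unsigned mod = modrm >> 6;
    unsigned rm  = modrm & 7u;
    if (mod == 3)            return EA_REGISTER;    /* register operand  */
    if (rm == 4)             return EA_SIB;         /* a SIB byte follows */
    if (mod == 0 && rm == 5) return EA_RIP_DISP32;  /* [rip + disp32]     */
    return EA_OTHER_MEM;
}

/* With mod == 00b, a SIB base field of 101b means "no base register,
   disp32 follows", i.e. [index*scale + disp32]. */
int sib_has_no_base(uint8_t modrm, uint8_t sib)
{
    assert(classify_modrm(modrm) == EA_SIB);
    return (modrm >> 6) == 0 && (sib & 7u) == 5;
}
```

ModRM 05h (as in 48 8D 05 ...) classifies as EA_RIP_DISP32, while ModRM 04h followed by SIB 05h (as in 48 8D 04 05 ...) is an index-plus-disp32 form with rax as the index and no RIP involved.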

john-terraspace commented 2 years ago

Are you sure this is the latest from the 2.56 branch? I can't recreate the listing problem, and I've tried a number of sources now. Oddly, however, I can't use the assembler with the command-line options as you specify them; that doesn't work at all. It needs to be: uasm -win64 -Fl=out.lst win.asm. Let me know how that goes. I'll investigate preventing that EA mode. I believe the only valid option is [RIP + disp], which would include a label; adding an index or scale shouldn't be allowed?

vid512 commented 2 years ago

Now I understand what was happening. I must have had a win.lst left over from a previous (correct) command, and then I wrongly assumed /Fl takes its operand in a getopt-y way (with a space instead of '='). So, with my command line, UASM first assembled win.asm, then somehow ignored the -Fl without a value, and tried to assemble win.lst, which failed because there is no 'end' directive in it. Sorry about another false alarm. I've been out of touch with these tools for some time, and now I make stupid mistakes like this.

john-terraspace commented 2 years ago

Don't worry, me too. I have so little time these days that when I find a gap during the year I do a sudden burst of asm-related work. Things are further compounded because I'm also doing stuff in 68k (Amiga) assembler, and then I start trying to put operands the wrong way around :)

john-terraspace commented 2 years ago

I've updated the branch again; it should now prevent [ RIP+REG ] from being encoded in any way. There are a few combinations which are technically valid, although not very useful, where you can do RIP+DISP, like lea rax,[rip+lbl], but that requires NOLARGEADDRESSAWARE and an ADDR32 reloc.

mazegen commented 1 year ago

Hi johnsa, thanks for this feature, it works well :) In many cases it actually makes the .obj much smaller, because there are far fewer relocations now.