Whither x86 JIT assembly files?

0xdaryl commented 6 years ago

The OpenJ9 x86 JIT uses a number of hand-written assembly files as part of its runtime. The choice of assembly allows very precise control of the instructions, registers, and stack in some contexts, as well as customized linkages between sections of code. There are about 7000 lines of x86 assembly in the JIT, written using MASM (Microsoft Macro Assembler) syntax. This creates a challenge when building on Linux using the GNU assembler (gas) because it consumes a different syntax altogether.

Rather than maintain two independent but logically similar sets of source files, a tool (masm2gas.pl) was written to convert the MASM syntax files, just-in-time during the build, into AT&T syntax that gas can consume. The decision to write a tool rather than use a common assembler on both platforms is historical. Many of these files were created during the first x86-64 port 15 years ago, and at that time no assembler that worked across operating systems was mature enough to support all the features the project needed (x86-64 support, macros, preprocessor directives, etc.). While the tooling solution reduced the dual maintenance of the code, it is a fragile and incomplete solution that needs updates whenever new assembly is written using MASM syntax variants that weren't handled previously. Furthermore, it is incapable of handling conditional assembly, and to get around that some of the assembly files are actually run through the C preprocessor first (the files have a .pasm extension).
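
To make the fragility concrete, here is a minimal sketch of the kind of rewriting such a translator has to do. It is not taken from masm2gas.pl; the function names and the tiny set of supported operands are assumptions chosen for illustration. It handles only `op dst, src` with bare registers or immediates, and rejects everything else.

```python
import re

# Hypothetical, deliberately tiny translator. It is NOT the logic of
# masm2gas.pl; it only illustrates the operand rewriting involved.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "esp", "ebp",
             "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rsp", "rbp"}


def translate_operand(operand):
    """Convert one Intel/MASM-style operand to AT&T syntax."""
    operand = operand.strip()
    if operand.lower() in REGISTERS:
        return "%" + operand.lower()
    # MASM hex immediates look like 10h; gas expects $0x10.
    hex_match = re.fullmatch(r"([0-9][0-9A-Fa-f]*)h", operand)
    if hex_match:
        return "$" + hex(int(hex_match.group(1), 16))
    if re.fullmatch(r"[0-9]+", operand):
        return "$" + operand
    # Memory operands, symbols, expressions, etc. are not handled here.
    raise ValueError(f"unsupported operand: {operand!r}")


def translate_line(line):
    """Translate 'op dst, src' into AT&T order 'op src, dst'."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = rest.split(",")
    if len(operands) != 2:
        raise ValueError(f"unsupported instruction form: {line!r}")
    dst, src = operands
    return f"{mnemonic.lower()} {translate_operand(src)}, {translate_operand(dst)}"
```

With this sketch, `translate_line("mov eax, ebx")` yields `mov %ebx, %eax`, while a memory operand such as `[rbx+8]` raises `ValueError`. Every syntax variant that is not explicitly handled needs new translator code, which is the maintenance burden described above.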

With the work underway to port OpenJ9 to macOS, these assembly files will need to be built there as well. Unfortunately, the LLVM assembler (llvm-mc), while similar to gas in terms of syntax and command-line options, has some syntax differences that need to be dealt with. Rather than simply modifying (hacking) the translation tool to handle LLVM assembly, I think this is a good opportunity to take stock of where we are and evaluate what the long-term strategy should be for these files.

Also, I would like whatever is decided here to be given strong consideration as the x86 assembly solution in Eclipse OMR. At present, there are no x86 assembly files there, but if any were introduced an assembly solution would be required.

Here are some options to consider:

1) "masm2gas.pl Forever, Dude!"

This essentially extends the role of masm2gas.pl to handle the syntax variants of llvm-mc. We have found documentation on this assembler lacking and had to scrutinize the source code to find the information we needed to make the translation (for macOS we did a manual translation of the .s files rather than modifying the tool). Extending this tool adds to its fragility and does nothing to improve or address its warts. If this is the solution that is ultimately agreed upon I think the tool is due for some refactoring (dare I say "rewrite"?).

2) Use the M4 macro processor (https://www.gnu.org/software/m4/manual/m4.html) to generate appropriate syntax for each assembler.

This is the approach taken by the OpenJ9 VM for its limited set of assembly files. There will be work involved to translate the existing files into a macro format, and this may impact the readability and debuggability of the code.

3) Use the Netwide Assembler (NASM) (https://www.nasm.us/) as the common assembly solution across all operating systems.

We considered this many years ago, but NASM wasn't mature enough yet to meet our needs. However, it's grown up quite a bit. The advantage with this is that the same assembler will run on all platforms, consume the same untranslated input files, and produce objects suitable for the target build environment. NASM syntax is very similar to MASM syntax so translation of the files should be straightforward, and the code will maintain the familiarity that developers expect. NASM is a stable, modern assembler supporting current Intel instructions and processors on all operating systems we care about.

4) Upon startup, use the JIT to generate the instructions directly into a code cache.

There is a precedent for other architectures (such as Power) to emit helpers directly into a code cache (I believe Power does it to guarantee helpers can be reached from any code cache). A similar technique could be employed here and doing so will eliminate the need for an assembler for runtime functions. However, the amount of code to be emitted this way is quite large, the translation could be error prone, debugging would be a hassle, and it will lead to an increase in the size of the JIT shared object as these runtime methods are programmatically constructed.

5) Convert the runtime assembly files to C with inline assembly as needed.

Avoid a runtime assembler altogether and convert the runtime helpers into portable, callable C functions. While it's possible to do and it solves the building issues, the main disadvantages are the large amount of code that will have to be translated from assembly to C (perhaps error prone) and the fact that the functions will now use C linkage which can impose performance challenges (and possibly footprint increases) to the JIT compiled code. I think parts of the assembly code could be converted to C when it's not called on a performance or footprint sensitive path, but I don't think this solution will eliminate the need for assembly files altogether.
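
To picture what "programmatically constructed" helpers look like in the code-cache option, here is a minimal sketch that builds the machine code for a two-instruction x86 routine in a byte buffer. The function name is hypothetical and this is not the OpenJ9 code generator API; it only shows the shape of the approach, and it never executes the bytes.

```python
import struct


def emit_return_constant(value):
    """Build x86 machine code for:  mov eax, imm32 ; ret

    Hypothetical helper for illustration only; a real JIT would emit
    into its code cache through its own instruction encoders.
    """
    code = bytearray()
    # Opcode B8 is "mov eax, imm32", followed by a little-endian immediate.
    code += b"\xB8" + struct.pack("<I", value & 0xFFFFFFFF)
    # Opcode C3 is a near "ret".
    code += b"\xC3"
    return bytes(code)
```

`emit_return_constant(42)` produces the six bytes `b8 2a 00 00 00 c3`. Even at this size there is no assembly listing to read or step through, only emitter calls, which hints at why debugging thousands of lines built this way would be a hassle.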

I have some opinions on which option I personally prefer, but I will remain silent for a bit while allowing others to chime in.

Thoughts anyone?

FYI: @andrewcraik, @0dvictor, @vijaysun-omr, @mstoodle, @nbhuiyan, @cathyzhyi, @leonardo2718, @rwy0717, @charliegracie, @DanHeidinga, @pshipton

0dvictor commented 6 years ago

I am so glad that you have brought this topic up. I have been thinking about this for quite a while and started an effort to reduce the amount of handwritten assembly. Going through each handwritten assembly method one by one, here is what I found: not all of them should be in handwritten assembly in the first place.

X86-32 Only Assembly:
1) 64-bit mathematics (long division and long remainder): neither of the two should be handwritten assembly.
   i. Both can be inlined code generated by evaluators; highly efficient inlined code is not too large.
   ii. Both can be implemented in C (as a fastcall C Helper). Both helpers use all 6 GPRs that are available on X86-32, and therefore handwritten assembly gives no linkage benefit compared to a C Helper call.
   iii. We can also call libc instead of maintaining our own implementation. There is no confusion for integer mathematics.
   iv. I have a WIP to make it inlined code.
2) float/double to int/long conversions: they should be removed if X87 is gone.
   i. f2i/d2i and the X87 version of f2l/d2l are only used when SSE is disabled, i.e. forced X87 mode.
   ii. The SSE version of f2l/d2l uses 5 of the 6 available GPRs, so the advantage compared to a C Helper is very limited.
   iii. I have a WIP to make the SSE version inlined code.
3) Compressed string related helpers: none of them should be in handwritten assembly.
   i. They all use up all 6 GPRs that are available on X86-32, and hence have no advantage compared to a C Helper.
4) Methods related to recompilation: they should be written in assembly.

X86-64 Only Assembly:
1) Compressed string related helpers: they may be written in C instead.
   i. The code is nearly identical to the X86-32 counterpart.
   ii. IMO they should be unified with the X86-32 implementation as a C Helper.
2) Methods related to recompilation: they should be written in assembly.

Common Helpers for both X86-32 and X86-64:
1) Array Translate:
   i. Some benchmarks showed performance gains from making this inlined code.
   ii. I have a WIP item to make it inlined.
2) UTF16 encoding:
   i. It can probably stay in assembly as the code is simple and does not use many registers. It still has potential benefits over a C Helper.
3) Lock Reservation:
   i. They cannot be translated into C, and hence should stay in assembly.
4) PIC Builder:
   i. They cannot be translated into C, and hence should stay in assembly.

In short, the only necessary handwritten assembly code is Recompilation, Lock Reservation and PIC Builder. Once my WIPs finish, more than half of the handwritten assembly will be gone.

0dvictor commented 6 years ago

My thoughts about your five options:

1) masm2gas.pl
   i. I am strongly against this idea.
   ii. It is hard to maintain the tool as we move forward. Supporting LLVM is already hard enough.
   iii. It does NOT support all MASM syntax. Much legitimate MASM code cannot be translated.
   iv. It does not correctly translate local labels, so functions are broken apart in profiling and/or debugging tools. This makes performance analysis harder.
   v. Should any error be raised during assembling, the error message is hard to correlate to the original source code.
2) M4 macro processor
   i. In fact, not only the VM but also the JIT's Z CodeGen uses M4.
   ii. It may be a good idea to align with the rest of OpenJ9.
   iii. Unfortunately, it shares the same drawbacks as masm2gas: it does not support local labels, and it is hard to correlate error messages to the original source code.
3) The Netwide Assembler (NASM)
   i. I would love it if we had chosen it on day one.
   ii. It solves all the drawbacks that we face with either masm2gas or M4.
   iii. The only reason that I hesitate is it diverges X86 CodeGen from other OpenJ9 component.
4) JIT generated instructions upon startup
   i. It is a good solution for many helpers, but may not be feasible for all helpers. Recompilation and PIC Builder related helpers may be hard to JIT.
   ii. We do need this ability to generate certain helpers to avoid the SSE/AVX switching penalty while maintaining a relatively small footprint.
   iii. I actually have an old prototype from when I was working on AVX bring-up.
5) C with inline assembly (or intrinsics)
   i. It is a perfect solution for some of the helpers, but may not be feasible for all of them.
   ii. MSVC disallows inline assembly on X86-64, so this approach is likely C with intrinsics.
   iii. Recompilation and PIC Builder related helpers may be hard to convert.
   iv. The current implementation of the Lock Reservation helpers cannot be converted.

To sum up, 4) and 5) may not work for certain helpers but are perfect solutions for many. IMO, we should convert whatever methods fit 4) or 5) and then use either M4 or NASM for the leftovers. I will need to think more about whether M4 or NASM is better.

andrewcraik commented 6 years ago

Thanks @0xdaryl for raising this issue and @0dvictor for the very considered comments. I think you have some very interesting ideas. I also think we need to make these kinds of changes gradually or in stages to reduce the risk of introducing bugs and disrupting development, since x86 is a very popular platform.

I am against continuing masm2gas.pl - it is a unique solution, it is hard to maintain, we have to maintain it, and there are other solutions with lower maintenance costs and risks. I am not in favor of adopting M4 for the x86 code generator simply because I think there are better assembler tools that are used more widely. I am very supportive of NASM - it is very similar in syntax to what we currently use, it is widely used, and well understood. I think in the short term to move away from masm2gas.pl a move to NASM is the easiest to automate, verify and achieve.

Options 4 and 5 are interesting, but will need more time to prototype, experiment with, and refine before I would feel comfortable saying we want to adopt either of them as part of 'the way forward'. I don't think that either of these needs to be adopted as part of trying to get away from masm2gas.pl in the short term.

mstoodle commented 6 years ago

@0dvictor can you please elaborate on this point about NASM, which I didn't understand as written:

it diverges X86 CodeGen from other OpenJ9 component

0dvictor commented 6 years ago

can you please elaborate on this point about NASM, which I didn't understand as written:

it diverges X86 CodeGen from other OpenJ9 component

@mstoodle I meant existing OpenJ9 components use M4 but none of them uses NASM. For example, VM and JIT's Z CodeGen.

mstoodle commented 6 years ago

ok, thanks for clarifying @0dvictor !

0dvictor commented 6 years ago

Found two good documents about NASM that I want to share:
https://www.ibm.com/developerworks/library/l-gas-nasm/index.html
https://www.nasm.us/doc/nasmdoc2.html#section-2.2

0xdaryl commented 6 years ago

I'd like to keep this discussion open for another week to give those who are away an opportunity to chime in when they return.

In the meantime, however, to give us a better picture of what a file written in NASM syntax looks like @nbhuiyan has kindly agreed to convert one of the existing assembler files and post a link here. He will also build and link it into the OpenJ9 product just to verify that it works. This experience will also give us an idea of how difficult it will be to convert the existing files should we decide to go this route.

DanHeidinga commented 6 years ago

Use the Netwide Assembler (NASM) (https://www.nasm.us/) as the common assembly solution across all operating systems.

@0xdaryl all operating systems = win, linux, mac? Is this solution extendable to the other hw platforms: s390, ppc (including aix), ppcle, arm & arm64?

@gacholio As the primary author of many of the .m4 files in the VM, do you see advantages / disadvantages to switching the VM asm files away from m4?

0xdaryl commented 6 years ago

@DanHeidinga : The NASM solution applies to x86 architectures only on Windows, Linux, and macOS.

gacholio commented 6 years ago

I see no advantage to moving away from m4. It works on all platforms, and it's fairly readable.

nbhuiyan commented 6 years ago

To give you an idea of what a JIT X86 assembly file would look like when written in NASM syntax, I have converted X86LockReservation.asm from MASM:

Despite the fact that I have never written in NASM format before, I feel that converting the asm file from MASM to NASM syntax was pretty straightforward, with the exception of how macro parameters work. I also found the NASM documentation to be pretty useful.

On Linux on X86_64, I have been able to link the object file generated by nasm into OpenJ9 JVM, and found no issues so far. This is still a work in progress, and currently I am mainly trying to solve the issue with symbols not being defined correctly.

If you are interested, here are the object files generated by gnu-as and nasm:

Note that gnu-as used the output of masm2gas.pl. As you can see, the .o generated by nasm is missing symbols in the symbol table that are defined through the command line and preprocessor directives.

0dvictor commented 6 years ago

I guess we can also use local labels to get rid of these:

    16: 0000000000000038     0 NOTYPE  LOCAL  DEFAULT    1 ..@20.trylock
    17: 0000000000000049     0 NOTYPE  LOCAL  DEFAULT    1 ..@20.fallback
    18: 0000000000000088     0 NOTYPE  LOCAL  DEFAULT    1 ..@27.trylock
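
Names of the form `..@<number>.<label>` are what NASM generates for `%%label` macro-local labels. As a rough aid for checking whether a change removes them, here is a minimal sketch that picks such names out of a symbol table listing. It assumes the eight-column layout shown above (in the style of `readelf -s` output); the function name is hypothetical.

```python
import re

# Matches NASM-generated macro-local names such as "..@20.trylock".
MACRO_LOCAL = re.compile(r"^\.\.@\d+\.")


def macro_local_symbols(symtab_text):
    """Return macro-local symbol names found in a symbol table listing.

    Assumes each symbol line has eight whitespace-separated columns:
    Num: Value Size Type Bind Vis Ndx Name
    """
    names = []
    for line in symtab_text.splitlines():
        fields = line.split()
        if len(fields) == 8 and fields[0].rstrip(":").isdigit():
            name = fields[7]
            if MACRO_LOCAL.match(name):
                names.append(name)
    return names
```

Run against the three lines above, this returns `..@20.trylock`, `..@20.fallback`, and `..@27.trylock`; an empty result after a change would suggest the labels no longer reach the symbol table.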

0xdaryl commented 6 years ago

Thanks @nbhuiyan. I wonder, should we choose to use NASM, if we can validate the auto-conversion process to ensure it was done correctly by dumping the assembly in a NASM-converted object file and a MASM object file and comparing the two. I believe the assembly should more or less match and give us confidence that we are bug-for-bug compatible. :-)
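
One rough way to do that comparison is sketched below. It assumes the dumps come from something like `objdump -d`, where each instruction line is `address:<TAB>bytes<TAB>instruction`; that tool choice, the function names, and the `masm2gas`/`nasm` labels are assumptions for illustration. Addresses and encodings are dropped and only the instruction text is compared.

```python
import difflib


def instructions(listing):
    """Extract normalized instruction text from a disassembly listing.

    Assumes tab-separated lines of the form: address: <TAB> bytes <TAB> insn.
    Lines without all three fields (headers, byte continuations) are skipped.
    """
    result = []
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) >= 3 and fields[0].strip().endswith(":"):
            # Collapse runs of whitespace so column padding does not matter.
            result.append(" ".join(fields[2].split()))
    return result


def compare_listings(old_listing, new_listing):
    """Return unified-diff lines between two instruction streams."""
    return list(difflib.unified_diff(
        instructions(old_listing),
        instructions(new_listing),
        fromfile="masm2gas",
        tofile="nasm",
        lineterm="",
    ))
```

An empty result means the two instruction streams match line for line. Operands that embed addresses or symbol names (branch targets, for example) may still differ between the two objects, so some diff output is expected and would need to be checked by hand.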

0xdaryl commented 6 years ago

As discussion on this topic has petered out and in the interests of moving it toward a resolution, I am recommending that we proceed with a solution to compile the JIT assembly files with the Netwide Assembler (NASM). Justification follows:

There is a consistent solution available on all x86 operating systems (Windows, Linux, and macOS). The same source file can be consumed on each platform without any intermediate translation steps. This is actually an improvement over our current approach for Linux as there often isn't a 1-1 mapping between the disassembled source code and the original source file. An identical assembler will also ensure the same level of x86 processor feature support simultaneously across all the platforms (there may be minor exceptions to this, but for the most part I believe that's true).
NASM is current (latest stable update is from Feb 2018) and open-source, and supports modern Intel processor features.
NASM syntax is similar to MASM syntax, which maintains familiarity and eases the transition to a new assembler.
The similarity to MASM syntax also permits a near-automatic translation of the current source to NASM syntax through modifications to masm2gas.pl. @nbhuiyan has already demonstrated it is possible to translate and link with the resulting object file.
It is the quickest solution to allowing the OpenJ9 JIT to build on macOS.

Other parts of the OpenJ9 VM use GNU M4 to allow assembly files to be processed and built on multiple operating systems. While inter-project consistency is always a goal to aspire to, I believe the number of lines of compiler assembly code to translate (at least 7000), the ease and likelihood of errors in the translation process from MASM, and the "look" of the resulting translated files for developers also bear consideration. At this point I don't think consolidating on one solution throughout OpenJ9 is necessary.

To move forward I think the plan of attack should roughly follow:

Decide on a modern version of NASM that will be required on Linux (32/64), Windows (32/64), and macOS build environments
Determine what NASM dependencies need to be installed on all CI systems (e.g., Travis), CI build farms (e.g., ci.eclipse.org), and at AdoptOpenJDK and update the builds to include those dependencies
IBM will need to update its internal build systems in a similar fashion
The NASM solutions should co-exist with the existing assembly solutions to assist with any triage debugging that might be required and to stage the delivery of this transformation.
The tool based on masm2gas.pl that @nbhuiyan adapted to perform the automatic file conversions will be used to convert each individual assembler file. Because the number of files isn't that large and to ease triaging of any potential failures in the future, I suggest that a separate commit be used for each transformed file.
To ensure bug-for-bug compatibility with the existing assembly files, investigate whether it is possible to validate the auto-conversion process in a rudimentary way by dumping the assembly instructions in a NASM-converted object file and the originally translated object file and comparing the two. The assembly should more or less match. Investigate any differences.

@nbhuiyan has agreed to make progress on this.

FYI: @andrewcraik @0dvictor @nbhuiyan @DanHeidinga @charliegracie @mstoodle @vijaysun-omr @pshipton @irinarada

andrewcraik commented 6 years ago

Thank you @nbhuiyan for working on this - it sounds like a great plan and I look forward to seeing the improvements it should bring to the x86 code generator.

charliegracie commented 6 years ago

I have given the last comment from Daryl both a thumbs up and a thumbs down. I love the idea of getting rid of the current files but I would much rather switch to M4 to be consistent with the rest of the codebase.

DanHeidinga commented 6 years ago

I have to second @charliegracie's concerns with not pursuing a consistent project wide approach.

Other parts of the OpenJ9 VM use GNU M4 to allow assembly files to be processed and built on multiple operating systems.

Has any investigation been done in porting the existing JIT asm files to M4? Being able to compare the two approaches would make a stronger case for picking one over the other.

Earlier in this thread I asked how the NASM approach applied to other architectures.

Is this solution extendable to the other hw platforms: s390, ppc (including aix), ppcle, arm & arm64?

If NASM is a preferred solution for the JIT files, can you also take a look at how it applies to the other supported architectures?

nbhuiyan commented 6 years ago

@DanHeidinga

Has any investigation been done in porting the existing JIT asm files to M4?

I personally have not spent much time investigating the possibility of porting the existing JIT x86 asm files into M4. Besides the significant differences in the way M4 works vs. MASM, there are certain limitations in M4 that @0dvictor already mentioned (i.e., lack of local macro/variable/label support and difficulty with debugging in M4) that may make it more difficult to perform and verify the outcome of the conversion when compared with NASM. I am interested in knowing why M4 was not used initially, despite having been around for a long time, and why a masm+masm2gas solution was chosen for x86 in the first place.

If NASM is a preferred solution for the JIT files, can you also take a look at how it applies to the other supported architectures?

Unfortunately, NASM is only applicable to the x86 architecture.

charliegracie commented 6 years ago

I am quite sad to see the x86 files being converted to NASM instead of moving to M4, especially since the conversation did not seem to be finished in this issue. This means that the JIT will not be able to use the same tool to write handwritten assembly on all of the supported platforms. M4 is available on ALL platforms used by OpenJ9.

While I do understand that it is easier to move from the current solution to NASM instead of M4 on x86, it does complicate future work and add more dependencies to the OpenJ9 project. M4 is already used by the VM as none of the other tools were available on all of the supported platforms. With different tools per platform there is less knowledge transfer and it significantly increases the complexity of making a cross-platform change. If M4 were used, the JIT could possibly take advantage of VM macros and provide its own macros such that making cross-platform changes could be significantly easier.
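
The macro idea can be pictured with a small stand-in. The sketch below is plain Python rather than M4, and the macro names and templates are hypothetical (they are not the VM's macros); it only shows one macro-level source expanding to two assembler syntaxes.

```python
# Hypothetical macro table, not taken from the OpenJ9 VM's .m4 files.
# {0} is the destination register and {1} is the source register.
MACROS = {
    "gas":  {"MOV_REG": "mov %{1}, %{0}", "RET": "ret"},
    "masm": {"MOV_REG": "mov {0}, {1}",   "RET": "ret"},
}


def expand(lines, assembler):
    """Expand NAME(arg, ...) macro calls for the chosen assembler.

    Lines that are not known macro calls pass through unchanged.
    """
    table = MACROS[assembler]
    output = []
    for line in lines:
        name, _, args = line.strip().partition("(")
        if name in table:
            values = [a.strip() for a in args.rstrip(")").split(",") if a.strip()]
            output.append(table[name].format(*values))
        else:
            output.append(line)
    return output
```

Expanding `["helper:", "MOV_REG(rax, rbx)", "RET()"]` gives `mov %rbx, %rax` for `gas` and `mov rax, rbx` for `masm`. Shared macros are what make a single cross-platform change possible; the cost is that the source no longer reads like either assembler's own listing.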

0xdaryl commented 6 years ago

As answered earlier, this is an x86-only solution.

If the only consideration was consistency with the rest of the project then this decision would be straightforward. However, in this case I think the experience of developers who have to create, maintain, and debug the thousands of lines of existing x86 JIT assembly code and exploit new hardware instructions bears at least equal (and likely more) consideration in my opinion. Their opinions expressed above suggest that an assembly style like NASM offers is what they prefer. My own recent experience poring through hundreds of lines of PicBuilder assembly code to debug problems with the macOS port has reinforced my opinion that the thing I'm debugging live has to closely match the assembly listing.

Translating assembly via an assembler is, frankly, choosing the right tool for the job. The fact that there is a modern, up-to-date assembler that can consume the same assembly syntax on the three platforms we care about, has a readable syntax for developers, and whose assembly can be easily and safely translated from our existing files is a bonus. I also expect that since the assembler is independent of any particular build compiler we have more freedom in advancing the assembler toolchain as new versions become available that support new processor features.

Speaking from experience as someone who has written assembly needing to exploit new processor features that aren't available in the current toolchain, it's a serious chore to have to write macros to support all the different encodings of new instructions. You're doing the job that your assembler should be doing. Before focusing on NASM, @nbhuiyan was trying to adapt the existing masm2gas.pl script to produce assembly syntax that could be consumed by the LLVM assembler for the macOS port. There were some assembly constructs for which he couldn't find any documentation and finally had to search the source code of the assembler to find the answers. These two are examples of a model that I don't want to carry forward if we can help it.

I'm also concerned about the learning curve and debuggability with an M4 solution based on the recent experience of a couple of JIT developers who have attempted modifications in files needed by the JIT. The free-form macro syntax definitely took some getting used to and it wasn't always intuitive what was going on. While it is possible to become accustomed to this style, it seems unnatural especially coming from our current starting place. I would hate to think that anyone new looking to modify or debug JIT runtime code would have to start at first principles to learn the M4 environment rather than diving right in and being productive right away. I'm glad some of those that will ultimately be maintaining this code had this first-hand experience.

Frankly, my main concerns with NASM are our lack of familiarity with it (though the documentation available for NASM is complete and comprehensible) and the lack of significant examples of using this assembler in a production environment. Neither of these concerns is strong enough to displace my opinion of NASM.

eclipse-openj9 / openj9

Whither x86 JIT assembly files? #2418