asmjit / asmdb

Instructions database and utilities for X86/X64 and ARM (THUMB/A32/A64) architectures.
The Unlicense

Enough metadata for codegen? #2

Closed by xoofx 7 years ago

xoofx commented 7 years ago

Hi there! I was looking at the database in x86data.js and wondering whether the file has enough information to generate a proper x86/x64 code generator (assuming that the /0, ib, /r, etc. have to be hand-coded). Since it looks like you are using it for asmjit (in the generate-XXX.js tools), I believe it should be OK, but I just want to be sure. Thanks!
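To make the "/0, ib, /r" part of the question concrete, here is a minimal sketch of turning an Intel-manual-style opcode string into structured fields that a code generator could consume. The function name `parseOpcode`, the result shape, and the assumption that the input looks like `81 /0 id` are illustrative only; they are not taken from asmdb's actual schema.

```javascript
// Sketch: split an Intel-manual-style opcode string into structured parts.
// The input format and all field names are assumptions for illustration,
// not asmdb's actual schema.
function parseOpcode(text) {
  const result = { bytes: [], modrm: null, imm: null };
  for (const token of text.trim().split(/\s+/)) {
    if (/^[0-9A-F]{2}$/.test(token)) {
      // Plain opcode byte, e.g. "81". Only uppercase hex is accepted so that
      // lowercase notation tokens can never be mistaken for bytes.
      result.bytes.push(parseInt(token, 16));
    } else if (/^\/[0-7]$/.test(token)) {
      // "/digit": ModRM.reg holds an opcode extension.
      result.modrm = { kind: "ext", reg: Number(token[1]) };
    } else if (token === "/r") {
      // "/r": ModRM.reg holds a register operand.
      result.modrm = { kind: "reg" };
    } else if (/^i[bwd]$/.test(token)) {
      // Immediate operand of 1, 2, or 4 bytes.
      result.imm = { ib: 1, iw: 2, id: 4 }[token];
    } else {
      throw new Error(`Unrecognized opcode token: ${token}`);
    }
  }
  return result;
}

// ADD r/m32, imm32 is documented as "81 /0 id".
console.log(parseOpcode("81 /0 id"));
// ADD r/m32, r32 is documented as "01 /r".
console.log(parseOpcode("01 /r"));
```

The sketch only covers the tokens named in the question; prefixes, VEX/EVEX forms, and code-offset tokens would need additional cases.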

kobalicek commented 7 years ago

Yes, it does.

However, if you look at the generate-x86.js tool, you will see that aggregating the x86 data into a form usable by an encoder/decoder is pretty complicated (the generate tool reads asmjit's DB and patches it on the fly). How complicated it gets most likely depends on how fast you want the encoder/decoder to be and how you want to deal with ambiguities (such as which encoding to prefer: modrm or modmr, etc.).
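The modrm/modmr ambiguity can be illustrated with a register-to-register `mov`, which has two equally valid encodings: `89 /r` (MOV r/m32, r32) and `8B /r` (MOV r32, r/m32). The sketch below is not how AsmJit or generate-x86.js resolves this; the helper names `modrmRegDirect` and `encodeMovRegReg` are hypothetical and only show that an encoder has to pick one form.

```javascript
// Register numbers as used in the ModRM byte (32-bit general-purpose registers).
const REG = { eax: 0, ecx: 1, edx: 2, ebx: 3, esp: 4, ebp: 5, esi: 6, edi: 7 };

// Build a ModRM byte for a register-direct operand (mod = 0b11).
function modrmRegDirect(reg, rm) {
  return 0xC0 | (reg << 3) | rm;
}

// Return both valid encodings of `mov dst, src` for two 32-bit registers.
function encodeMovRegReg(dst, src) {
  return {
    // 89 /r = MOV r/m32, r32: destination in ModRM.rm, source in ModRM.reg.
    mr: [0x89, modrmRegDirect(REG[src], REG[dst])],
    // 8B /r = MOV r32, r/m32: destination in ModRM.reg, source in ModRM.rm.
    rm: [0x8B, modrmRegDirect(REG[dst], REG[src])],
  };
}

// Format a byte array as uppercase hex pairs.
const toHex = (bytes) =>
  bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

const enc = encodeMovRegReg("eax", "ebx");
console.log(toHex(enc.mr)); // 89 D8
console.log(toHex(enc.rm)); // 8B C3
```

Both byte sequences execute identically, so a generator built on the database needs an explicit rule for which form to emit.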

Because AsmJit is a high-performance encoder, I still use custom opcode and encoding fields in its instruction database, which I created manually in the past. These will be generated by the generate-x86.js tool in the future; I just haven't had time to fix the generator yet.

Just explore it and tell me what you miss. Asmdb was designed to be a universal database.

Also, you can check the data online here (it's not official yet, work in progress):

https://kobalicek.com/asmgrid

xoofx commented 7 years ago

> Just explore it and tell me what you miss. Asmdb was designed to be a universal database.

That's amazing, exactly what I have been looking for. I really like that the database encodes things like side effects (reads and writes on registers, flags, etc.), as that makes it well suited to JIT/register allocator scenarios, instruction analysis and rescheduling, etc.
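As a rough illustration of why read/write metadata helps a register allocator, the sketch below derives use and def sets from a single instruction record. The record shape (`operands`, `access`, `flagsWritten`) and the function `defUse` are hypothetical; asmdb's real field names and format may differ.

```javascript
// Hypothetical record shape; asmdb's real field names may differ.
const addInst = {
  name: "add",
  operands: [
    { reg: "eax", access: "RW" }, // destination is read and written
    { reg: "ebx", access: "R" },  // source is only read
  ],
  // ADD writes all six arithmetic status flags.
  flagsWritten: ["OF", "SF", "ZF", "AF", "PF", "CF"],
};

// Collect the registers an instruction reads (uses) and writes (defs).
function defUse(inst) {
  const uses = [];
  const defs = [];
  for (const op of inst.operands) {
    if (op.access.includes("R")) uses.push(op.reg);
    if (op.access.includes("W")) defs.push(op.reg);
  }
  return { uses, defs, flagDefs: [...inst.flagsWritten] };
}

console.log(defUse(addInst));
```

Liveness analysis and instruction rescheduling both start from exactly this kind of per-instruction use/def information.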

> However, if you look at the generate-x86.js tool, you will see that aggregating the x86 data into a form usable by an encoder/decoder is pretty complicated (the generate tool reads asmjit's DB and patches it on the fly). How complicated it gets most likely depends on how fast you want the encoder/decoder to be and how you want to deal with ambiguities (such as which encoding to prefer: modrm or modmr, etc.).

No problem. I'm evaluating the idea of building a code generator in C#, so I will mostly work from the asmdb data.

I will get back here once I have started using it, to give feedback or open a PR if necessary.

Thanks a lot for your work!

kobalicek commented 7 years ago

Let me know if you face any issues. You can also join the asmjit/asmjit Gitter chat if you need quick info about something.