Samsung / walrus

WebAssembly Lightweight RUntime
Apache License 2.0
39 stars 10 forks source link

Rework SIMD operand type description #234

Closed zherczeg closed 6 months ago

zherczeg commented 6 months ago

This patch depends on #233 which is a simple patch. This patch is another 1000 lines big patch, which focuses on simd support. The difficult part is that we have 3 cpu targets for simd, and all of them needed to be reworked.

clover2123 commented 6 months ago

Would you please rebase this patch again?

zherczeg commented 6 months ago

@clover2123 yes. I have rebased the patch. All tests pass.

zherczeg commented 6 months ago

@clover2123 May I have a question? For analysis purposes, I would like to split Store32 / Store64 into StoreI32, StoreF32, StoreI64, StoreF64. This way I could detect all operations, that they use integer, or floating point data, because Move/Load/Const could be auto-detected by data flow analysis. Would this be bad for the interpreter?

clover2123 commented 6 months ago

@zherczeg IMHO splitting of instructions would increase the number of interpreter routines and this may affect the performance, but I think that this would be insignificant. I can't think of anything else besides this.

clover2123 commented 6 months ago

@zherczeg BTW should detection of data types be done in instruction level? Would you elaborate a bit more about the planned analysis?

zherczeg commented 6 months ago

The long term plan is creating a register allocation algorithm. It looks like it will be complex (costly) operation, but something we need to do. This looks like a good text about it:

https://web.stanford.edu/class/archive/cs/cs143/cs143.1128/lectures/17/Slides17.pdf

A variant of Linear Scan / Second-Chance Bin Packing should be enough. Regardless of the algorithm, full data flow analysis is required. We already have one, but it omits information to reduce memory consumption. Perhaps we need to gather all information and store it during compilation.

To allocate registers, we need to know the type of register. Gpr, float, simd are the three types. We also need to consider the register layout. E.g. ARM32 combines two single precision registers to one double precision, and two double precision registers to one simd. These things add a lot of complication.

We already have I32Load / F32Load, and these are the same operations in the interpreter. So I was thinking whether we can split other operations to int/float variants. This is not the only option. We could create a bitset for this information, which stores an extra bit for certain types of instructions. The bitset follows the order of byte code instructions. The bitset is ignored by the interpreter, but the JIT compiler can use it.

I am open to ideas.

clover2123 commented 6 months ago

E.g. ARM32 combines two single precision registers to one double precision, and two double precision registers to one simd. These things add a lot of complication.

Is ARM32 the most complex target among other supported architectures (aarch64, x86, x64) when we consider register allocation?

We could create a bitset for this information, which stores an extra bit for certain types of instructions. The bitset follows the order of byte code instructions. The bitset is ignored by the interpreter, but the JIT compiler can use it.

It seems that this type information (bitset) should be allocated and created during the parsing process and be kept in the memory for JIT compilation. Then, after JITC applied, this bitset could be removed for memory saving?

zherczeg commented 6 months ago

Is ARM32 the most complex target among other supported architectures (aarch64, x86, x64) when we consider register allocation?

32 bit x86 has only 8 registers (one is stack pointer), so from integer register allocation standpoint, probably the most difficult target. From floating point perspective, ARM32 is the most difficult.

It seems that this type information (bitset) should be allocated and created during the parsing process and be kept in the memory for JIT compilation. Then, after JITC applied, this bitset could be removed for memory saving?

Yes. Even the byte code could be removed if not used by the interpreter anymore.