Samsung / walrus

WebAssembly Lightweight RUntime
Apache License 2.0
40 stars 10 forks source link

Add size to variable sized byte codes #86

Closed zherczeg closed 1 year ago

zherczeg commented 1 year ago

Are the variable sized byte codes pointer aligned? It seems to me a list of uint16_t values are added after them, but there is no aligning, so the next byte code might be unaligned. Unaligned accesses are generally slower.

ksh8281 commented 1 year ago

IHMO We should not save offset size on ByteCode because we can compute it.

I think saving offset size can cause these effects

pros . exceution speed can be better. (offset computing time is not required)

cons . bigger ByteCode size . exceution speed can be slower. (cpu cache miss will increase due to bigger code)

I think we can run make benchmark for this. (consider add size on bytecode or not)

How do you think?

zherczeg commented 1 year ago

I am neutral. We already have a variant of the code which computes the sizes, but it does not look nice so this idea was suggested.

Is there a benchmark you prefer? Or shall we just use a C program compiled by emscripten? We also want to measure the effect of executing unaligned byte code (currently a single offset makes the following byte code unaligned).

I am also thinking about a third option: move offsets to a separate stream. This way the byte code would be aligned, and storing the number of offsets is unnecessary. The cost is an extra offset (or pointer) in the other stream.

ksh8281 commented 1 year ago

I don't have any idea of what benchmark suitable for this situation. both of any benchmark or simple program you make are fine for me. and IMO aligned bytecode is good idea.

kulcsaradam commented 1 year ago

I ran some tests with the written quicksort function:

test/perf/quicksort_recursive where max is 100000:
x86-64 old runtime: 0m25,965s new runtime: 0m25,940s
x86-32 old runtime: 0m20,629s new runtime: 0m20,664s
zherczeg commented 1 year ago

@ksh8281 what do you think about the results? I don't see big difference in runtime.

ksh8281 commented 1 year ago

IMHO There is no performance effect, I hardly to think it needs store on bytecode.(it needs more memory) How do you think about this? I adding a util function for compute byte size is good

zherczeg commented 1 year ago

The #82 is doing this, it works without adding anything to the byte code. Is that code is ok for you?

ksh8281 commented 1 year ago

adding ByteCode::getSize function is good for me! inlining getSize function on interpreter should be better

zherczeg commented 1 year ago

I am not sure I follow. Do you want to change the interpreter in some way?

The only disadvantage of the #82 is that it has two size computations, one for the parser, and one for the rest of the code. Duplicated code is hard to maintain, but if we put comments, probably it is manageable.

ksh8281 commented 1 year ago

there is 3 duplicated code(intepreter, parser, bytecode) what compute certain code size. I want to merge 3-of them and inlining the function insted of calling in intepreter(for performance).

kulcsaradam commented 1 year ago

If it would be prefered the function called ByteCode::byteCodeSize could remain in the parser in #82 , since size computations are only used for dumping bytecode in there as far as I know. Then there would be no need to maintain more code or to have a function with an unmanageable number of parameters and it could still be inlined.

Only con is that the function was meant to replace ByteCode::byteCodeSize. What is your take on this?

ksh8281 commented 1 year ago

If there is no performance effect, I hardly to agree adding bytecode size on bytecode just replace "ByteCode::byteCodeSize" is ok. it can remove duplicated code

zherczeg commented 1 year ago

These are the three byte code size getters now:

How would you merge these three?

ksh8281 commented 1 year ago

I understand what you say. It's ok.

zherczeg commented 1 year ago

Shall we do changes with #86? Would be great to merge ByteCode::getSize and WASMParser::getSize, but I don't see any simple way to do it. Maybe if byte code dumping would be done after the byte code parsing is completed, the latter would be unnecessary.