CensoredUsername / dynasm-rs

A dynasm-like tool for rust.
https://censoredusername.github.io/dynasm-rs/language/index.html
Mozilla Public License 2.0

Feature request: compile-time resolution of "super-local" label #69

Closed mkeeter closed 6 months ago

mkeeter commented 2 years ago

I noticed that hashmap lookups are taking a decent amount of JIT time when using local labels.

For example, if I manually compute jumps in this code:

dynasm!(ops
    // Basically the same as MinRegReg
    ; zip2 v4.s2, V(lhs_reg).s2, V(rhs_reg).s2
    ; zip1 v5.s2, V(rhs_reg).s2, V(lhs_reg).s2
    ; fcmgt v5.s2, v5.s2, v4.s2
    ; fmov x15, d5

    ; tst x15, #0x1_0000_0000
    ; b.ne >lhs

    ; tst x15, #0x1
    ; b.eq >both

    // LHS < RHS
    ; fmov D(out_reg), D(rhs_reg)
    ; mov w16, #CHOICE_RIGHT
    ; b >end

    // RHS < LHS
    ;lhs:
    ; fmov D(out_reg), D(lhs_reg)
    ; mov w16, #CHOICE_LEFT
    ; b >end

    ;both:
    ; fmax V(out_reg).s2, V(lhs_reg).s2, V(rhs_reg).s2
    ; mov w16, #CHOICE_BOTH

    ;end:
    ; strb w16, [x0], #1 // post-increment
)

I end up with something like this:

dynasm!(ops
    ; zip2 v4.s2, V(lhs_reg).s2, V(rhs_reg).s2
    ; zip1 v5.s2, V(rhs_reg).s2, V(lhs_reg).s2
    ; fcmgt v5.s2, v5.s2, v4.s2
    ; fmov x15, d5

    ; tst x15, #0x1_0000_0000
    ; b.ne #24 // -> lhs

    ; tst x15, #0x1
    ; b.eq #28 // -> both

    // LHS < RHS
    ; fmov D(out_reg), D(rhs_reg)
    ; mov w16, #CHOICE_RIGHT
    ; b #24 // -> end

    // <- lhs (when RHS < LHS)
    ; fmov D(out_reg), D(lhs_reg)
    ; mov w16, #CHOICE_LEFT
    ; b #12 // -> end

    // <- both
    ; fmax V(out_reg).s2, V(lhs_reg).s2, V(rhs_reg).s2
    ; mov w16, #CHOICE_BOTH

    // <- end
    ; strb w16, [x0], #1 // post-increment
)

In my codebase, this reduces the time spent in dynasm by about 30%, which is a decent chunk of performance!
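The hand-computed offsets above follow directly from instruction positions, since every AArch64 instruction is 4 bytes wide. Here is a minimal sketch of that arithmetic. The function name and the instruction indices are illustrative only, and it assumes each `mov w16, #CHOICE_*` encodes as a single instruction:

```rust
/// Size of one AArch64 instruction in bytes (fixed-width encoding).
const INSN_BYTES: i64 = 4;

/// Byte offset from the branch at instruction index `branch` to the
/// instruction at index `target`, as used by a PC-relative branch.
/// (Hypothetical helper, not part of dynasm-rs.)
fn branch_offset(branch: usize, target: usize) -> i64 {
    (target as i64 - branch as i64) * INSN_BYTES
}

fn main() {
    // Instruction indices within the block, counting `zip2` as 0.
    // Assumes every line of the block is exactly one instruction.
    let (lhs, both, end) = (11, 14, 16);

    println!("b.ne -> lhs:  #{}", branch_offset(5, lhs));
    println!("b.eq -> both: #{}", branch_offset(7, both));
    println!("b    -> end:  #{}", branch_offset(10, end));
    println!("b    -> end:  #{}", branch_offset(13, end));
}
```

A compile-time-resolved label would in effect do this subtraction inside the macro, which is only possible when every instruction between the branch and the label has a size known at expansion time.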

It would be great to introduce a new flavor of label which is only valid during a single dynasm! block; the branch offset could then be computed at compile-time instead of runtime.

Techcable commented 2 years ago

Yes. Keep in mind that the original DynASM project for C (used as backend in LuaJIT 1.x, used as frontend in ) does not use a hash map for labels. It is a plain array.

I know this is hard given the constraints of Rust proc-macros, but ideally we would move in that direction.

CensoredUsername commented 2 years ago

It wouldn't necessarily have to be a new type of label. It would be possible to guarantee that local labels which are both defined and referenced within a single dynasm! block always get resolved at compile time. It'll be quite annoying to implement though.

But before we try that: the default LabelRegistry just uses the standard library's default hasher (SipHash, designed to resist HashDoS rather than for raw speed) in the label HashMaps. You could try benchmarking it with FnvHasher instead.
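To illustrate what swapping the hasher means in plain Rust terms, here is a minimal, std-only sketch. The `Fnv1a` type is a hand-rolled stand-in for the `fnv` crate's `FnvHasher`, and `FnvLabelMap` is a hypothetical label table. Neither is dynasm-rs code, and this does not show how to plug a hasher into `LabelRegistry` itself:

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Minimal 64-bit FNV-1a hasher (stand-in for the `fnv` crate's FnvHasher).
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical label table: label name -> byte offset in the buffer.
type FnvLabelMap = HashMap<&'static str, usize, BuildHasherDefault<Fnv1a>>;

fn main() {
    let mut labels = FnvLabelMap::default();
    labels.insert("lhs", 44);
    labels.insert("both", 56);
    labels.insert("end", 64);
    println!("end is at byte offset {:?}", labels.get("end"));
}
```

FNV does very little work per byte, which tends to suit short keys such as label names, but it gives up the collision-flooding resistance of the default hasher. Whether it helps in practice is exactly what the benchmark would need to show.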

CensoredUsername commented 1 year ago

Idea: it wouldn't be hard to special-case labels whose names are one character long so that they are resolved by array lookup, bypassing the internal hashmaps entirely. That's an option if recent changes do not alleviate the bottleneck enough.
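A rough sketch of what such a fast path could look like, using a hypothetical `LabelTable` type (this is not dynasm-rs's actual `LabelRegistry`, just an illustration of the idea):

```rust
use std::collections::HashMap;

/// Hypothetical label table with an array fast path for one-character names.
struct LabelTable {
    /// One slot per ASCII byte; used for labels such as "a" or "1".
    short: [Option<usize>; 128],
    /// Fallback for every other label name.
    long: HashMap<String, usize>,
}

impl LabelTable {
    fn new() -> Self {
        LabelTable { short: [None; 128], long: HashMap::new() }
    }

    /// Returns the array slot for `name` if it is a single ASCII byte.
    fn short_slot(name: &str) -> Option<usize> {
        match name.as_bytes() {
            [b] if b.is_ascii() => Some(usize::from(*b)),
            _ => None,
        }
    }

    /// Records `offset` as the location of label `name`.
    fn define(&mut self, name: &str, offset: usize) {
        match Self::short_slot(name) {
            Some(i) => self.short[i] = Some(offset),
            None => {
                self.long.insert(name.to_owned(), offset);
            }
        }
    }

    /// Looks up a label; one-character names never touch the hashmap.
    fn resolve(&self, name: &str) -> Option<usize> {
        match Self::short_slot(name) {
            Some(i) => self.short[i],
            None => self.long.get(name).copied(),
        }
    }
}

fn main() {
    let mut table = LabelTable::new();
    table.define("a", 8); // array slot
    table.define("end", 64); // hashmap entry
    println!("a -> {:?}, end -> {:?}", table.resolve("a"), table.resolve("end"));
}
```

The trade-off is that users only benefit if they rename hot labels to single characters, which costs readability in blocks like the one in the original report.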

CensoredUsername commented 1 year ago

As discussed in the related pull request, you can use dynamic labels to skip the overhead from hashmap lookups, which would show the theoretical speedup from removing them. That would be useful to know before proceeding with this.

CensoredUsername commented 6 months ago

Closing this due to inactivity after requested information.