cnlohr / rv003usb

CH32V003 RISC-V Pure Software USB Controller
MIT License
440 stars 44 forks source link

Better instruction sequence for CRC calculations #7

Open Domkeykong opened 1 year ago

Domkeykong commented 1 year ago

General CRC computation

I have figured out a really good way to do CRC in 6 fully compressable instructions!

// Register Mappings:
//        a0 = current bit value
//        a3 = running CRC
//        a4 = CRCPOLY
// Clobbers: a0
        c.xor  a0, a3
        c.slli a0, 31 // LSB -> MSB
        c.srai a0, 31 // Copy MSB into all other bits
        c.and  a0, a4
        c.srli a3, 1
        c.xor  a3, a0

The current way assumes that the bit value is bitwise negated, which means it doesnt work for sending data https://github.com/cnlohr/rv003usb/blob/4cfe8201e15f54e5ece6e178153990f82d764951/firmware/rv003usb.S#L207-L213 https://github.com/cnlohr/rv003usb/blob/4cfe8201e15f54e5ece6e178153990f82d764951/firmware/rv003usb.S#L460-L466

Bit specific CRC computation

I also created Instruction sequences for when we already know which bit value we are currently handling. This takes only 5 Instructions but has the penalty of using one large instruction in the beginning.

// Register Mappings:
//        a0 = Temp
//        a3 = running CRC
//        a4 = CRCPOLY
// Clobbers: a0
do1_crc:
    andi   a0, a3, 1 
    c.srli a3, 1
    c.addi a0, -1
    c.andi a0, a4
    c.xor  a3, a0

This one removes the need of the neg instruction being able to compress 1 more instruction

do0_crc:
    slli a0,a3,31 // Put a3s LSB into a0s MSB
    c.srai a0,31    // Copy MSB into all other bits
    c.srli a3,1
    c.andi a0,a4
    c.xor  a3,a0

https://github.com/cnlohr/rv003usb/blob/4cfe8201e15f54e5ece6e178153990f82d764951/firmware/rv003usb.S#L249-L254

The main trick i used hre is to shift left and then shift right arithmetic to copy the LSB to all into all the other places

duk-37 commented 1 year ago

the sign-preserving right shift trick for the zero bit case should help here, thanks! the generic case is also useful but probably less so given the send logic already specializes for zero- and one- bit cases. also that crc code is misplaced/breaking stuff at the moment anyways

crc1 here is equivalent to what we already have but with the shift reordered

cnlohr commented 1 year ago

@duk-37 any chance you would be interested in reworking some of the assembly once I get a fully working stack? I don't think I want to stake stream time to further optimize things, but it would be fun to do before a supercut.

duk-37 commented 1 year ago

@duk-37 any chance you would be interested in reworking some of the assembly once I get a fully working stack? I don't think I want to stake stream time to further optimize things, but it would be fun to do before a supercut.

Sure, I can take a look! Will also be a lot easier once we know more about the chip internals (#5 and Macyler's work)